Friday, October 26, 2018

Julio Delgado performing "Maria" at Soulfood Cafe in Redmond, Washington



http://soulfoodcoffeehouse.com/ 

Monday, October 22, 2018

Anomaly detection: Robust Random Cut Forest

Anomaly detection.md

Anomaly detection

Anomaly detection is one of the cornerstone problems in data mining, and it has many practical applications, such as detecting events of interest in the real-time operational monitoring of cloud services.

Assuming we have a model that represents the data, we can analyze anomaly detection from the perspective of model complexity: a point is an anomaly if including it substantially increases the complexity of the model.

Robust Random Cut Forest

Robust Random Cut Forest (RRCF) is a method for anomaly detection in dynamic data streams. It was first published in a paper by Guha et al. in 2016, and it is currently offered by AWS Kinesis and AWS Sagemaker.

Definition

A Robust Random Cut Tree (RRCT) on a dataset S is generated as follows:

i = choose_random_dim_proportional_to(dim_range) over all points in S
Pivot = choose_uniform_random_in_range(Min(Val_i), Max(Val_i)) over all points in S
S1 = {datapoints whose Val_i <= Pivot}
S2 = {datapoints whose Val_i > Pivot}
Recurse on S1 and S2

An RRCF is a collection of independent RRCTs.
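
To make the definition concrete, here is a minimal Python sketch of building a single RRCT on a static, in-memory dataset. The function name and the dict-based tree representation are mine, not from the paper, and it assumes all points are distinct; the actual streaming algorithm also supports incremental insertion and deletion of points.

import random

def build_rrct(points):
    """Build a Robust Random Cut Tree over a list of distinct points (sketch)."""
    if len(points) == 1:
        return {"leaf": points[0]}
    dims = len(points[0])
    lows = [min(p[i] for p in points) for i in range(dims)]
    highs = [max(p[i] for p in points) for i in range(dims)]
    ranges = [hi - lo for lo, hi in zip(lows, highs)]
    # Choose a dimension with probability proportional to its range
    i = random.choices(range(dims), weights=ranges)[0]
    # Choose the cut point uniformly at random within that dimension's range
    pivot = random.uniform(lows[i], highs[i])
    s1 = [p for p in points if p[i] <= pivot]
    s2 = [p for p in points if p[i] > pivot]
    if not s2:  # degenerate cut exactly at the max value; retry
        return build_rrct(points)
    return {"dim": i, "pivot": pivot,
            "left": build_rrct(s1),
            "right": build_rrct(s2)}

# An RRCF is simply a list of independently built trees
forest = [build_rrct([(1.0, 2.0), (1.1, 1.9), (10.0, 12.0)])
          for _ in range(10)]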

Anomaly detection

TODO

Disclosure: This post does not represent the views of my employer. I am the sole author.

Sunday, August 19, 2018

MxNet Gluon Linear regression

MxNet is a very powerful Deep Neural Network (DNN) framework. Gluon is part of MxNet, and offers a higher-level API. In their own words:

The Gluon package is a high-level interface for MXNet designed to be easy to use, while keeping most of the flexibility of a low level API. Gluon supports both imperative and symbolic programming, making it easy to train complex models imperatively in Python and then deploy with a symbolic graph in C++ and Scala.

It is very easy to get started with MxNet and Gluon using AWS Sagemaker hosted notebooks. You may be eligible for Sagemaker's free tier (https://aws.amazon.com/sagemaker/pricing/).

I have created a notebook showing:

* How to use Gluon to define a simple DNN to perform linear regression
* How to train the DNN
* How to save the trained model
* How to restore the saved model

My notebook is available on GitHub (https://github.com/julitopower/LearnMachineLearning/blob/master/Gluon_linear_regression.ipynb), and can be imported into Sagemaker notebooks (select the conda_python3 environment).
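
For reference, here is a condensed sketch of the same workflow, assuming a reasonably recent MxNet (>= 1.2, which introduced save_parameters/load_parameters); the synthetic data and the file name are purely illustrative:

import mxnet as mx
from mxnet import nd, autograd, gluon

# Synthetic data for y = 2x + 1, plus a little noise
X = nd.random.uniform(-1, 1, shape=(1000, 1))
y = 2 * X + 1 + 0.01 * nd.random.normal(shape=(1000, 1))

# A single Dense layer with one output unit is a linear regression
net = gluon.nn.Dense(1)
net.initialize(mx.init.Normal(sigma=0.1))

loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                             batch_size=32, shuffle=True)

for epoch in range(10):
    for xb, yb in data:
        with autograd.record():
            loss = loss_fn(net(xb), yb)
        loss.backward()
        trainer.step(xb.shape[0])

# Save the trained parameters, then restore them into a fresh network
net.save_parameters('linreg.params')
net2 = gluon.nn.Dense(1)
net2.load_parameters('linreg.params')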

Disclaimer: This blog reflects only my personal experiences and opinions, which are not necessarily those of my employer.

Sunday, June 04, 2017

Julio Delgado. Arena

Tengo frío. Open mic at http://soulfoodcoffeehouse.com/

Monday, March 20, 2017

AWS Solutions Architect Exam: Storage Gateway

This post summarizes the most important characteristics of AWS Storage Gateway relevant to the AWS Solutions Architect Associate level Exam.

What is AWS Storage Gateway: It is a service that connects an on-premises software appliance with cloud-based storage, providing seamless and secure integration between an organization's on-premises IT and AWS storage infrastructure. This service is suitable for hybrid deployments, and enables the secure storage of data in the AWS cloud.

Software appliance: In order to use AWS Storage Gateway you have to set up a software appliance in your datacenter. AWS Storage Gateway's appliance is available as a virtual machine: you download, install, and register it with AWS. The software appliance is exposed as an iSCSI device that can be mounted by your on-premises applications.

Configurations:

File Gateway: This is basically a file interface into S3. The gateway provides access to objects in S3 as files on an NFS mount point. It also provides low-latency access to data through transparent local caching.

Gateway-Cached Volumes: All data is stored in S3, and recently accessed data is cached locally. A volume can hold up to 32TB, and you can have up to 32 volumes. It offers the ability to perform incremental point-in-time snapshots.

Gateway-Stored Volumes: Data is backed up asynchronously to S3 in the form of EBS snapshots. All data is kept on-premises as well. 16TB max per volume, and a max of 32 volumes.

Gateway Virtual Tape Libraries: This is an archival solution that allows the storage of data on virtual tapes in the AWS cloud. If your applications use tape backups, they can seamlessly use a Gateway Virtual Tape Library. The final storage for virtual tapes is Glacier. Ejected tapes are stored in a Virtual Tape Shelf; only one is allowed per account per region, but it can be shared by multiple gateways.

Encryption:  All data is transferred using SSL, and is stored encrypted using server side encryption.
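
To poke at the service programmatically, here is a hedged boto3 sketch (the region is a placeholder) that lists the gateways registered in an account and the volumes each one exposes:

import boto3

sgw = boto3.client('storagegateway', region_name='us-west-2')  # region is illustrative

# Enumerate the gateways registered in this account/region
for gw in sgw.list_gateways()['Gateways']:
    print(gw['GatewayARN'], gw.get('GatewayType'))
    # List the iSCSI volumes exposed by this gateway
    for vol in sgw.list_volumes(GatewayARN=gw['GatewayARN'])['VolumeInfos']:
        print('  ', vol['VolumeARN'], vol.get('VolumeType'))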

Saturday, March 18, 2017

Austin SXSW 2017

Last week I went to Austin (Texas) to represent AWS Quicksight at SXSW. It was an incredible experience. Austin is such a vibrant city, and the surroundings are gorgeous.

After a hard day's work, there is nothing better than chilling out playing pool :).



Enjoy!

Julio

AWS Solutions Architect Exam: S3

This post summarizes the most important characteristics of S3 relevant to the AWS Solutions Architect Associate level Exam.

What is S3: S3 is an object (key/value) store that is part of Amazon AWS.

How is it organized: In S3, objects are placed in buckets. Within a bucket, each object has a unique key. The bucket namespace is global, whereas keys only have to be unique at the bucket level.

S3 is not file storage, and it does not offer a filesystem-like interface, although with the use of prefixes it is possible to get S3 to display buckets and objects in a hierarchy similar to a filesystem.
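
As an illustration, here is a boto3 sketch (bucket name and keys are made up) of the prefix/delimiter trick that produces a filesystem-like listing:

import boto3

s3 = boto3.client('s3')
bucket = 'my-example-bucket'  # hypothetical bucket

# Keys are flat strings; '/' has no special meaning at storage time
s3.put_object(Bucket=bucket, Key='logs/2017/app.log', Body=b'...')
s3.put_object(Bucket=bucket, Key='logs/2018/app.log', Body=b'...')

# Delimiter='/' makes the listing group keys into "folders" (CommonPrefixes)
resp = s3.list_objects_v2(Bucket=bucket, Prefix='logs/', Delimiter='/')
for cp in resp.get('CommonPrefixes', []):
    print(cp['Prefix'])  # logs/2017/ and logs/2018/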

Storage classes: Amazon S3 provides several storage classes, whose different characteristics meet the needs of most users. More on storage classes can be found at https://aws.amazon.com/s3/storage-classes/

S3 Standard: 11 9s of durability, 4 9s of availability. Supports encryption of data in transit and at rest.

S3 Infrequent Access: Same durability, 3 9s of availability. All the other characteristics are the same as S3 Standard. It has a minimum billable object size of 128KB, and a minimum storage duration of 30 days.

S3 Archive: This involves the use of AWS Glacier as an extension of S3, using object lifecycle management, as explained below.

S3 Reduced Redundancy: S3 allows the storage of objects with reduced redundancy. This option is cheaper, and recommended for non-critical assets. It offers 4 9s of durability, and 4 9s of availability.
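
The storage class is chosen per object at write time; a hedged boto3 example (bucket and key are illustrative):

import boto3

s3 = boto3.client('s3')
# Write an object directly into the Infrequent Access storage class
s3.put_object(Bucket='my-example-bucket', Key='reports/2017.csv',
              Body=b'...', StorageClass='STANDARD_IA')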

S3 as a static web server: An S3 bucket can be configured as a web server for static content. To enable this functionality, an index page and an error page have to be defined, and the corresponding objects must be made public.
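
A hedged boto3 sketch of that configuration (bucket and page names are illustrative; the referenced objects must already exist and be publicly readable):

import boto3

s3 = boto3.client('s3')
# Turn the bucket into a static website with an index and an error page
s3.put_bucket_website(
    Bucket='my-example-bucket',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'ErrorDocument': {'Key': 'error.html'},
    })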

S3 object lifecycle management: A set of rules determines actions to take on a group of objects. Actions can be transition actions or expiration actions. Transition actions change the storage class of objects, while expiration actions cause the deletion of objects. Lifecycle rules are part of a bucket's configuration.
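
For example, a single rule combining a transition action and an expiration action might look like this in boto3 (bucket and prefix are made up):

import boto3

s3 = boto3.client('s3')
# Move logs to Glacier after 30 days, delete them after a year
s3.put_bucket_lifecycle_configuration(
    Bucket='my-example-bucket',
    LifecycleConfiguration={'Rules': [{
        'ID': 'archive-then-delete-logs',
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
        'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
        'Expiration': {'Days': 365},
    }]})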

S3 object versioning: A bucket can be configured to support object versioning. This allows keeping several versions of the same object. Using lifecycle management, it is possible to configure different behavior for the current and noncurrent versions.
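
Enabling versioning is a one-call, bucket-level switch in boto3 (bucket name is illustrative); note that once enabled it can only be suspended, never fully removed:

import boto3

s3 = boto3.client('s3')
# Enable versioning; every overwrite now creates a new object version
s3.put_bucket_versioning(
    Bucket='my-example-bucket',
    VersioningConfiguration={'Status': 'Enabled'})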

S3 access control: By default, S3 resources are private (only the owner can see and manage them). There are a number of mechanisms that can be used to manage access to S3 resources.

S3 policies: Policies can be attached to buckets (bucket policies) or to IAM users, and can reference individual objects, to control which actions are allowed. For example, with a bucket policy it is possible to grant access to another AWS account, as sketched after this list.

S3 ACLs: This is a legacy mechanism that can be applied to buckets and objects.

IAM: IAM users, policies and roles can be used to control access to S3 resources.
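
As a sketch of the bucket-policy case, here is how cross-account read access could be granted with boto3 (the account ID and bucket name are placeholders):

import boto3, json

s3 = boto3.client('s3')
# Allow a (made-up) second AWS account to read every object in the bucket
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'AWS': 'arn:aws:iam::123456789012:root'},
        'Action': 's3:GetObject',
        'Resource': 'arn:aws:s3:::my-example-bucket/*',
    }],
}
s3.put_bucket_policy(Bucket='my-example-bucket', Policy=json.dumps(policy))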

S3 encryption: Both client-side and server-side encryption options are available. SSL can be used for data in transit as well.

Server-side encryption: Protects data at rest. There are 3 options: S3-managed keys, AWS KMS, and customer-provided keys.

Client-side encryption: Protects data in transit and at rest. Customers can choose to provide and manage their own keys, or use AWS KMS.
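
Requesting server-side encryption with S3-managed keys is a per-request flag; a minimal boto3 example (bucket and key are illustrative):

import boto3

s3 = boto3.client('s3')
# 'AES256' selects S3-managed keys; 'aws:kms' would select AWS KMS instead
s3.put_object(Bucket='my-example-bucket', Key='secret.txt',
              Body=b'...', ServerSideEncryption='AES256')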

S3 notifications: Notifications for bucket events (object creation, removal, loss of a reduced redundancy object). Notifications can be sent to SQS, SNS, or Lambda.
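
For instance, routing object-created events to an SNS topic looks like this in boto3 (the topic ARN is a placeholder, and the topic's policy must allow S3 to publish to it):

import boto3

s3 = boto3.client('s3')
# Publish a notification to SNS whenever any object is created in the bucket
s3.put_bucket_notification_configuration(
    Bucket='my-example-bucket',
    NotificationConfiguration={'TopicConfigurations': [{
        'TopicArn': 'arn:aws:sns:us-west-2:123456789012:my-topic',
        'Events': ['s3:ObjectCreated:*'],
    }]})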

S3 consistency model: PUTs of new objects offer read-after-write consistency (unless you issued a GET for the key before the object existed). Overwrite PUTs and DELETEs of existing objects are eventually consistent, so a read immediately after an update may return stale data.

For more information refer to AWS S3 online documentation (which is excellent and very detailed):

http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

Sunday, May 29, 2016

Living the American Dream

It has been a long time since my last personal post. Lots of things have been going on, and some super exciting changes have taken place.

For starters, I don't live in Europe anymore. The stars lined up with the planets while I was having a pint of Guinness, and I ended up moving to the USA, more specifically to the Pacific Northwest. The Seattle area is what I call home these days! It was sad to say goodbye to Ireland again. Ireland is the place I truly call home in my heart. Dublin is an amazingly vibrant city, and it will be sorely missed. I left good FRIENDS back there, but I know I'll be visiting often.



Work-wise, I have changed teams within Amazon. I no longer work for AWS CloudWatch; my new team is AWS Quicksight. We are building the future of Business Intelligence tools, and groundbreaking new technologies for real-time processing of "Universe Scale" datasets ;).

Enjoy the views:




Tree command with boost::filesystem

The tree command is pretty handy for getting an overview of the contents of small folders. By small I mean small both in the number of files and in the number of levels of the directory structure. A sample output of the tree command is as follows:

user@host:~/projects$ tree templates/
--CMakeLists.txt~
--app.s
--src
    |--CMakeLists.txt~
    |--#Templates.cpp#
    |--Templates.cpp
    |--Templates.cpp~
    |--CMakeLists.txt
--build
    |--Makefile
    |--CTestTestfile.cmake
    |--CMakeCache.txt
    |--src


Using boost::filesystem this can be achieved with a handful of lines, and your code is multi-platform without any extra effort:

#include <boost/filesystem.hpp>
#include <boost/algorithm/string/replace.hpp>

#include <iostream>
#include <string>

namespace fs = boost::filesystem;

// Recursively print the contents of a directory, indenting one
// level per subdirectory, in the spirit of the tree command.
void print(const fs::path& p,
           std::ostream& os = std::cout,
           const std::string& prefix = "--") {
  try {
    fs::directory_iterator it{p};
    while (it != fs::directory_iterator{}) {
      auto filename = it->path().filename().string();
      // Strip the quotes some Boost versions add when formatting paths
      boost::replace_all(filename, "\"", "");
      os << prefix << filename << std::endl;

      // Descend into subdirectories with a deeper indentation prefix
      if (fs::is_directory(it->status())) {
        auto pfx = std::string("    |") + prefix;
        print(it->path(), os, pfx);
      }
      ++it;
    }
  } catch (const fs::filesystem_error& err) {
    // Report directories we cannot open (e.g. permission denied)
    os << prefix << err.what() << std::endl;
  }
}

int main(int argc, char** argv) {
  // Print the directory given as the first argument, or the current one
  if (argc > 1)
    print(argv[1]);
  else
    print(fs::current_path());
}
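
To build it, link against the Boost filesystem libraries, with something like g++ -std=c++11 tree.cpp -o tree -lboost_filesystem -lboost_system (the exact set of libraries needed can vary with your Boost version).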

Happy Coding.