Anomaly detection: Robust Random Cut Forest
Anomaly detection
Anomaly detection is one of the cornerstone problems in data mining and has many practical applications, such as detecting events of interest in real-time operational monitoring of cloud services.
Assuming we have a model to represent the data, we can analyze anomaly detection from the perspective of model complexity and say that a point is an anomaly if the complexity of the model increases substantially with the inclusion of the point.
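To make the "complexity increase" idea concrete, here is a minimal toy sketch. It is not the complexity measure RRCF uses. The stand-in "model" is simply the range (max minus min) of one-dimensional data, and the names `toy_complexity` and `complexity_increase` are hypothetical, chosen for illustration only.

```python
def toy_complexity(data):
    """Stand-in model complexity: the range of the data (an assumption for this toy)."""
    return max(data) - min(data)


def complexity_increase(data, point):
    """How much the toy complexity grows when `point` is included."""
    return toy_complexity(list(data) + [point]) - toy_complexity(data)


data = [1.0, 1.2, 0.9, 1.1]
# A point inside the existing range leaves the toy model unchanged.
print(complexity_increase(data, 1.05))
# A point far outside the range forces the toy model to grow a lot.
print(complexity_increase(data, 9.0))
```

Under this toy measure, the second point would be flagged as anomalous because including it changes the model far more than including the first one does.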
Robust Random Cut Forest
Robust Random Cut Forest (RRCF) is a method for anomaly detection in dynamic data streams. It was first published in a paper by Guha et al. in 2016, and it is currently offered in Amazon AWS Kinesis and Amazon AWS SageMaker.
Definition
A Robust Random Cut Tree (RRCT) on a dataset S is generated as follows:
i = choose_random_dim_proportional_to(dim_range), where dim_range is computed over all points in S
Pivot = choose_uniform_random_in_range(Min(Val_i), Max(Val_i)), with Min and Max taken over all points in S
S1 = {datapoints whose Val_i <= Pivot}
S2 = {datapoints whose Val_i > Pivot}
Recurse on S1 and S2 until each set contains a single point
An RRCF is a collection of independent RRCTs.
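The construction above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the implementation from the paper or from any AWS service: the function names (`build_rrct`, `build_rrcf`, `leaf_points`), the dictionary-based tree nodes, and the handling of duplicate points are all choices made here for brevity. For simplicity, every tree is built on the full dataset with independent random cuts.

```python
import random


def build_rrct(points, rng):
    """Recursively build one random cut tree over a list of equal-length tuples."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Only dimensions with a non-zero range can separate the points.
    candidates = [d for d in range(dims) if highs[d] > lows[d]]
    if len(points) == 1 or not candidates:
        # Assumption: a single point (or a set of exact duplicates) becomes a leaf.
        return {"points": list(points)}
    # Choose dimension i with probability proportional to its range.
    weights = [highs[d] - lows[d] for d in candidates]
    i = rng.choices(candidates, weights=weights, k=1)[0]
    while True:
        # Choose the pivot uniformly between the min and max value in dimension i.
        pivot = rng.uniform(lows[i], highs[i])
        s1 = [p for p in points if p[i] <= pivot]
        s2 = [p for p in points if p[i] > pivot]
        if s1 and s2:  # redraw in the rare case the pivot lands on the maximum
            break
    return {
        "dim": i,
        "pivot": pivot,
        "left": build_rrct(s1, rng),
        "right": build_rrct(s2, rng),
    }


def build_rrcf(points, num_trees, seed=None):
    """Build a forest as a list of trees, each using independent random cuts."""
    rng = random.Random(seed)
    return [build_rrct(points, rng) for _ in range(num_trees)]


def leaf_points(tree):
    """Collect the points stored in the leaves of a tree (helper for inspection)."""
    if "points" in tree:
        return list(tree["points"])
    return leaf_points(tree["left"]) + leaf_points(tree["right"])


pts = [(0.0, 0.0), (1.0, 0.5), (0.2, 3.0), (5.0, 5.0)]
forest = build_rrcf(pts, num_trees=3, seed=0)
print(len(forest), sorted(leaf_points(forest[0])) == sorted(pts))
```

Because the cut dimension is weighted by its range, a point that sits far from the rest along some dimension is likely to be separated early, and so tends to end up in a shallow leaf.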
Anomaly detection
TODO
Disclosure: This post does not represent the views of my employer. I am the sole author.