rrcforest
Syntax
Description
Use the rrcforest function to fit a robust random cut forest model for outlier detection and novelty detection.
Outlier detection (detecting anomalies in training data): Use the output argument tf of rrcforest to identify anomalies in training data.
Novelty detection (detecting anomalies in new data with uncontaminated training data): Create a RobustRandomCutForest model object by passing uncontaminated training data (data with no outliers) to rrcforest. Detect anomalies in new data by passing the object and the new data to the object function isanomaly.
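The two workflows above can be sketched as follows. This is a minimal illustration on synthetic data, not an official example: the data, the 2.5% contamination value, and the variable names (outlierRows, cleanForest, Xnew) are assumptions chosen for the sketch.

```matlab
rng("default");                          % for reproducibility
X = [randn(200,2); 6 + randn(5,2)];      % synthetic data: 200 typical rows plus 5 shifted rows

% Outlier detection: ask rrcforest to flag about 2.5% of the training rows.
[forest,tf,scores] = rrcforest(X,ContaminationFraction=0.025);
outlierRows = find(tf);                  % indices of the flagged training rows

% Novelty detection: train on rows assumed to be uncontaminated,
% then score previously unseen observations with isanomaly.
cleanForest = rrcforest(X(1:200,:));
Xnew = [0 0; 8 8];
[tfNew,scoresNew] = isanomaly(cleanForest,Xnew);
```

Here tf and tfNew are logical column vectors with one element per observation, and larger values in scores and scoresNew indicate more anomalous observations.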
forest = rrcforest(Tbl) returns a RobustRandomCutForest model object for the predictor data in the table Tbl.
forest = rrcforest(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, specify ContaminationFraction=0.1 to process 10% of the training data as anomalies.
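As a rough sketch of the table syntax combined with a name-value argument, the following code fits a model to a small synthetic table. The table contents and the variable names Temp and Load are hypothetical and serve only as an illustration.

```matlab
rng("default");                                   % for reproducibility
Tbl = table(randn(100,1),rand(100,1), ...
    'VariableNames',{'Temp','Load'});             % hypothetical predictor table

% Treat 10% of the training observations as anomalies.
[forest,tf] = rrcforest(Tbl,ContaminationFraction=0.1);

fractionFlagged = mean(tf);        % share of training rows flagged as anomalies
threshold = forest.ScoreThreshold; % score cutoff stored in the model object
```

With this setting, fractionFlagged should be close to 0.1, because the function chooses the score threshold so that the requested fraction of training observations is flagged.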
Examples
Input Arguments
Name-Value Arguments
Output Arguments
More About
Algorithms
rrcforest considers NaN, '' (empty character vector), "" (empty string), <missing>, and <undefined> values in Tbl and NaN values in X to be missing values.
rrcforest uses observations with missing values to find splits on variables for which these observations have valid values. The function might place these observations in a branch node, not a leaf node. Then rrcforest computes the ratio (Disp(x,C)/|C|) by traversing from the branch node to the root node for each tree. The function places an observation with all missing values in the root node. Therefore, the ratio and the anomaly score become the number of training observations for each tree, which is the maximum possible anomaly score for the trained robust random cut forest model. You can specify the number of training observations for each tree by using the NumObservationsPerLearner name-value argument.
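The missing-value behavior described above can be probed with a small experiment. This sketch uses synthetic data, and the sample size of 128 is an arbitrary choice. The comparison at the end restates the expectation from the preceding paragraph rather than a verified result.

```matlab
rng("default");                  % for reproducibility
X = randn(500,3);
X(499,2) = NaN;                  % row 499: one predictor missing
X(500,:) = NaN;                  % row 500: all predictors missing

[forest,tf,scores] = rrcforest(X,NumObservationsPerLearner=128);

scorePartial = scores(499);      % scored using only the valid predictors
scoreAllMissing = scores(500);   % observation placed in the root node of each tree

% Per the description above, scoreAllMissing is expected to match the
% per-tree sample size, the largest score the model can produce.
maxPossibleScore = forest.NumObservationsPerLearner;
```

Comparing scoreAllMissing with maxPossibleScore is a quick way to check whether an unusually high score comes from an observation that has no valid predictor values.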
References
[1] Guha, Sudipto, N. Mishra, G. Roy, and O. Schrijvers. "Robust Random Cut Forest Based Anomaly Detection on Streams," Proceedings of The 33rd International Conference on Machine Learning 48 (June 2016): 2712–21.
[2] Bartos, Matthew D., A. Mullapudi, and S. C. Troutman. "rrcf: Implementation of the Robust Random Cut Forest Algorithm for Anomaly Detection on Streams." Journal of Open Source Software 4, no. 35 (2019): 1336.
Extended Capabilities
Version History
Introduced in R2023a