rfOutliers function

Random forest based outlier detection

Uses the random forest instance proximity measure to detect training cases that differ from all other cases.

rfOutliers(model, dataset)

Arguments

  • model: a random forest model returned by CoreModel
  • dataset: a training set used to generate the model

Returns

For each instance in the dataset, the function returns a numeric score measuring how strange that instance is relative to the other cases.

Details

Strangeness is defined using the random forest model via a proximity matrix (see rfProximity). If the score is greater than 10, the case can be considered an outlier, according to Breiman (2001).
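As a rough illustration of how a strangeness score can be derived from a proximity matrix, the sketch below follows the class-wise outlier measure described by Breiman. It uses base R only. The helper outlierScore, the toy proximity matrix, and the choice of the median absolute deviation as the scaling term are assumptions made for this example; the computation inside rfOutliers may differ.

```r
# Hypothetical helper (not part of CORElearn): outlier scores computed
# from a square proximity matrix and a vector of class labels.
outlierScore <- function(prox, cls) {
  stopifnot(is.matrix(prox), nrow(prox) == ncol(prox),
            length(cls) == nrow(prox))
  n <- nrow(prox)
  cls <- factor(cls)
  score <- numeric(n)
  for (lv in levels(cls)) {
    inClass <- cls == lv
    # Sum of squared proximities to cases of the same class
    s <- rowSums(prox[inClass, inClass, drop = FALSE]^2)
    # Raw measure: large when a case is far from its own class
    raw <- n / ifelse(s == 0, 1, s)
    # Standardize within the class: center on the median and scale by
    # the median absolute deviation (assumed here; guarded against zero)
    spread <- mad(raw, constant = 1)
    score[inClass] <- (raw - median(raw)) / ifelse(spread == 0, 1, spread)
  }
  score
}

# Toy data: cases 1 to 5 are mutually close, case 6 is far from all of them
set.seed(1)
prox <- matrix(0, 6, 6)
prox[upper.tri(prox)] <- runif(15, 0.6, 0.9)
prox[1:5, 6] <- 0.05
prox <- prox + t(prox)   # make the matrix symmetric
diag(prox) <- 1          # each case is fully proximate to itself
cls <- rep("a", 6)

score <- outlierScore(prox, cls)
which.max(score)         # position of the strangest case
```

Because the scores are centered on the class median, typical cases land near zero and may be negative, while a case with low proximity to its own class receives a large positive value.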

Examples

library(CORElearn)

# first create a random forest model using CORElearn
dataset <- iris
md <- CoreModel(Species ~ ., dataset, model="rf", rfNoTrees=30, maxThreads=1)

# compute the outlier score of each training case and plot the magnitudes
outliers <- rfOutliers(md, dataset)
plot(abs(outliers))

# for a nicer display try
plot(md, dataset, rfGraphType="outliers")

destroyModels(md) # clean up
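To list the suspicious cases instead of plotting them, the scores can be compared against the threshold of 10 mentioned in Details. This is a minimal sketch: the variable names threshold and flagged are illustrative, taking the absolute value mirrors the plot call in the example, and with a forest this small the result may well be empty.

```r
library(CORElearn)

dataset <- iris
md <- CoreModel(Species ~ ., dataset, model = "rf", rfNoTrees = 30, maxThreads = 1)
outliers <- rfOutliers(md, dataset)

# Row indices of cases whose absolute score exceeds the threshold
threshold <- 10
flagged <- which(abs(outliers) > threshold)

# Flagged cases with their scores (zero rows if nothing exceeds the threshold)
data.frame(case = flagged, score = outliers[flagged])

destroyModels(md) # clean up
```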

Author(s)

John Adeyanju Alao (as a part of his BSc thesis) and Marko Robnik-Sikonja (thesis supervisor)

See Also

CoreModel, rfProximity, rfClustering.

References

Leo Breiman: Random Forests. Machine Learning Journal, 45:5-32, 2001