rfClustering function

Random forest based clustering

Creates a clustering of random forest training instances. A random forest provides a proximity measure between its training instances based on their out-of-bag classification. This information is usually passed to visualizations (e.g., scaling) and attribute importance measures.

rfClustering(model, noClusters=4)

Arguments

  • model: a random forest model returned by CoreModel
  • noClusters: number of clusters

Details

The method calls the pam function (from the cluster package) for clustering, initializing its distance matrix with the random forest based similarity obtained by calling rfProximity with the argument model.
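The steps described above can be approximated by hand. The following is a minimal sketch, not the package's actual implementation: it assumes that rfProximity accepts outProximity=FALSE to return a distance matrix rather than a proximity matrix, and that pam is called with diss=TRUE on that matrix. The variable names rfDist and manualCluster are illustrative only.

```r
library(CORElearn)
library(cluster)

# Build a small random forest on the iris data
md <- CoreModel(Species ~ ., iris, model = "rf", rfNoTrees = 30, maxThreads = 1)

# Assumption: outProximity = FALSE yields distances instead of proximities
rfDist <- rfProximity(md, outProximity = FALSE)

# Cluster around medoids, treating rfDist as a dissimilarity matrix
manualCluster <- pam(rfDist, k = 4, diss = TRUE)

destroyModels(md) # clean up
```

Doing the two steps separately may be useful when the same distance matrix is reused for several values of k.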

Returns

An object of class pam representing the clustering (see ?pam.object for details). Its most important component is the vector of cluster assignments (named clustering) for the training instances used to generate the model.
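As an illustration of how the returned object might be inspected, the sketch below reads the assignment vector from the clustering component documented in ?pam.object and cross-tabulates it against the known iris species. The choice of three clusters and the variable name assignments are assumptions made for this example; because the forest is built randomly, the resulting table will differ between runs.

```r
library(CORElearn)

md <- CoreModel(Species ~ ., iris, model = "rf", rfNoTrees = 30, maxThreads = 1)
mdCluster <- rfClustering(md, noClusters = 3)

# One cluster label per training instance, in the order of the training data
assignments <- mdCluster$clustering

# Compare the clusters found with the true class labels
print(table(cluster = assignments, species = iris$Species))

destroyModels(md) # clean up
```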

Examples

set <- iris
md <- CoreModel(Species ~ ., set, model="rf", rfNoTrees=30, maxThreads=1)
mdCluster <- rfClustering(md, 5)
destroyModels(md) # clean up

Author(s)

John Adeyanju Alao (as a part of his BSc thesis) and Marko Robnik-Sikonja (thesis supervisor)

See Also

CoreModel

rfProximity

pam

References

Leo Breiman: Random Forests. Machine Learning Journal, 45:5-32, 2001