model: A model of type rf or rfNear, as returned by CoreModel.
dataset: The training instances that were used to build the random forest model.
clustering: A clustering vector for the training instances in dataset that were used to build the model.
Details
The attributes are evaluated on the out-of-bag sets of the provided random forest. The values of each attribute in turn are randomly shuffled, and the shuffled instances are classified with the random forest. The difference between the average margin of the non-shuffled and the shuffled instances serves as a quality estimate of the attribute.
The function rfAttrEvalClustering uses a clustering of the training instances to produce importance scores of the attributes for each cluster separately. If the parameter clustering is set to NULL, the actual class values of the instances are used as clusters, thereby producing an evaluation of the attributes specific to each class value.
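The shuffle-and-compare idea described above can be illustrated with a rough sketch. This is not the package's internal implementation: it computes margins on the full training set rather than on the out-of-bag sets, the helper margins is hypothetical, and it assumes that predict with type="probability" returns one column per class in the order of levels(iris$Species).

```r
library(CORElearn)

set.seed(1)  # make the shuffles reproducible

# Hypothetical helper: margin of each instance, i.e. the probability of the
# true class minus the largest probability among the other classes.
margins <- function(prob, truth) {
  prob <- as.matrix(prob)
  idx <- cbind(seq_len(nrow(prob)), as.integer(truth))
  pTrue <- prob[idx]
  prob[idx] <- -Inf              # mask the true class before taking the max
  pTrue - apply(prob, 1, max)
}

modelRF <- CoreModel(Species ~ ., iris, model="rf", rfNoTrees=100, maxThreads=1)

# Baseline margins on the non-shuffled data (in-sample here, not out-of-bag).
base <- margins(predict(modelRF, iris, type="probability"), iris$Species)

attrs <- setdiff(names(iris), "Species")
importance <- sapply(attrs, function(a) {
  shuffled <- iris
  shuffled[[a]] <- sample(shuffled[[a]])  # break the link between a and the class
  prob <- predict(modelRF, shuffled, type="probability")
  mean(base) - mean(margins(prob, shuffled$Species))
})
print(importance)

destroyModels(modelRF)  # clean up
```

A larger drop in average margin suggests that the forest relies more heavily on that attribute. Because this sketch scores instances the forest has already seen, its values will generally differ from those reported by rfAttrEval.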
Returns
rfAttrEval returns a vector of evaluations for the features, in the order specified by the formula used to generate the provided model. rfAttrEvalClustering returns a matrix in which each row contains the evaluations for one of the clusters.
Author(s)
Marko Robnik-Sikonja (thesis supervisor) and John Adeyanju Alao (as a part of his BSc thesis)
See Also
CORElearn, CoreModel, attrEval.
Examples
# build a random forest model with certain parameters
modelRF <- CoreModel(Species ~ ., iris, model="rf",
              selectionEstimator="MDL", minNodeWeightRF=5,
              rfNoTrees=100, maxThreads=1)

# feature evaluations
rfAttrEval(modelRF)

# feature evaluations for each class (clustering=NULL uses the class values as clusters)
x <- rfAttrEvalClustering(model=modelRF, dataset=iris, clustering=NULL)
print(x)

# clean up
destroyModels(modelRF)
References
Marko Robnik-Sikonja: Improving Random Forests. In J.-F. Boulicaut et al. (Eds): ECML 2004, LNAI 3210, Springer, Berlin, 2004, pp. 359-370. Also available from http://lkm.fri.uni-lj.si/rmarko/papers/
Leo Breiman: Random Forests. Machine Learning Journal, 2001, 45, 5-32