RFcluster function

Random forests estimate of predictive accuracy for clustered data

This function adapts random forests to work (albeit clumsily and inefficiently) with clustered categorical outcome data. For example, there may be multiple observations on individuals (clusters). Predictions are made for the OOB (out-of-bag) clusters.

RFcluster(formula, id, data, nfold = 15, ntree=500, progress=TRUE, printit = TRUE, seed = 29)

Arguments

  • formula: Model formula
  • id: numeric, identifies clusters
  • data: data frame that supplies the data
  • nfold: numeric, number of folds
  • ntree: numeric, number of trees (number of bootstrap samples)
  • progress: Print information on progress of calculations
  • printit: Print summary information on accuracy
  • seed: Set seed, if required, so that results are exactly reproducible

Details

Bootstrap samples are taken of observations in the in-bag clusters. Predictions are made for all observations in the OOB clusters.
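
The following is a minimal base R sketch of the idea described above, not the package's actual implementation. The toy data frame `toy`, the helper names (`fold_of_cluster`, `boot_rows`, `oob_rows`) and the choice of three folds are assumptions made for illustration. It shows whole clusters being assigned to folds, so that a bootstrap sample drawn from the in-bag observations never shares a cluster with the held-out observations.

```r
# Toy data (hypothetical): 6 clusters, e.g. individuals, with 4 observations each.
set.seed(29)
toy <- data.frame(id = rep(1:6, each = 4), x = rnorm(24))

# Assign whole clusters, not individual rows, to folds.
nfold <- 3
clusters <- unique(toy$id)
fold_of_cluster <- sample(rep(seq_len(nfold), length.out = length(clusters)))
names(fold_of_cluster) <- clusters
toy$fold <- unname(fold_of_cluster[as.character(toy$id)])

for (k in seq_len(nfold)) {
  oob_rows   <- which(toy$fold == k)   # every observation in the held-out clusters
  inbag_rows <- which(toy$fold != k)   # observations in the remaining clusters
  # Bootstrap sample of observations, drawn only from the in-bag clusters.
  boot_rows  <- sample(inbag_rows, replace = TRUE)
  # A model would be fitted to toy[boot_rows, ] and used to predict toy[oob_rows, ].
  # No cluster may appear on both sides of the split.
  stopifnot(length(intersect(toy$id[boot_rows], toy$id[oob_rows])) == 0)
}
```

Splitting by cluster rather than by row matters because observations from the same cluster tend to resemble each other; leaving some of them in the training sample would make the accuracy estimate too optimistic.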

Returns

  • class: Predicted values from cross-validation

  • OOBaccuracy: Cross-validation estimate of accuracy

  • confusion: Confusion matrix
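
As a rough illustration of how an accuracy figure and a confusion matrix relate, the base R sketch below builds both from made-up observed and predicted classes. The vectors `observed` and `predicted` are hypothetical, and whether the function reports the confusion matrix as counts or as proportions is not stated here, so counts are an assumption.

```r
# Hypothetical observed and predicted classes for six observations.
observed  <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))

# Cross-tabulate observed classes (rows) against predicted classes (columns).
confusion <- table(observed = observed, predicted = predicted)

# Overall accuracy is the share of observations on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
```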

References

https://maths-people.anu.edu.au/~johnm/nzsr/taws.html

Author(s)

John Maindonald

Examples

## Not run:
library(mlbench)
library(randomForest)
data(Vowel)
RFcluster(formula = Class ~ ., id = V1, data = Vowel, nfold = 15,
          ntree = 500, progress = TRUE, printit = TRUE, seed = 29)
## End(Not run)

  • Maintainer: John Maindonald
  • License: GPL (>= 2)
  • Last published: 2023-08-21