CVcluster function

Cross-validation estimate of predictive accuracy for clustered data

This function adapts cross-validation to work with clustered categorical outcome data. For example, there may be multiple observations on each individual, so that each individual forms a cluster. It requires a fitting function that accepts a model formula.
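The idea of keeping clusters intact during cross-validation can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used inside CVcluster: the toy `id` vector, the value of `nfold`, and the helper names `fold_of_cluster` and `fold` are all hypothetical.

```r
# Hypothetical toy data: 6 clusters (e.g. individuals), 4 observations each
set.seed(29)
id <- rep(1:6, each = 4)
nfold <- 3

# Assign each cluster (not each row) to a fold, with balanced fold sizes
clusters <- unique(id)
fold_of_cluster <- sample(rep(seq_len(nfold), length.out = length(clusters)))
names(fold_of_cluster) <- clusters

# Map the cluster-level fold back to the individual rows
fold <- unname(fold_of_cluster[as.character(id)])

# Cross-tabulate: each cluster should appear in exactly one fold
table(id, fold)
```

Because all observations from a cluster land in the same fold, no cluster contributes to both the training and the test data within a fold.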

CVcluster(formula, id, data, na.action = na.omit, nfold = 15,
          FUN = MASS::lda,
          predictFUN = function(x, newdata, ...) predict(x, newdata, ...)$class,
          printit = TRUE, cvparts = NULL, seed = 29)

Arguments

  • formula: Model formula
  • id: numeric, identifies clusters
  • data: data frame that supplies the data
  • na.action: na.omit (the default shown in the usage above) or na.fail
  • nfold: Number of cross-validation folds
  • FUN: function that fits the model
  • predictFUN: function that gives predicted values
  • printit: Should summary information be printed?
  • cvparts: Use, if required, to specify the precise folds used for the cross-validation. The comparison between different models will be more accurate if the same folds are used.
  • seed: Set seed, if required, so that results are exactly reproducible
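To make the roles of FUN and predictFUN concrete, the sketch below applies the default pair from the usage line outside of CVcluster. It assumes the MASS package is installed and uses the built-in `iris` data purely as a stand-in; neither the data set nor the chosen row indices come from this page.

```r
library(MASS)

# FUN: a fitting function that accepts a model formula
fit <- lda(Species ~ ., data = iris)

# predictFUN: the default from the usage line; predict() on an lda fit
# returns a list, and the $class component holds the predicted classes
predictFUN <- function(x, newdata, ...) predict(x, newdata, ...)$class

# Predicted classes for three illustrative rows
pred <- predictFUN(fit, newdata = iris[c(1, 51, 101), ])
pred
```

A replacement for FUN would presumably need a matching predictFUN that returns predicted classes in the same way; that pairing is an assumption drawn from the defaults, not something this page states.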

Returns

  • class: Predicted values from cross-validation

  • CVaccuracy: Cross-validation estimate of accuracy

  • confusion: Confusion matrix
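As a rough illustration of how an accuracy estimate and a confusion matrix relate to a vector of predicted classes, the following base R sketch uses made-up `observed` and `predicted` vectors. The exact layout and scaling of the `confusion` component returned by CVcluster may differ from this.

```r
# Hypothetical observed and cross-validated predicted classes
observed  <- factor(c("a", "a", "b", "b", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "c", "c"),
                    levels = levels(observed))

# Confusion matrix: rows are observed classes, columns are predictions
confusion <- table(observed = observed, predicted = predicted)

# Accuracy: proportion of observations on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)

confusion
accuracy
```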

References

https://maths-people.anu.edu.au/~johnm/nzsr/taws.html

Author(s)

John Maindonald

Examples

# Run only if both suggested packages are available
if (requireNamespace('mlbench') && requireNamespace('MASS')) {
  data('Vowel', package = 'mlbench')
  # V1 identifies the clusters; folds keep each cluster together
  acc <- CVcluster(formula = Class ~ ., id = V1, data = Vowel,
                   nfold = 15, FUN = MASS::lda,
                   predictFUN = function(x, newdata, ...) predict(x, newdata, ...)$class,
                   printit = TRUE, cvparts = NULL, seed = 29)
}

  • Maintainer: John Maindonald
  • License: GPL (>= 2)
  • Last published: 2023-08-21
