Performs repeated calls to a nestedcv model-fitting expression to determine performance across repeated runs of nested CV.
repeatcv(expr, n = 5, repeat_folds = NULL, keep = FALSE, extra = FALSE, progress = TRUE, rep_parallel = "mclapply", rep.cores = 1L)
Arguments
expr: An expression containing a call to nestcv.glmnet(), nestcv.train(), nestcv.SuperLearner() or outercv().
n: Number of repeats.
repeat_folds: Optional list containing fold indices to be applied to the outer CV folds.
keep: Logical whether to save repeated outer CV fitted models for variable importance, SHAP etc. Note this can make the resulting object very large.
extra: Logical whether additional performance metrics are gathered for binary classification models. See metrics().
progress: Logical whether to show progress.
rep_parallel: Either "mclapply" or "future". This determines which parallel backend to use.
rep.cores: Integer specifying number of cores/threads to invoke. Ignored if rep_parallel = "future".
Returns
List of S3 class 'repeatcv' containing:
- call: the model call
- result: matrix of performance metrics
- output: a matrix or dataframe containing the outer CV predictions from each repeat
- roc: (binary classification models only) a ROC curve object based on predictions across all repeats as returned in output, generated by pROC::roc()
- fits: (if keep = TRUE) list of length n containing slimmed 'nestedcv' model objects for calculating variable importance or SHAP values
Details
We recommend using this with the R pipe |> (see examples).
When comparing models, it is recommended to fix the sets of outer CV folds used across each repeat for comparing performance between models. The function repeatfolds() can be used to create a fixed set of outer CV folds for each repeat.
Parallelisation over repeats is performed using parallel::mclapply (not available on Windows) or future, depending on how rep_parallel is set. Beware that cv.cores can still be set within calls to nestedcv models (i.e. nested parallelisation), in which case rep.cores x cv.cores processes/forks will be spawned, so be careful not to overload your CPU. In general, parallelisation of repeats using rep.cores is faster than parallelisation using cv.cores. rep.cores is ignored if you are using future; set the number of workers for future using future::plan().
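As a sketch of the future backend described above (assuming the nestedcv and future packages are installed; the number of workers and model settings are illustrative only):

```r
# Sketch: parallelising repeats with rep_parallel = "future" (assumption:
# nestedcv and future are installed; settings below are illustrative).
library(nestedcv)
library(future)

# Workers are set via future::plan(); rep.cores is ignored with this backend.
plan(multisession, workers = 2)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, rep_parallel = "future")

plan(sequential)  # reset the parallel plan when done
```

multisession works on all platforms, including Windows, where mclapply-based forking is unavailable.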
Examples
data("iris")
dat <- iris
y <- dat$Species
x <- dat[, 1:4]

res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, rep.cores = 2)
res
summary(res)

## set up fixed fold indices
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)

res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds, rep.cores = 2)
res