repeatcv function

Repeated nested CV

Performs repeated calls to a nestedcv model to determine performance across repeated runs of nested CV.

repeatcv(
  expr,
  n = 5,
  repeat_folds = NULL,
  keep = FALSE,
  extra = FALSE,
  progress = TRUE,
  rep_parallel = "mclapply",
  rep.cores = 1L
)

Arguments

  • expr: An expression containing a call to nestcv.glmnet(), nestcv.train(), nestcv.SuperLearner(), or outercv().
  • n: Number of repeats.
  • repeat_folds: Optional list containing fold indices to be applied to the outer CV folds.
  • keep: Logical whether to save repeated outer CV fitted models for variable importance, SHAP etc. Note this can make the resulting object very large.
  • extra: Logical whether additional performance metrics are gathered for binary classification models. See metrics().
  • progress: Logical whether to show progress.
  • rep_parallel: Either "mclapply" or "future". This determines which parallel backend to use.
  • rep.cores: Integer specifying number of cores/threads to invoke. Ignored if rep_parallel = "future".

Returns

List of S3 class 'repeatcv' containing:

  • call: the model call

  • result: matrix of performance metrics

  • output: a matrix or dataframe containing the outer CV predictions from each repeat

  • roc: (binary classification models only) a ROC curve object based on predictions across all repeats as returned in output, generated by pROC::roc()

  • fits: (if keep = TRUE) list of length n containing slimmed 'nestedcv' model objects for calculating variable importance or SHAP values
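As a brief illustration (not part of the original documentation), the components above can be inspected directly; assuming res is a 'repeatcv' object from a binary classification model:

res$call           # the original model call
res$result         # matrix of performance metrics across repeats
head(res$output)   # pooled outer CV predictions from all repeats
plot(res$roc)      # ROC curve across all repeats (binary classification only)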

Details

We recommend using this with the R pipe |> (see examples).

When comparing models, it is recommended to fix the sets of outer CV folds used in each repeat, so that performance is compared on identical data splits. The function repeatfolds() can be used to create a fixed set of outer CV folds for each repeat (see the sketch below).
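For example, a minimal sketch (assuming y and x as defined in the Examples below) comparing two model specifications on identical outer folds in each repeat:

set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)

## lasso (alpha = 1) vs elastic net (alpha tuned over a grid), same folds per repeat
res1 <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                      n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)
res2 <- nestcv.glmnet(y, x, family = "multinomial",
                      alphaSet = seq(0.1, 1, 0.1),
                      n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)
summary(res1)
summary(res2)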

Parallelisation over repeats is performed using parallel::mclapply (not available on Windows) or future, depending on how rep_parallel is set. Beware that cv.cores can still be set within calls to nestedcv models (i.e. nested parallelisation), in which case rep.cores x cv.cores processes/forks will be spawned, so be careful not to overload your CPU. In general, parallelising over repeats with rep.cores is faster than parallelising the inner loops with cv.cores. rep.cores is ignored if you are using future; set the number of workers for future using future::plan() (see the sketch below).
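A minimal sketch of using the future backend (assuming the same y and x as in the Examples); the number of workers is set via future::plan() rather than rep.cores:

library(future)
plan(multisession, workers = 2)   # set number of future workers
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3, rep_parallel = "future")   # rep.cores is ignored here
plan(sequential)                  # return to sequential processing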

Examples

data("iris") dat <- iris y <- dat$Species x <- dat[, 1:4] res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1, n_outer_folds = 4) |> repeatcv(3, rep.cores = 2) res summary(res) ## set up fixed fold indices set.seed(123, "L'Ecuyer-CMRG") folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4) res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1, n_outer_folds = 4) |> repeatcv(3, repeat_folds = folds, rep.cores = 2) res
  • Maintainer: Myles Lewis
  • License: MIT + file LICENSE
  • Last published: 2025-03-10