nested_cv can be used to take the results of one resampling procedure and conduct further resamples within each split. Any type of resampling used in rsample can be used.
nested_cv(data, outside, inside)
Arguments
data: A data frame.
outside: The initial resampling specification. This can be an already created object or an expression of a new object (see the examples below). If an already created object is used, the data argument does not need to be specified and, if it is given, will be ignored; if an expression such as vfold_cv(v = 3) is used, the data argument supplies the data for it.
inside: An expression for the type of resampling to be conducted within the initial procedure.
Returns
A tibble with the nested_cv class and any other classes that the outer resampling process normally contains. The results include a column for the outer data split objects, one or more id columns, and a column of nested tibbles called inner_resamples with the additional resamples.
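The structure described above can be inspected directly. The following is a minimal sketch, assuming the rsample package is installed; the object names `res` and `inner` are illustrative only, and `analysis()` is the rsample accessor for the analysis portion of a split.

```r
library(rsample)

set.seed(1)
# Outer 3-fold CV, with 5 bootstrap resamples inside each outer split
res <- nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))

# One row per outer split; columns hold the outer splits, ids and inner resamples
nrow(res)
names(res)

# The inner resamples that belong to the first outer split
inner <- res$inner_resamples[[1]]
nrow(inner)

# Inner resamples are drawn from the analysis set of the outer split,
# so a bootstrap inner analysis set has as many rows as the outer analysis set
outer_analysis <- analysis(res$splits[[1]])
inner_analysis <- analysis(inner$splits[[1]])
nrow(outer_analysis)
nrow(inner_analysis)
```

Each element of inner_resamples is itself an ordinary resampling object, so it can be handled with the same accessors as the outer splits.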
Details
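It is a bad idea to use bootstrapping as the outer resampling procedure: a bootstrap sample contains replicated rows, so the inner resampling can place copies of the same data point in both the analysis and assessment sets (see the example below).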
Examples
library(rsample)

## Using expressions for the resampling procedures:
nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))

## Using an existing object:
folds <- vfold_cv(mtcars)
nested_cv(mtcars, folds, inside = bootstraps(times = 5))

## The dangers of outer bootstraps:
set.seed(2222)
bad_idea <- nested_cv(mtcars,
                      outside = bootstraps(times = 5),
                      inside = vfold_cv(v = 3))

## How many times does `Volvo 142E` appear in the first outer analysis set?
first_outer_split <- bad_idea$splits[[1]]
outer_analysis <- as.data.frame(first_outer_split)
sum(grepl("Volvo 142E", rownames(outer_analysis)))

## For the 3-fold CV used inside of each bootstrap, how are the replicated
## `Volvo 142E` data partitioned?
first_inner_split <- bad_idea$inner_resamples[[1]]$splits[[1]]
inner_analysis <- as.data.frame(first_inner_split)
inner_assess <- as.data.frame(first_inner_split, data = "assessment")

## Copies found in both sets mean the same data point is used for fitting
## and for assessment.
sum(grepl("Volvo 142E", rownames(inner_analysis)))
sum(grepl("Volvo 142E", rownames(inner_assess)))