create_folds function

Create Folds

This function provides a list of row indices used for k-fold cross-validation (basic, stratified, grouped, or blocked). Repeated fold creation is supported as well. By default, in-sample indices are returned.
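The base R sketch below builds basic (unstratified) folds and can repeat the split m_rep times. It shows what a list of in-sample indices looks like. The helper basic_folds() is hypothetical and is not how splitTools implements create_folds(). The flat output list of length k * m_rep is also an assumption made for the sketch.

```r
# Hypothetical helper (not part of splitTools): basic random k-fold split,
# optionally repeated m_rep times. Each list element holds the in-sample rows
# of one fold, i.e. all rows except those held out in that fold.
basic_folds <- function(n, k = 5L, m_rep = 1L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  one_split <- function() {
    # Roughly equal numbers of each fold id 1..k, in random row order
    fold_id <- rep_len(seq_len(k), n)[sample.int(n)]
    lapply(seq_len(k), function(j) which(fold_id != j))
  }
  # Assumption: repetitions are concatenated into one flat list of k * m_rep folds
  unlist(replicate(m_rep, one_split(), simplify = FALSE), recursive = FALSE)
}

folds <- basic_folds(20, k = 5, m_rep = 2, seed = 1)
length(folds)   # 10
lengths(folds)  # every in-sample set contains 16 of the 20 rows
```

Within one repetition, the held-out rows of the k folds partition the data, so every row is used for evaluation exactly once per repetition.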

create_folds(
  y,
  k = 5L,
  type = c("stratified", "basic", "grouped", "blocked"),
  n_bins = 10L,
  m_rep = 1L,
  use_names = TRUE,
  invert = FALSE,
  shuffle = FALSE,
  seed = NULL
)

Arguments

  • y: Either the variable used for "stratified" or "grouped" splits. For other split types, any vector with the same length as the data to be split.

  • k: Number of folds.

  • type: Split type. One of "stratified" (default), "basic", "grouped", "blocked".

  • n_bins: Approximate number of bins for numeric y (only used for type = "stratified").

  • m_rep: How many times should the data be split into k folds? Default is 1, i.e., no repetitions.

  • use_names: Should folds be named? Default is TRUE.

  • invert: Set to TRUE in order to receive out-of-sample indices. Default is FALSE, i.e., in-sample indices are returned.

  • shuffle: Should row indices be randomly shuffled within folds? Default is FALSE.

  • seed: Integer random seed.
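The sketch below shows one way a numeric y could be cut into about n_bins quantile groups before stratifying. The helper quantile_bins() is hypothetical. Dropping duplicated quantiles, which can be caused by ties, is an assumption. It also illustrates why the number of bins can only be approximate.

```r
# Hypothetical helper (not the splitTools implementation): bin numeric y into
# roughly n_bins groups using sample quantiles as cut points.
quantile_bins <- function(y, n_bins = 10L) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  # Assumption: duplicated quantiles (caused by ties) are dropped, because
  # cut() requires unique breaks; hence the bin count is only approximate.
  breaks <- unique(quantile(y, probs = probs, names = FALSE))
  if (length(breaks) < 2) return(rep(1L, length(y)))  # constant y: one bin
  cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

y <- c(1, 1, 1, 1, 2, 3, 4, 5, 6, 7)
bins <- quantile_bins(y, n_bins = 4)
table(bins)
```

In this y, the tied values make the first two quantiles coincide. As a result, the four requested bins collapse to three bins with 5, 2 and 3 rows.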

Returns

If invert = FALSE (the default), a list with in-sample row indices. If invert = TRUE, a list with out-of-sample indices.
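In-sample indices are typically used as training rows. The complementary rows, which invert = TRUE would return, are used for evaluation. The base R sketch below runs such a cross-validation loop on the built-in mtcars data. The fold list is built by hand with random basic assignment rather than by create_folds(), so the example does not depend on the package. The model formula is arbitrary.

```r
set.seed(42)
n <- nrow(mtcars)   # 32 rows
k <- 4

# Hand-built fold list in the shape described above: one integer vector of
# in-sample (training) row indices per fold, from random basic assignment.
fold_id <- rep_len(seq_len(k), n)[sample.int(n)]
in_sample <- lapply(seq_len(k), function(j) which(fold_id != j))

rmse <- vapply(in_sample, function(train) {
  test <- setdiff(seq_len(n), train)  # out-of-sample rows of this fold
  fit <- lm(mpg ~ wt + hp, data = mtcars[train, ])
  pred <- predict(fit, newdata = mtcars[test, ])
  sqrt(mean((mtcars$mpg[test] - pred)^2))
}, numeric(1))

rmse        # one out-of-sample RMSE per fold
mean(rmse)  # cross-validated estimate
```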

Details

By default, the function uses stratified splitting, which balances the folds with respect to the distribution of the input vector y. (Numeric input is first binned into n_bins quantile groups.) If type = "grouped", groups specified by y are kept together when splitting, which is relevant for clustered or panel data. In contrast to basic splitting, type = "blocked" does not sample indices at random but keeps them in sequential groups.
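The base R sketch below makes the three assignment schemes more concrete. Each helper returns a fold id per row. The helpers stratified_ids(), grouped_ids() and blocked_ids() are hypothetical simplifications, not the package's internals. For example, this stratified version always starts each stratum at fold 1, so leftover rows pile up in the lower-numbered folds.

```r
# Hypothetical sketches (not splitTools internals); each returns a fold id
# in 1..k for every row.

# Stratified: within each level of y, spread that level's rows evenly over
# the k folds (numeric y would first be binned into quantile groups).
stratified_ids <- function(y, k) {
  ids <- integer(length(y))
  for (lev in unique(y)) {
    rows <- which(y == lev)
    ids[rows] <- rep_len(seq_len(k), length(rows))[sample.int(length(rows))]
  }
  ids
}

# Grouped: whole groups (levels of y) are assigned to folds, so a group
# never appears in both the in-sample and out-of-sample part of a fold.
grouped_ids <- function(y, k) {
  groups <- unique(y)
  group_fold <- rep_len(seq_len(k), length(groups))[sample.int(length(groups))]
  group_fold[match(y, groups)]
}

# Blocked: no randomness; consecutive rows form k contiguous blocks.
blocked_ids <- function(n, k) {
  sort(rep_len(seq_len(k), n))
}

set.seed(1)
y <- rep(letters[1:4], each = 5)
table(y, stratified_ids(y, k = 5))  # every letter appears once in each fold
grouped_ids(y, k = 2)               # all rows of a letter share one fold id
blocked_ids(length(y), k = 3)       # seven 1s, then seven 2s, then six 3s
```

Given such ids, the in-sample indices of fold j would be which(ids != j), and the out-of-sample indices would be which(ids == j).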

Examples

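y <- rep(letters[1:4], each = 5)           # four levels, five rows each
create_folds(y)                            # 5 stratified folds (default)
create_folds(y, k = 2)                     # 2 stratified folds
create_folds(y, k = 2, m_rep = 2)          # 2 folds, split repeated twice
create_folds(y, k = 3, type = "blocked")   # 3 sequential blocks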

See Also

partition(), create_timefolds()

  • Maintainer: Michael Mayer
  • License: GPL (>= 2)
  • Last published: 2023-06-06