multi_strata function

Create Strata from Multiple Features

Creates a stratification vector from multiple columns of a data.frame. The resulting vector can then be passed to the splitting functions.

Currently, the function offers two strategies to create the strata:

  • "kmeans": k-means cluster analysis on the scaled input. (Ordered factors are integer-encoded first; unordered factors and character columns are one-hot encoded.)
  • "interaction": all combinations of the column values (after binning numeric columns into approximately k bins).
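The base-R sketch below illustrates the two strategies. It is not the splitTools source. The function name sketch_multi_strata is hypothetical, and the choices of quantile breaks for binning, model.matrix() for one-hot encoding, and stats::kmeans() with nstart = 10 are assumptions made for illustration.

```r
# Hypothetical sketch of both strategies; NOT the splitTools implementation.
sketch_multi_strata <- function(df, strategy = c("kmeans", "interaction"), k = 3L) {
  strategy <- match.arg(strategy)

  if (strategy == "kmeans") {
    # Encode every column numerically
    cols <- lapply(df, function(z) {
      if (is.ordered(z)) return(as.matrix(as.integer(z)))  # ordered: integer codes
      if (is.character(z)) z <- factor(z)
      if (is.factor(z)) return(model.matrix(~ z - 1))       # unordered: one-hot
      as.matrix(as.numeric(z))                              # numeric or logical
    })
    X <- scale(do.call(cbind, cols))
    X[is.nan(X)] <- 0  # constant columns give 0/0 after scaling; neutralize them
    # Assumption: 10 random starts to stabilize the clustering
    cl <- stats::kmeans(X, centers = k, nstart = 10)
    return(factor(cl$cluster))
  }

  # "interaction": bin numeric columns (assumed: quantile breaks), then combine
  binned <- lapply(df, function(z) {
    if (!is.numeric(z)) return(factor(z))
    br <- unique(quantile(z, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE))
    if (length(br) < 2) return(factor(z))  # constant column: keep as a single level
    cut(z, breaks = br, include.lowest = TRUE)
  })
  interaction(binned, drop = TRUE)  # drop combinations that never occur
}
```

Because k-means starts from random centers, call set.seed() beforehand if the strata must be reproducible. With "interaction", the number of strata is at most the product of the per-column level counts, so it can grow quickly with many columns.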

Usage

multi_strata(df, strategy = c("kmeans", "interaction"), k = 3L)

Arguments

  • df: A data.frame used to form the stratification vector.
  • strategy: A string, either "kmeans" (the default) or "interaction", specifying how the strata are computed; see the description.
  • k: An integer. For strategy = "kmeans", it is the desired number of strata, while for strategy = "interaction", it is the approximate number of bins per numeric feature before forming all combinations.
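The word "approximately" matters for numeric columns with many ties. This page does not say how the bins are formed. Assuming quantile-based binning, duplicated quantile breaks collapse, so a column can end up with fewer than k bins. The helper bin_approx below is hypothetical.

```r
# Hypothetical helper, assuming quantile-based binning into approximately k bins
bin_approx <- function(z, k) {
  probs <- seq(0, 1, length.out = k + 1)
  # Duplicated quantiles (caused by ties) are dropped, so fewer bins may result
  br <- unique(quantile(z, probs = probs, na.rm = TRUE))
  cut(z, breaks = br, include.lowest = TRUE)
}

z_cont <- c(1.2, 3.4, 0.5, 2.2, 5.1, 4.4, 0.9, 2.8, 3.9)  # no ties
z_disc <- c(0, 0, 0, 0, 0, 0, 1, 1, 2)                     # heavy ties at 0

nlevels(bin_approx(z_cont, k = 3))  # 3
nlevels(bin_approx(z_disc, k = 3))  # 2
```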

Returns

A factor with the strata as its levels.

Examples

library(splitTools)

set.seed(1)  # make the random columns reproducible
y_multi <- data.frame(
  A = rep(letters[1:4], each = 20),
  B = factor(sample(c(0, 1), 80, replace = TRUE)),
  c = rnorm(80)
)
y <- multi_strata(y_multi, k = 3)
folds <- create_folds(y, k = 5)

See Also

partition(), create_folds()

  • Maintainer: Michael Mayer
  • License: GPL (>= 2)
  • Last published: 2023-06-06