h2o.splitFrame function

Split an H2O Data Set

Split an existing H2O data set according to user-specified ratios. The number of subsets is always one more than the number of ratios given. Note that the split is not exact: to stay efficient on big data, H2O uses a probabilistic splitting method rather than an exact one. For example, when specifying a split of 0.75/0.25, H2O will produce a train/test split whose sizes have an expected value of 0.75/0.25 rather than exactly 0.75/0.25. On small datasets, the sizes of the resulting splits will deviate from the expected value more than on big data, where they will be very close to exact.
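The effect of dataset size on a probabilistic split can be illustrated without an H2O cluster. The following is a minimal base-R sketch, not H2O's internal implementation: it assumes each row independently draws a uniform random number and is assigned to a subset by comparing that number with the cumulative ratios. The function name probabilistic_split and its arguments are hypothetical.

```r
# Illustrative base-R sketch of a probabilistic split.
# This is an assumption about the general technique, not H2O's actual code.
probabilistic_split <- function(n_rows, ratios, seed = 42) {
  stopifnot(all(ratios > 0), sum(ratios) < 1)
  set.seed(seed)
  # One uniform draw per row.
  draws <- runif(n_rows)
  # Breakpoints 0, cumsum(ratios), 1 define length(ratios) + 1 subsets.
  breaks <- c(0, cumsum(ratios), 1)
  subset_id <- findInterval(draws, breaks, rightmost.closed = TRUE)
  # Count rows per subset, keeping empty subsets as zero.
  tabulate(subset_id, nbins = length(ratios) + 1)
}

small <- probabilistic_split(100, 0.75)
large <- probabilistic_split(1e6, 0.75)

# Observed fraction of rows in the first subset; the expected value is 0.75.
small[1] / sum(small)
large[1] / sum(large)
```

In this sketch the fraction for the large input is typically much closer to 0.75 than the fraction for the small one, because the relative spread of a sum of independent draws shrinks as the number of rows grows.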

h2o.splitFrame(data, ratios = 0.75, destination_frames, seed = -1)


  • data: An H2OFrame object, to be split.
  • ratios: A numeric value or array indicating the ratio of total rows contained in each split. The values must sum to less than 1; for example, c(0.8) gives an 80/20 split.
  • destination_frames: An array of frame IDs whose length equals the number of values in the ratios array, plus one.
  • seed: Random seed.


Returns a list of H2OFrames, one per split.


## Not run: 
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)
iris_split <- h2o.splitFrame(iris_hf, ratios = c(0.2, 0.5))
head(iris_split[[1]])
summary(iris_split[[1]])
## End(Not run)
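A variation on the example above, sketched here to show the destination_frames and seed arguments together. The frame IDs "iris_train", "iris_valid", and "iris_test" and the seed value are arbitrary choices for illustration, and the example assumes a local H2O cluster can be started with h2o.init().

```r
library(h2o)
h2o.init()

iris_hf <- as.h2o(iris)

# Two ratios yield three subsets; the remaining ~20% of rows go to the last one.
# Fixing the seed is intended to make the split repeatable across runs.
splits <- h2o.splitFrame(
  iris_hf,
  ratios = c(0.6, 0.2),
  destination_frames = c("iris_train", "iris_valid", "iris_test"),
  seed = 1234
)

train <- splits[[1]]
valid <- splits[[2]]
test  <- splits[[3]]

# Row counts are near, but not necessarily equal to, 90 / 30 / 30.
sapply(splits, nrow)
```

Because the split is probabilistic, code that depends on the subset sizes should read them from the returned frames rather than computing them from the ratios.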
  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11