h2o.createFrame function

Data H2OFrame Creation in H2O

Creates a data frame in H2O with real-valued, categorical, integer, and binary columns specified by the user.

h2o.createFrame(
  rows = 10000,
  cols = 10,
  randomize = TRUE,
  value = 0,
  real_range = 100,
  categorical_fraction = 0.2,
  factors = 100,
  integer_fraction = 0.2,
  integer_range = 100,
  binary_fraction = 0.1,
  binary_ones_fraction = 0.02,
  time_fraction = 0,
  string_fraction = 0,
  missing_fraction = 0.01,
  response_factors = 2,
  has_response = FALSE,
  seed,
  seed_for_column_types
)

Arguments

  • rows: The number of rows of data to generate.
  • cols: The number of columns of data to generate. Excludes the response column if has_response = TRUE.
  • randomize: A logical value indicating whether data values should be randomly generated. This must be TRUE if either categorical_fraction or integer_fraction is non-zero.
  • value: If randomize = FALSE, then all real-valued entries will be set to this value.
  • real_range: The range of randomly generated real values.
  • categorical_fraction: The fraction of total columns that are categorical.
  • factors: The number of (unique) factor levels in each categorical column.
  • integer_fraction: The fraction of total columns that are integer-valued.
  • integer_range: The range of randomly generated integer values.
  • binary_fraction: The fraction of total columns that are binary-valued.
  • binary_ones_fraction: The fraction of values in a binary column that are set to 1.
  • time_fraction: The fraction of randomly created date/time columns.
  • string_fraction: The fraction of randomly created string columns.
  • missing_fraction: The fraction of total entries in the data frame that are set to NA.
  • response_factors: If has_response = TRUE, then this is the number of factor levels in the response column.
  • has_response: A logical value indicating whether an additional response column should be prepended to the final H2O data frame. If set to TRUE, the total number of columns will be cols + 1.
  • seed: A seed used to generate random values when randomize = TRUE.
  • seed_for_column_types: A seed used to generate random column types when randomize = TRUE.
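
The fraction arguments above divide the cols columns among the column types. The sketch below is one way to estimate that split in plain R before creating a frame. The helper expected_column_counts is hypothetical and not part of the h2o package. It assumes that each count is round(fraction * cols) and that the columns left over are real-valued; H2O's exact rounding may differ.

```r
# Hypothetical helper (NOT part of h2o): estimate how many columns of each
# type h2o.createFrame() would generate for a given set of fractions.
# Assumptions: each count is round(fraction * cols), and whatever is left
# over becomes real-valued columns. H2O's internal rounding may differ.
expected_column_counts <- function(cols,
                                   categorical_fraction = 0.2,
                                   integer_fraction = 0.2,
                                   binary_fraction = 0.1,
                                   time_fraction = 0,
                                   string_fraction = 0,
                                   has_response = FALSE) {
  fractions <- c(categorical = categorical_fraction,
                 integer     = integer_fraction,
                 binary      = binary_fraction,
                 time        = time_fraction,
                 string      = string_fraction)

  # The fractions must be non-negative and cannot exceed the whole frame
  # (small tolerance for floating-point sums such as 0.1 + 0.2 + 0.7).
  stopifnot(all(fractions >= 0), sum(fractions) <= 1 + 1e-8)

  counts <- round(fractions * cols)
  counts["real"]  <- cols - sum(counts)            # remainder is real-valued
  counts["total"] <- cols + as.integer(has_response)  # response adds one column
  counts
}

# Same settings as the first call in the Examples section
# (binary_fraction is left at its default of 0.1).
counts <- expected_column_counts(cols = 100,
                                 categorical_fraction = 0.1,
                                 integer_fraction = 0.5,
                                 has_response = TRUE)
print(counts)
```

Under these assumptions, the call gives 10 categorical, 50 integer, 10 binary, and 30 real-valued columns, for 101 columns in total once the response column is counted. For small values of cols the rounding can overshoot, so treat the result as an estimate rather than a guarantee.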

Returns

Returns an H2OFrame object.

Examples

## Not run:
library(h2o)
h2o.init()

# Random frame: 100 generated columns plus a response column
hf <- h2o.createFrame(rows = 1000, cols = 100, categorical_fraction = 0.1,
                      factors = 5, integer_fraction = 0.5, integer_range = 1,
                      has_response = TRUE)
head(hf)
summary(hf)

# Non-random frame: every real-valued entry is set to 5
hf <- h2o.createFrame(rows = 100, cols = 10, randomize = FALSE, value = 5,
                      categorical_fraction = 0, integer_fraction = 0)
summary(hf)
## End(Not run)

  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11