h2o.train_segments function

H2O Segmented-Data Bulk Model Training

Provides a set of functions to train a group of models on different segments (subpopulations) of the training set.

h2o.train_segments( algorithm, segment_columns, segment_models_id, parallelism = 1, ... )

Arguments

  • algorithm: Name of the algorithm to use in training segment models (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, psvm, xgboost, pca, svd, targetencoder, aggregator, word2vec, coxph, isolationforest, stackedensemble, glrm, gam, anovaglm, modelselection).
  • segment_columns: A list of columns to segment-by. H2O will group the training (and validation) dataset by the segment-by columns and train a separate model for each segment (group of rows).
  • segment_models_id: Identifier for the returned collection of Segment Models. If not specified, it will be automatically generated.
  • parallelism: Level of parallelism of bulk model building, i.e. the maximum number of models each H2O node will build in parallel. Defaults to 1.
  • ...: Use to pass along the training_frame parameter, x, y, and all non-default parameter values to the algorithm. See the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters.
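
The optional arguments above can be combined in a single call. The following is a minimal sketch rather than a reference invocation: it assumes a local H2O cluster can be started with h2o.init(), and the identifier "iris_by_species" and the value parallelism = 2 are illustrative choices, not recommendations from the package.

```r
library(h2o)
h2o.init()

# Upload the built-in iris data set to the H2O cluster.
iris_hf <- as.h2o(iris)

# Train one GBM per Species value.
# "iris_by_species" is a hypothetical identifier chosen for this sketch;
# parallelism = 2 lets each H2O node build up to two segment models at once.
models <- h2o.train_segments(algorithm = "gbm",
                             segment_columns = "Species",
                             segment_models_id = "iris_by_species",
                             parallelism = 2,
                             x = c(1:3), y = 4,
                             training_frame = iris_hf,
                             ntrees = 5, max_depth = 4)

# One row per segment describing the model built for it.
segment_results <- as.data.frame(models)
segment_results
```

Everything after parallelism in the call (x, y, training_frame, ntrees, max_depth) is forwarded to the algorithm through the ... argument.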

Details

Starts Segmented-Data Bulk Model Training for a given algorithm and set of parameters.
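
As a rough mental model only, training on segments resembles splitting the data by the segment column and fitting one model per group. The sketch below uses base R with lm as a stand-in for the H2O algorithm; it runs without a cluster and is not how H2O implements bulk training. The names segments and segment_fits are hypothetical.

```r
# Conceptual stand-in only: base R, no H2O cluster involved.
# Group the rows by the segment column (Species).
segments <- split(iris, iris$Species)

# Fit a separate model on each group of rows.
segment_fits <- lapply(segments, function(rows) {
  lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = rows)
})

# One fitted model per segment, named after the segment value.
names(segment_fits)
```

With h2o.train_segments the grouping and the per-segment training are both done by H2O, so the data never has to be split manually in R.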

Examples

## Not run: 
library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)
models <- h2o.train_segments(algorithm = "gbm",
                             segment_columns = "Species",
                             x = c(1:3), y = 4,
                             training_frame = iris_hf,
                             ntrees = 5, max_depth = 4)
as.data.frame(models)
## End(Not run)

  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11