h2o.train_segments() R function from [h2o]

H2O Segmented-Data Bulk Model Training

Provides a set of functions to train a group of models on different segments (subpopulations) of the training set.


h2o.train_segments(
  algorithm,
  segment_columns,
  segment_models_id,
  parallelism = 1,
  ...
)

Arguments

algorithm: Name of the algorithm to use in training segment models (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, psvm, xgboost, pca, svd, targetencoder, aggregator, word2vec, coxph, isolationforest, stackedensemble, glrm, gam, anovaglm, modelselection).
segment_columns: A list of columns to segment by. H2O will group the training (and validation) dataset by these columns and train a separate model for each segment (group of rows).
segment_models_id: Identifier for the returned collection of Segment Models. If not specified, it will be generated automatically.
parallelism: Level of parallelism of bulk model building, that is, the maximum number of models each H2O node will build in parallel. Defaults to 1.
...: Use to pass along the training_frame parameter, x, y, and all non-default parameter values to the algorithm. See the specific algorithm (h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning) for available parameters.

Details

Starts segmented-data bulk model training for a given algorithm and parameters.

Examples


## Not run:

library(h2o)
h2o.init()
iris_hf <- as.h2o(iris)
models <- h2o.train_segments(algorithm = "gbm", 
                             segment_columns = "Species",
                             x = c(1:3), y = 4, 
                             training_frame = iris_hf,
                             ntrees = 5, 
                             max_depth = 4)
as.data.frame(models)
## End(Not run)
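
The following variation is a minimal sketch, not part of the package's own examples, showing how the optional segment_models_id and parallelism arguments might be combined with the call above. The identifier "iris_gbm_segments" is an arbitrary name chosen for illustration, and the "model" column of the result data frame is an assumption about the output layout, so the code checks for it before using it. Like the example above, it needs a running H2O cluster.

```r
library(h2o)
h2o.init()

iris_hf <- as.h2o(iris)

# Train one GBM per Species value. With parallelism = 2, each H2O node
# builds at most two segment models at the same time.
# "iris_gbm_segments" is a hypothetical identifier chosen for this sketch.
models <- h2o.train_segments(
  algorithm = "gbm",
  segment_columns = "Species",
  segment_models_id = "iris_gbm_segments",
  parallelism = 2,
  x = 1:3, y = 4,
  training_frame = iris_hf,
  ntrees = 5,
  max_depth = 4
)

# Convert the collection of segment models to a data frame for inspection.
results <- as.data.frame(models)
print(results)

# Assumption: the model identifier is stored in a column named "model".
# Guard the lookup so the sketch does not fail if the layout differs.
if ("model" %in% names(results)) {
  first_model <- h2o.getModel(as.character(results$model[1]))
  print(h2o.performance(first_model))
}
```

Raising parallelism can shorten total training time when there are many small segments, at the cost of more memory and CPU use per node.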


Maintainer: Tomas Fryda
License: Apache License (== 2.0)
Last published: 2024-01-11
