x: A character vector containing the names of the predictors in the model.
model_id: Destination id for this model; auto-generated if not specified.
score_each_iteration: Logical. Whether to score during each iteration of model training. Defaults to FALSE.
score_tree_interval: Score the model after every score_tree_interval trees. Disabled if set to 0. Defaults to 0.
ignore_const_cols: Logical. Ignore constant columns. Defaults to TRUE.
ntrees: Number of trees. Defaults to 50.
max_depth: Maximum tree depth (0 for unlimited). Defaults to 8.
min_rows: Fewest allowed (weighted) observations in a leaf. Defaults to 1.
max_runtime_secs: Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
seed: Seed for random number generation (affects the stochastic parts of the algorithm, which may or may not be enabled by default). Defaults to -1 (time-based random number).
build_tree_one_node: Logical. Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. Defaults to FALSE.
mtries: Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. Defaults to -1.
sample_size: Number of randomly sampled observations used to train each Isolation Forest tree. Only one of sample_size and sample_rate should be defined; if sample_rate is defined, sample_size is ignored. Defaults to 256.
sample_rate: Rate of randomly sampled observations used to train each Isolation Forest tree. Must be in the range 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size is used instead. Defaults to -1.
col_sample_rate_change_per_level: Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Defaults to 1.
col_sample_rate_per_tree: Column sample rate per tree (from 0.0 to 1.0). Defaults to 1.
categorical_encoding: Encoding scheme for categorical features. Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
stopping_rounds: Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
stopping_metric: Metric to use for early stopping (AUTO: logloss for classification, deviance for regression, and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Must be one of: "AUTO", "anomaly_score". Defaults to AUTO.
stopping_tolerance: Relative tolerance for the metric-based stopping criterion (stop if the relative improvement is not at least this much). Defaults to 0.01.
export_checkpoints_dir: Automatically export generated models to this directory.
contamination: Contamination ratio: the proportion of anomalies in the input dataset. If undefined (-1), the predict function will not mark observations as anomalies and only the anomaly score will be returned. Defaults to -1 (undefined).
validation_frame: Id of the validation data frame.
validation_response_column: (experimental) Name of the response column in the validation frame. The response column should be binary, indicating whether each observation is an anomaly or not.
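The documented precedence between sample_size and sample_rate can be illustrated in plain R. This is a minimal sketch: resolve_sample_count() is a hypothetical helper written only for illustration, not part of the h2o package, and rounding nrow * sample_rate to the nearest integer is an assumption.

```r
# Hypothetical helper (not part of h2o): number of rows each tree would use,
# following the documented rule that a defined sample_rate overrides sample_size.
resolve_sample_count <- function(nrow, sample_size = 256, sample_rate = -1) {
  if (sample_rate != -1) {
    # sample_rate is defined, so sample_size is ignored
    stopifnot(sample_rate >= 0, sample_rate <= 1)
    # Assumption: the fractional row count is rounded to the nearest integer
    return(round(nrow * sample_rate))
  }
  # sample_rate is disabled (-1): use sample_size as given
  sample_size
}

resolve_sample_count(1000)                                        # 256
resolve_sample_count(1000, sample_rate = 0.1)                     # 100
resolve_sample_count(1000, sample_size = 500, sample_rate = 0.1)  # 100
```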
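The interplay of stopping_rounds and stopping_tolerance can be sketched as a moving-average check over the metric values recorded at successive scoring events. This is a simplified illustration, not h2o's internal implementation: should_stop_early() is hypothetical, it assumes a lower metric value is better, and it assumes at least 2 * k scoring events are needed before stopping can trigger.

```r
# Hypothetical helper (not part of h2o): a simplified early-stopping check.
# history:   metric values from successive scoring events (assumed lower = better)
# k:         plays the role of stopping_rounds
# tolerance: plays the role of stopping_tolerance
should_stop_early <- function(history, k, tolerance = 0.01) {
  if (k == 0) return(FALSE)                 # 0 disables early stopping
  n <- length(history)
  if (n < 2 * k) return(FALSE)              # assumption: need two windows of k events
  # Trailing simple moving averages of length k; ma[j] ends at event j + k - 1
  ma <- vapply(k:n, function(i) mean(history[(i - k + 1):i]), numeric(1))
  reference <- ma[length(ma) - k]           # moving average from k events ago
  recent_best <- min(tail(ma, k))           # best moving average since then
  # Relative improvement, guarding against division by zero
  improvement <- (reference - recent_best) / max(abs(reference), .Machine$double.eps)
  improvement < tolerance                   # TRUE means "stop training"
}

should_stop_early(c(10, 9, 8, 7, 6, 5), k = 3)  # FALSE: still improving
should_stop_early(rep(5, 6), k = 3)             # TRUE: no improvement
```

With the values used in the example below (stopping_rounds = 3, stopping_tolerance = 0.1), the same check would require a 10% relative improvement over three scoring events to keep training.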
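col_sample_rate_change_per_level describes a relative change applied once per tree level. A minimal sketch, assuming the rate in effect at the root is multiplied by the factor once per level and capped at 1; column_rate_at_level() and its base_rate argument are hypothetical and not part of the h2o API.

```r
# Hypothetical helper (not part of h2o): column sampling rate at a given depth.
# Assumption: the root-level rate is multiplied by `change` once per level
# and capped at 1.
column_rate_at_level <- function(level, base_rate = 1, change = 1) {
  stopifnot(change > 0, change <= 2)        # documented valid range
  pmin(1, base_rate * change^level)
}

column_rate_at_level(0:3, base_rate = 0.8, change = 0.5)  # 0.8 0.4 0.2 0.1
column_rate_at_level(0:2, base_rate = 0.5, change = 2)    # 0.5 1.0 1.0
```

A factor below 1 samples fewer columns deeper in the tree, while a factor above 1 samples more; the default of 1 leaves the rate unchanged at every level.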
Examples
## Not run:
library(h2o)
h2o.init()

# Import the cars dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
cars <- h2o.importFile(f)

# Set the predictors
predictors <- c("displacement", "power", "weight", "acceleration", "year")

# Train the IF model
cars_if <- h2o.isolationForest(x = predictors, training_frame = cars,
                               seed = 1234, stopping_metric = "anomaly_score",
                               stopping_rounds = 3, stopping_tolerance = 0.1)
## End(Not run)
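The contamination argument turns anomaly scores into anomaly labels. The base-R sketch below shows one way this could work, assuming the cut-off is the (1 - contamination) quantile of the scores it is given; how h2o derives its threshold internally is not stated here, and flag_anomalies() is a hypothetical helper, not part of the h2o package.

```r
# Hypothetical helper (not part of h2o): label observations as anomalies.
# Assumption: the cut-off is the (1 - contamination) quantile of the scores.
flag_anomalies <- function(scores, contamination = -1) {
  if (contamination == -1) return(NULL)     # undefined: scores only, no labels
  stopifnot(contamination > 0, contamination < 1)  # sanity check for this sketch
  threshold <- stats::quantile(scores, probs = 1 - contamination, names = FALSE)
  scores > threshold                        # TRUE marks an anomaly
}

scores <- c(0.10, 0.20, 0.15, 0.90, 0.12, 0.85, 0.11, 0.13, 0.14, 0.16)
which(flag_anomalies(scores, contamination = 0.2))  # 4 6
```

With contamination = 0.2, roughly the top 20% of scores (here the two largest) are flagged; with the default of -1 no labels are produced, matching the documented behaviour that only the anomaly score is returned.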