h2o.infogram_train_subset_models function

Train models over subsets selected using infogram

h2o.infogram_train_subset_models(
  ig,
  model_fun,
  training_frame,
  test_frame,
  y,
  protected_columns,
  reference,
  favorable_class,
  feature_selection_metrics = c("safety_index"),
  metric = "euclidean",
  air_metric = "selectedRatio",
  alpha = 0.05,
  ...
)

Arguments

  • ig: Infogram object trained with the same protected columns
  • model_fun: Function that creates models. This can be something like h2o.automl, h2o.gbm, etc.
  • training_frame: Training frame
  • test_frame: Test frame
  • y: Response column
  • protected_columns: Protected columns
  • reference: List of values giving the reference level for each protected column. If set to NULL, the largest group is used as the reference.
  • favorable_class: Positive/favorable outcome class of the response.
  • feature_selection_metrics: One or more columns from the infogram@admissible_score.
  • metric: Metric supported by stats::dist which is used to sort the features.
  • air_metric: Metric used for Adverse Impact Ratio calculation. Defaults to selectedRatio.
  • alpha: The alpha level is the probability of rejecting the null hypothesis that the protected group and the reference came from the same population when the null hypothesis is true.
  • ...: Parameters that are passed to the model_fun.
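To make the roles of feature_selection_metrics and metric more concrete, the following base-R sketch ranks features by their distance from the origin in the space of the chosen score columns. It is an illustration under assumptions, not the function's confirmed implementation: the score table, its values, and the choice to measure distance from the origin are all hypothetical. Only stats::dist and its method argument come from the argument description above.

```r
# Hypothetical admissible-score table; the values are made up for illustration.
score <- data.frame(
  column = c("AGE", "LIMIT_BAL", "PAY_0"),
  safety_index = c(0.2, 0.9, 0.5),
  relevance_index = c(0.4, 0.1, 1.0)
)
feature_selection_metrics <- c("safety_index", "relevance_index")
metric <- "euclidean"  # any method accepted by stats::dist

# Prepend the origin as row 1, then take each feature's distance to it.
points <- rbind(0, as.matrix(score[, feature_selection_metrics, drop = FALSE]))
distance_from_origin <- as.matrix(stats::dist(points, method = metric))[-1, 1]

# Assumption: features farthest from the origin are ranked first.
sorted_columns <- score$column[order(distance_from_origin, decreasing = TRUE)]
print(sorted_columns)
```

With a single selection metric such as the default "safety_index", this ordering reduces to sorting by that column's absolute value.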

Returns

Frame containing aggregations of intersectional fairness across the models.
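As background for the fairness quantities involved, here is a minimal base-R sketch of an adverse impact ratio based on the selected ratio, together with a significance check against alpha. The data, the group labels, and the use of Fisher's exact test are assumptions chosen for illustration. The source does not state which statistical test the function applies.

```r
# Hypothetical predictions: 1 = favorable outcome predicted, 0 = otherwise.
predicted_favorable <- c(1, 1, 0, 1, 0,   1, 0, 0, 0, 1)
group <- rep(c("reference", "protected"), each = 5)

# Selected ratio: share of each group that receives the favorable outcome.
selected_ratio <- tapply(predicted_favorable, group, mean)

# Adverse impact ratio: protected group's selected ratio over the reference's.
air <- unname(selected_ratio["protected"] / selected_ratio["reference"])

# Assumption: Fisher's exact test stands in for the test of whether both
# groups come from the same population; alpha is the rejection threshold.
alpha <- 0.05
p_value <- stats::fisher.test(table(group, predicted_favorable))$p.value
significant <- p_value < alpha

print(air)
print(significant)
```

An adverse impact ratio of 1 means both groups receive the favorable outcome at the same rate. Values further from 1 indicate a larger disparity between the protected group and the reference.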

Examples

## Not run:
library(h2o)
h2o.connect()
data <- h2o.importFile(paste0("https://s3.amazonaws.com/h2o-public-test-data/smalldata/",
                              "admissibleml_test/taiwan_credit_card_uci.csv"))
x <- c('LIMIT_BAL', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6',
       'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6',
       'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6')
y <- "default payment next month"
protected_columns <- c('SEX', 'EDUCATION')

for (col in c(y, protected_columns))
  data[[col]] <- as.factor(data[[col]])

splits <- h2o.splitFrame(data, 0.8)
train <- splits[[1]]
test <- splits[[2]]
reference <- c(SEX = "1", EDUCATION = "2")  # university educated man
favorable_class <- "0"  # no default next month

ig <- h2o.infogram(x, y, train, protected_columns = protected_columns)
print(ig@admissible_score)
plot(ig)

infogram_models <- h2o.infogram_train_subset_models(ig, h2o.gbm, train, test, y,
                                                    protected_columns, reference,
                                                    favorable_class)

pf <- h2o.pareto_front(infogram_models, x_metric = "air_min",
                       y_metric = "AUC", optimum = "top right")
plot(pf)
pf@pareto_front
## End(Not run)

  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11
