ml_gKRLS function

Machine Learning with gKRLS

This page provides a number of functions for using gKRLS (and mgcv more generally) as part of machine learning algorithms. Integration into SuperLearner, DoubleML, and mlr3 is described below.

SL.mgcv(Y, X, newX, formula, family, obsWeights, bam = FALSE, ...)

## S3 method for class 'SL.mgcv'
predict(object, newdata, allow_missing_levels = TRUE, ...)

add_bam_to_mlr3()

Arguments

  • Y: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • X: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • newX: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • formula: A formula used for gam or bam from mgcv. This must be specified, see the examples.
  • family: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • obsWeights: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • bam: A logical value for whether mgcv::bam should be used instead of mgcv::gam. Default is FALSE. For large datasets, this can dramatically improve estimation time. Wood et al. (2015) and mgcv provide details on bam.
  • ...: Additional arguments to mgcv::gam and mgcv::bam.
  • object: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • newdata: This is not usually directly specified in SL.mgcv, see the examples below and documentation in SuperLearner for more details.
  • allow_missing_levels: A logical variable that indicates whether missing levels in factors are allowed for prediction. The default is TRUE.

Returns

All three functions are usually not called directly for their return values; they are used inside other functions, e.g., to create wrappers for SuperLearner or to add the bam learners to mlr3.

Details

Ensembles: SuperLearner integration is provided by SL.mgcv and the corresponding predict method. mgcv::bam can be enabled by using bam = TRUE. A formula without an outcome must be explicitly provided.
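For instance, a minimal sketch of a custom wrapper that supplies the required outcome-free formula and enables bam (the wrapper name SL.mgcv_bam and the covariate names x1 and x2 are illustrative, not part of the package):

# Illustrative wrapper; pass "SL.mgcv_bam" in SL.library when calling SuperLearner
SL.mgcv_bam <- function(...) {
  SL.mgcv(formula = ~ s(x1) + x2, bam = TRUE, ...)
}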

Please note that it is often useful to load SuperLearner before gKRLS or mgcv so that functions such as gam and s are not masked by other packages.
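A loading order along the following lines illustrates this (a sketch; adjust to the packages actually attached in your session):

library(SuperLearner)  # load first
library(mgcv)          # loaded afterwards, so gam() and s() resolve to mgcv
library(gKRLS)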

Double Machine Learning: DoubleML integration is provided in two ways. First, one can load mlr3extralearners to access regr.gam and classif.gam.
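A minimal sketch of this first route (assuming mlr3 and mlr3extralearners are installed):

library(mlr3)
library(mlr3extralearners)
# Nuisance learners for DoubleML: a regression gam and a classification gam
ml_g <- lrn("regr.gam")
ml_m <- lrn("classif.gam", predict_type = "prob")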

Second, this package provides mgcv::bam integration directly, with a slight adaptation of the mlr3extralearners implementation (see ?LearnerClassifBam for more details). These learners can either be added to the list of mlr3 learners by calling add_bam_to_mlr3() or used directly; examples of the latter are provided below. For classif.bam and regr.bam, the formula argument is mandatory.
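A sketch of the registration route is shown here; the learner key "regr.bam" and the smooth term below are assumptions for illustration, and the Examples section instead constructs the learners directly via LearnerRegrBam$new():

library(mlr3)
add_bam_to_mlr3()      # register the bam learners with mlr3
bam_reg <- lrn("regr.bam")
# formula is mandatory for regr.bam / classif.bam (outcome omitted)
bam_reg$param_set$values$formula <- ~ s(x1, x2, bs = "gKRLS")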

Examples

set.seed(789)
N <- 100
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 - 0.5 * x2 + rnorm(N, 0, 1)
y <- y * 10
X <- cbind(x1, x2, x1 + x2 * 3)
X <- cbind(X, "x3" = rexp(nrow(X)))

if (requireNamespace("SuperLearner", quietly = TRUE)) {
  # Estimate Ensemble with SuperLearner
  require(SuperLearner)
  sl_m <- function(...) {
    SL.mgcv(formula = ~ x1 + x2 + x3, ...)
  }
  fit_SL <- SuperLearner::SuperLearner(
    Y = y, X = data.frame(X),
    SL.library = "sl_m"
  )
  pred <- predict(fit_SL, newdata = data.frame(X))
}

# Estimate Double/Debiased Machine Learning
if (requireNamespace("DoubleML", quietly = TRUE)) {
  require(DoubleML)
  # Load the models; for testing *ONLY* have multiplier of 2
  double_bam_1 <- LearnerRegrBam$new()
  double_bam_1$param_set$values$formula <- ~ s(x1, x3,
    bs = "gKRLS",
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2)
  )
  double_bam_2 <- LearnerClassifBam$new()
  double_bam_2$param_set$values$formula <- ~ s(x1, x3,
    bs = "gKRLS",
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2)
  )
  # Create data
  dml_data <- DoubleMLData$new(
    data = data.frame(X, y),
    x_cols = c("x1", "x3"),
    y_col = "y",
    d_cols = "x2"
  )
  # Estimate treatment effect (works for other DoubleML methods)
  dml_est <- DoubleMLIRM$new(
    data = dml_data,
    n_folds = 2,
    ml_g = double_bam_1,
    ml_m = double_bam_2
  )$fit()
}
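After fitting, the estimated treatment effect can be inspected with the object's summary method (a usage sketch; see the DoubleML documentation for the full interface):

dml_est$summary()  # prints the estimated effect of x2 with its standard error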

References

Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. "Generalized Additive Models for Large Data Sets." Journal of the Royal Statistical Society: Series C (Applied Statistics) 64(1):139-155.

  • Maintainer: Max Goplerud
  • License: GPL (>= 2)
  • Last published: 2024-11-07