var_stability function

Variable stability

Assesses the stability of variable importance by comparing importance across the models fitted in each outer CV fold. For glmnet, variable importance is measured as the absolute value of the model coefficients, optionally scaled as a percentage. The frequency with which each variable is selected in the outer folds and in the final model is also returned. This is helpful for sparse models, or when filters are used, to see how often each variable ends up in the model in each fold.

For glmnet, the direction of effect is taken directly from the sign of the model coefficients. For caret models, direction of effect is not readily available, so as a substitute the directionality of each predictor is determined by the function var_direction(), using the sign of a t-test for binary classification or the sign of the regression coefficient for continuous outcomes (not available for multiclass caret models). To better understand the direction of effect of each predictor within the final model, we recommend using SHAP values; see the vignette "Explaining nestedcv models with Shapley values". See pred_train() for an example.
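The summary described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the coefficient matrix is invented toy data, the final model is left out, and the choices of sem as sd / sqrt(number of folds) and direction as the sign of the mean coefficient are assumptions made for illustration.

```r
# Toy coefficients: 4 variables (rows) x 3 outer folds (columns).
# The numbers are invented purely for illustration.
coefs <- matrix(
  c( 2.0,  1.5,  2.5,
    -1.0, -0.5,  0.0,
     0.0,  0.0,  0.5,
     0.4,  0.6,  0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("v", 1:4), paste0("fold", 1:3))
)

# Importance = absolute coefficient, scaled to a percentage of the
# largest coefficient in each fold (analogous to percent = TRUE)
imp <- abs(coefs)
imp_pct <- sweep(imp, 2, apply(imp, 2, max), "/") * 100

# Summarise across folds (sem formula is an assumption)
n_folds <- ncol(imp_pct)
fold_sd <- apply(imp_pct, 1, sd)
stab <- data.frame(
  mean      = rowMeans(imp_pct),
  sd        = fold_sd,
  sem       = fold_sd / sqrt(n_folds),
  frequency = rowSums(coefs != 0),      # folds in which the variable is selected
  direction = sign(rowMeans(coefs))     # assumed: sign of the mean coefficient
)

# Sort by mean importance, largest first (analogous to sort = TRUE)
stab <- stab[order(stab$mean, decreasing = TRUE), ]
print(stab)
```

In this toy example v1 is selected in every fold with maximal importance, whereas v3 enters the model in only one fold, which is the kind of instability the frequency column is meant to expose.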

var_stability(x, ...)

## S3 method for class 'nestcv.glmnet'
var_stability(
  x,
  ranks = FALSE,
  summary = TRUE,
  percent = TRUE,
  level = 1,
  sort = TRUE,
  ...
)

## S3 method for class 'nestcv.train'
var_stability(x, ranks = FALSE, summary = TRUE, sort = TRUE, ...)

## S3 method for class 'repeatcv'
var_stability(x, ...)

Arguments

  • x: a nestcv.glmnet or nestcv.train fitted object or a list of these, or a repeatcv object.
  • ...: Optional arguments for compatibility
  • ranks: Logical whether to rank variables by importance
  • summary: Logical whether to return summary statistics on variable importance. Ignored if ranks is TRUE.
  • percent: Logical for nestcv.glmnet objects only, whether to scale coefficients to percentage of the largest coefficient in each model
  • level: For multinomial nestcv.glmnet models only, either an integer specifying which level of outcome is being examined, or the level can be specified as a character value
  • sort: Logical whether to sort variables by mean importance

Returns

If ranks is FALSE and summary is TRUE, returns a data frame containing the mean, sd and sem of variable importance, and the frequency with which each variable is selected in the outer folds. If summary is FALSE, a matrix of variable importance is returned or, if ranks = TRUE, a matrix of rankings. In both cases the matrix covers the outer folds and the final model, with variables in rows and folds in columns.
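As a rough usage sketch, the different return forms might be obtained as follows. The simulated data, the variable names and the choice of 5 outer folds are assumptions made for illustration; only the var_stability() arguments come from the usage above.

```r
library(nestedcv)

# Simulated binary outcome: only the first two predictors carry signal
set.seed(1)
n <- 120
p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("v", 1:p)))
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n) > 0, "yes", "no"))

# Nested CV with glmnet (small number of outer folds to keep this quick)
fit <- nestcv.glmnet(y, x, family = "binomial", n_outer_folds = 5)

vs  <- var_stability(fit)                    # summary data frame
imp <- var_stability(fit, summary = FALSE)   # importance per fold and final model
rk  <- var_stability(fit, ranks = TRUE)      # rankings per fold and final model
raw <- var_stability(fit, percent = FALSE)   # unscaled absolute coefficients

head(vs)
```

With sparse models such as the lasso, noise predictors would be expected to show low selection frequency in the summary, while v1 and v2 should appear in most folds.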

Details

Note that for caret models, caret::varImp() may require the model package to be fully loaded in order to function. During the fitting process, caret often only loads the package by namespace.
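One way to follow this note is to attach the model package with library() before calling var_stability(). The sketch below assumes a random forest fitted through caret (method = "rf", which relies on the randomForest package); the simulated data and tuning settings are illustrative only.

```r
library(nestedcv)
library(randomForest)  # attach the model package fully, not just its namespace

# Simulated binary outcome for illustration
set.seed(1)
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("v", 1:p)))
y <- factor(ifelse(x[, 1] + rnorm(n) > 0, "yes", "no"))

# Nested CV with a caret random forest; a single mtry value keeps tuning short
fit_rf <- nestcv.train(y, x, method = "rf",
                       n_outer_folds = 3,
                       tuneGrid = data.frame(mtry = 3))

# Variable importance is obtained via caret::varImp() for each fitted model
vs_rf <- var_stability(fit_rf)
head(vs_rf)
```

If var_stability() fails with an error originating in caret::varImp(), checking that the relevant model package is attached is a reasonable first step.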

See Also

cv_coef(), cv_varImp(), pred_train()

  • Maintainer: Myles Lewis
  • License: MIT + file LICENSE
  • Last published: 2025-03-10