cv.isb.splsdacox function

Iterative SB.sPLS-DACOX-Dynamic Cross-Validation

This function runs cross-validation for the iterative single-block sparse partial least squares method sPLS-DACOX-Dynamic. It returns the optimal number of components and the optimal sparsity penalty value found by cross-validation. Performance can be evaluated with several metrics, such as the Area Under the Curve (AUC), the Integrated Brier Score (I. Brier), or the C-Index. More than one metric can be used at once by assigning a weight to each.

cv.isb.splsdacox(
  X,
  Y,
  max.ncomp = 8,
  vector = NULL,
  MIN_NVAR = 10,
  MAX_NVAR = NULL,
  n.cut_points = 5,
  MIN_AUC_INCREASE = 0.01,
  EVAL_METHOD = "AUC",
  n_run = 3,
  k_folds = 10,
  x.center = TRUE,
  x.scale = FALSE,
  remove_near_zero_variance = TRUE,
  remove_zero_variance = TRUE,
  toKeep.zv = NULL,
  remove_variance_at_fold_level = FALSE,
  remove_non_significant_models = FALSE,
  remove_non_significant = FALSE,
  alpha = 0.05,
  w_AIC = 0,
  w_C.Index = 0,
  w_AUC = 1,
  w_I.BRIER = 0,
  times = NULL,
  max_time_points = 15,
  MIN_AUC = 0.8,
  MIN_COMP_TO_CHECK = 3,
  pred.attr = "mean",
  pred.method = "cenROC",
  fast_mode = FALSE,
  max.iter = 200,
  MIN_EPV = 5,
  return_models = FALSE,
  returnData = FALSE,
  PARALLEL = FALSE,
  verbose = FALSE,
  seed = 123
)

Arguments

  • X: List of numeric matrices or data.frames. Explanatory variables. Qualitative variables must be transformed into binary variables.
  • Y: Numeric matrix or data.frame. Response variables. Must contain two columns: "time" and "event". For the event column, accepted values are 0/1 or FALSE/TRUE for censored and event observations.
  • max.ncomp: Numeric. Maximum number of PLS components to compute during cross-validation (default: 8).
  • vector: Numeric vector. A vector indicating the number of variables to select for each block and component (default: NULL).
  • MIN_NVAR: Numeric. Minimum number of variables to select in the model (default: 10).
  • MAX_NVAR: Numeric. Maximum number of variables to select in the model (default: NULL).
  • n.cut_points: Numeric. Number of cut points to evaluate the number of variables (default: 5).
  • MIN_AUC_INCREASE: Numeric. Minimum improvement in AUC required between models to continue evaluation (default: 0.01).
  • EVAL_METHOD: Character. Method for evaluating performance. Must be one of "AUC", "C-Index", etc. (default: "AUC").
  • n_run: Numeric. Number of runs for cross-validation (default: 3).
  • k_folds: Numeric. Number of folds for cross-validation (default: 10).
  • x.center: Logical. If TRUE, the X matrix is centered to zero means (default: TRUE).
  • x.scale: Logical. If TRUE, the X matrix is scaled to unit variances (default: FALSE).
  • remove_near_zero_variance: Logical. If TRUE, near-zero variance variables are removed (default: TRUE).
  • remove_zero_variance: Logical. If TRUE, zero-variance variables are removed (default: TRUE).
  • toKeep.zv: Character vector. Names of variables in X to retain despite variance filtering (default: NULL).
  • remove_variance_at_fold_level: Logical. If TRUE, variance filtering is applied at the fold level (default: FALSE).
  • remove_non_significant_models: Logical. If TRUE, models with non-significant components are removed before evaluation (default: FALSE).
  • remove_non_significant: Logical. If TRUE, non-significant components in the final Cox model are removed (default: FALSE).
  • alpha: Numeric. Significance threshold for selecting variables/components (default: 0.05).
  • w_AIC: Numeric. Weight for AIC in the evaluation. All weights must sum to 1 (default: 0).
  • w_C.Index: Numeric. Weight for C-Index in the evaluation. All weights must sum to 1 (default: 0).
  • w_AUC: Numeric. Weight for AUC in the evaluation. All weights must sum to 1 (default: 1).
  • w_I.BRIER: Numeric. Weight for Integrated Brier Score in the evaluation. All weights must sum to 1 (default: 0).
  • times: Numeric vector. Time points for AUC evaluation (default: NULL).
  • max_time_points: Numeric. Maximum number of time points for AUC evaluation (default: 15).
  • MIN_AUC: Numeric. Minimum AUC to achieve during cross-validation (default: 0.8).
  • MIN_COMP_TO_CHECK: Numeric. Number of components to evaluate before stopping if no improvement is observed (default: 3).
  • pred.attr: Character. How the selected metric is summarized. Must be one of "mean" or "median" (default: "mean").
  • pred.method: Character. AUC evaluation method. Must be one of: "risksetROC", "survivalROC", "cenROC", etc. (default: "cenROC").
  • fast_mode: Logical. If TRUE, only one fold is evaluated per run; otherwise, all folds are evaluated simultaneously (default: FALSE).
  • max.iter: Numeric. Maximum number of iterations for convergence (default: 200).
  • MIN_EPV: Numeric. Minimum number of Events Per Variable for the final Cox model (default: 5).
  • return_models: Logical. If TRUE, returns all models computed during cross-validation (default: FALSE).
  • returnData: Logical. If TRUE, returns original and normalized X and Y matrices (default: FALSE).
  • PARALLEL: Logical. If TRUE, runs cross-validation in parallel using multiple cores (default: FALSE).
  • verbose: Logical. If TRUE, extra messages are displayed during execution (default: FALSE).
  • seed: Numeric. Seed for reproducibility (default: 123).
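
As a minimal sketch of the input shapes described in the argument list, the following base-R code builds a toy X (a named list of numeric blocks) and Y (columns "time" and "event"), expands a qualitative variable into binary columns with model.matrix(), and checks that the metric weights sum to 1. The block names, variable names, and simulated values are invented for illustration only and do not come from the package.

```r
set.seed(123)
n   <- 20
ids <- paste0("patient", seq_len(n))  # hypothetical observation names

# Hypothetical block containing one qualitative variable
clinical <- data.frame(
  age   = rnorm(n, mean = 60, sd = 10),
  stage = factor(sample(c("I", "II", "III"), n, replace = TRUE),
                 levels = c("I", "II", "III")),
  row.names = ids
)

# model.matrix() expands the factor into 0/1 indicator columns;
# [, -1] drops the intercept column
clinical_num <- model.matrix(~ age + stage, data = clinical)[, -1]

# Second hypothetical block that is already numeric
omic <- matrix(rnorm(n * 5), nrow = n,
               dimnames = list(ids, paste0("gene", 1:5)))

# X is a named list of blocks sharing the same observations (rows)
X <- list(clinical = clinical_num, omic = omic)

# Y has the two required columns: "time" and "event" (0 = censored, 1 = event)
Y <- data.frame(time  = rexp(n, rate = 0.1),
                event = rbinom(n, size = 1, prob = 0.6),
                row.names = ids)

# Metric weights as in the documented defaults; they must sum to 1
weights <- c(w_AIC = 0, w_C.Index = 0, w_AUC = 1, w_I.BRIER = 0)
stopifnot(isTRUE(all.equal(sum(weights), 1)))
```

Objects shaped like this X and Y are what the X and Y arguments expect; whether such small simulated data yield a useful model is a separate question.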

Returns

An instance of class "Coxmos" and model "cv.SB.sPLS-DACOX-Dynamic", containing:

  • best_model_info: Data frame with the best model's information.
  • df_results_folds: Data frame with fold-level results.
  • df_results_runs: Data frame with run-level results.
  • df_results_comps: Data frame with component-level results.
  • list_cv_spls_models: List of cross-validated models for each block.
  • opt.comp: Optimal number of components.
  • opt.nvar: Optimal number of variables selected.
  • class: Model class.
  • time: Time taken to run the cross-validation.

Details

The cv.isb.splsdacox function performs cross-validation for the iterative single-block sparse partial least squares discriminant analysis Cox method (sPLS-DACOX). Cross-validation evaluates different hyperparameter combinations: the number of components (up to max.ncomp) and the number of variables selected (vector). The function evaluates models across multiple runs and folds to determine the best configuration. The metrics, the preprocessing steps (centering, scaling, variance filtering), and the stopping criteria are all configurable.
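
To make the search space concrete, the sketch below enumerates candidate pairs of (number of components, number of variables). It assumes the candidate variable counts are evenly spaced between MIN_NVAR and MAX_NVAR at n.cut_points values; that spacing rule is an assumption made for illustration and may differ from what the package does internally. MAX_NVAR = 50 is a hypothetical value (the documented default is NULL).

```r
max.ncomp    <- 8
MIN_NVAR     <- 10
MAX_NVAR     <- 50  # hypothetical value; the documented default is NULL
n.cut_points <- 5

# Assumed rule: evenly spaced candidate variable counts
nvar_candidates <- unique(round(seq(MIN_NVAR, MAX_NVAR,
                                    length.out = n.cut_points)))

# Every combination of component count and variable count
grid <- expand.grid(n.comp = seq_len(max.ncomp),
                    n.var  = nvar_candidates)

nrow(grid)  # number of candidate configurations
head(grid)
```

Each row of grid would then be scored across all runs and folds, which is why lowering max.ncomp or n.cut_points shortens the search.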

For each run, the dataset is split into k_folds folds, and each fold serves in turn as the test set while the remaining folds form the training set. Metrics such as AIC, C-Index, Integrated Brier Score, and AUC are computed to assess model performance. The function then identifies the hyperparameters that give the best performance under the selected evaluation metrics.
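
The run and fold structure can be sketched in base R as follows. This only illustrates repeated k-fold partitioning on a hypothetical number of observations; the package may build its folds differently (for example, by stratifying on event status).

```r
set.seed(123)
n       <- 30  # hypothetical number of observations
n_run   <- 3
k_folds <- 10

# For each run, randomly assign every observation to one of k_folds folds
fold_ids <- lapply(seq_len(n_run), function(run) {
  sample(rep(seq_len(k_folds), length.out = n))
})

# Run 1, fold 1: held-out test indices and the remaining training indices
test_idx  <- which(fold_ids[[1]] == 1)
train_idx <- setdiff(seq_len(n), test_idx)

# Total number of train/test splits evaluated per candidate configuration
n_splits <- n_run * k_folds
```

With the documented defaults (n_run = 3, k_folds = 10), each candidate configuration is fitted on 30 training sets, so reducing either value is the most direct way to cut run time.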

Additional options control the AUC evaluation method (pred.method), whether all fitted models are returned (return_models), and parallel processing (PARALLEL). The user can also control the verbosity of output messages (verbose) and set the minimum threshold for Events Per Variable (MIN_EPV).
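
Events Per Variable is conventionally the number of observed events divided by the number of predictors in the Cox model. Under that definition, the sketch below shows how a MIN_EPV threshold bounds the number of predictors. The event vector is hypothetical, and how the package enforces the limit internally is not described in this page.

```r
# Hypothetical event indicator: 1 = event observed, 0 = censored
event <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1,
           0, 1, 1, 0, 1, 1, 0, 1, 1, 1)
MIN_EPV <- 5

n_events <- sum(event)

# Largest number of predictors that keeps EPV at or above MIN_EPV
max_predictors <- floor(n_events / MIN_EPV)

# EPV for a given number of predictors
epv <- function(n_events, n_predictors) n_events / n_predictors

epv(n_events, 2)  # meets the threshold
epv(n_events, 3)  # falls below the threshold
```

With few events, the threshold can therefore limit the final Cox model to fewer predictors than max.ncomp would otherwise allow.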

Examples

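library(Coxmos)

# Load the example multi-omic data sets
data("X_multiomic")
data("Y_multiomic")

# Keep a 25% subsample of the observations
set.seed(123)
index_train <- caret::createDataPartition(Y_multiomic$event, p = .25, list = FALSE, times = 1)

# Subset each block to the sampled rows and the first 20 variables
X_train <- X_multiomic
X_train$mirna <- X_train$mirna[index_train, 1:20]
X_train$proteomic <- X_train$proteomic[index_train, 1:20]
Y_train <- Y_multiomic[index_train, ]

# Number of variables to select per block
vector <- list()
vector$mirna <- c(10)
vector$proteomic <- c(10)

# Small cross-validation: 1 component, 1 run, 3 folds
cv.isb.splsdacox_model <- cv.isb.splsdacox(X_train, Y_train, max.ncomp = 1, vector = vector,
  n_run = 1, k_folds = 3, x.center = TRUE, x.scale = TRUE)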

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

  • Maintainer: Pedro Salguero García
  • License: CC BY 4.0
  • Last published: 2025-03-05