pred_first_fit function

First-Fit Predictive Model with Group Lasso

This function implements the first-fit procedure described in Bradley and Cannings (2021). At minimum, it requires a fitted generative model, a training mutation matrix and a dataframe of gene lengths as input.
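For orientation, the penalised problem that gglasso solves with its least-squares loss has the generic group lasso form below, where each group g collects the coefficients belonging to one gene. This page does not say how pred_first_fit builds the response y, the design X and the group weights w_g from the generative model (including the weights p and the bias penalty factor K listed under Returns), so read this as a sketch of the technique rather than the exact first-fit objective of Bradley and Cannings (2021). Treating free genes as groups with w_g = 0 is an assumption about how free_genes is implemented.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta} \; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{g=1}^{G} w_g \,\lVert \beta_g \rVert_2,
\qquad
\text{gene } g \text{ selected at } \lambda \iff \hat{\beta}_g(\lambda) \neq 0 .
```

Because the penalty acts on each gene's coefficient block as a whole, a gene is either dropped entirely (its block is set exactly to zero) or kept, so every value of lambda yields a gene panel; larger values of lambda generally give smaller panels.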

pred_first_fit(
  gen_model,
  lambda = exp(seq(-16, -24, length.out = 100)),
  biomarker = "TMB",
  marker_mut_types = c("NS", "I"),
  training_matrix,
  gene_lengths,
  marker_training_values = NULL,
  K_method = max,
  free_genes = c()
)

Arguments

  • gen_model: (list) A generative mutation model, fitted by fit_gen_model().
  • lambda: (numeric) A vector of penalisation weights for input to the group lasso optimiser gglasso.
  • biomarker: (character) The biomarker in question. If "TMB" or "TIB", marker_mut_types is set automatically.
  • marker_mut_types: (character) The set of mutation type groupings constituting the biomarker being estimated. Should be a vector consisting of elements of the mut_types_list vector in the 'names' attribute of gen_model.
  • training_matrix: (sparse matrix) A sparse matrix of mutations in the training dataset, produced by get_mutation_tables().
  • gene_lengths: (dataframe) A table with two columns: Hugo_Symbol and max_cds, providing the lengths of the genes to be modelled.
  • marker_training_values: (dataframe) A dataframe containing two columns: 'Tumor_Sample_Barcode', containing the sample IDs for the training dataset, and a second column containing training values for the biomarker in question.
  • K_method: (function) How to select a representative biomarker value from the training dataset. Defaults to max().
  • free_genes: (character) Which genes should escape penalisation (for example when augmenting a pre-existing panel).
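The sketch below shows the expected shape of the tabular inputs and the default lambda grid. The gene names, lengths, sample IDs and biomarker values are hypothetical placeholders, and the name of the second column of marker_training_values ("TMB" here) is an assumption; the documentation only requires a Tumor_Sample_Barcode column plus one column of values. Applying K_method to that value column is likewise an assumed reading of "select a representative biomarker value".

```r
# Hypothetical inputs illustrating the shapes pred_first_fit() expects.

# gene_lengths: one row per gene, with columns Hugo_Symbol and max_cds.
gene_lengths <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C"),  # placeholder gene names
  max_cds     = c(1200, 3600, 570)                # placeholder lengths
)

# marker_training_values: sample IDs plus one column of biomarker values.
# The second column's name ("TMB") is an assumption.
marker_training_values <- data.frame(
  Tumor_Sample_Barcode = c("SAMPLE_1", "SAMPLE_2", "SAMPLE_3"),
  TMB                  = c(4.2, 11.5, 27.0)
)

# Default lambda grid: 100 values running from exp(-16) down to exp(-24).
lambda <- exp(seq(-16, -24, length.out = 100))

# With the default K_method = max, the representative training value is
# (assumed to be) the largest biomarker value in the training data.
K_rep <- max(marker_training_values$TMB)

# With a fitted model and training matrix available, a call might look like:
# fit <- pred_first_fit(gen_model,
#                       lambda = lambda,
#                       training_matrix = training_matrix,
#                       gene_lengths = gene_lengths,
#                       marker_training_values = marker_training_values)
```

Note that the default grid is decreasing, so the first iterations use the heaviest penalties.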

Returns

A list of six elements:

  • fit: Output of call to gglasso.
  • panel_genes: A logical matrix with one row per gene and one column per group lasso iteration (that is, per value of lambda); each element indicates whether that gene was selected in that iteration.
  • panel_lengths: A vector giving total panel length for each gglasso iteration.
  • p: The vector of weights used in the optimisation procedure.
  • K: The bias penalty factor used in the optimisation procedure.
  • names: Gene and mutation type information as used when fitting the generative model.
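A common next step is to pick one iteration, for example the largest panel that fits a sequencing budget. The sketch below uses a small, made-up stand-in for the panel_genes and panel_lengths elements rather than real output. It assumes that the rows of panel_genes are named by gene, that columns follow the order of lambda, and that each entry of panel_lengths is the summed length of the selected genes; none of these details is confirmed by this page. The budget value is hypothetical.

```r
# Hypothetical stand-in for a pred_first_fit() result: 3 genes x 4 iterations.
gene_len <- c(GENE_A = 1200, GENE_B = 3600, GENE_C = 570)  # placeholder lengths

# matrix() fills column by column, so each group of three is one iteration.
panel_genes <- matrix(
  c(FALSE, FALSE, FALSE,   # iteration 1: heaviest penalty, nothing selected
    TRUE,  FALSE, FALSE,   # iteration 2
    TRUE,  FALSE, TRUE,    # iteration 3
    TRUE,  TRUE,  TRUE),   # iteration 4: lightest penalty, all selected
  nrow = 3,
  dimnames = list(names(gene_len), NULL)
)

# Assumed definition of panel length: sum of lengths of the selected genes.
panel_lengths <- colSums(panel_genes * gene_len)

# Choose the largest panel whose total length stays within the budget.
budget <- 2000
within <- which(panel_lengths <= budget)
best <- within[which.max(panel_lengths[within])]

# Genes making up the chosen panel.
selected <- rownames(panel_genes)[panel_genes[, best]]
```

Here iteration 3 is chosen, giving a two-gene panel (GENE_A and GENE_C) of total length 1770.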

Examples

example_first_fit <- pred_first_fit(
  example_gen_model,
  lambda = exp(seq(-9, -14, length.out = 100)),
  training_matrix = example_tables$train$matrix,
  gene_lengths = example_maf_data$gene_lengths
)

  • Maintainer: Jacob R. Bradley
  • License: MIT + file LICENSE
  • Last published: 2021-11-15
