This function implements the first-fit procedure described in Bradley and Cannings (2021). At a minimum, it requires a generative model and a dataframe of gene lengths as input.
gen_model: (list) A generative mutation model, fitted by fit_gen_model().
lambda: (numeric) A vector of penalty values (regularisation strengths) passed to the group lasso optimiser gglasso.
biomarker: (character) The biomarker in question. If "TMB" or "TIB", marker_mut_types is set automatically.
marker_mut_types: (character) The set of mutation type groupings constituting the biomarker being estimated. Should be a vector whose elements are taken from the mut_types_list vector in the 'names' attribute of gen_model.
training_matrix: (sparse matrix) A sparse matrix of mutations in the training dataset, produced by get_mutation_tables().
gene_lengths: (dataframe) A table with two columns: Hugo_Symbol and max_cds, providing the lengths of the genes to be modelled.
marker_training_values: (dataframe) A dataframe containing two columns: 'Tumor_Sample_Barcode', containing the sample IDs for the training dataset, and a second column containing training values for the biomarker in question.
K_method: (function) How to select a representative biomarker value from the training dataset. Defaults to max().
free_genes: (character) Which genes should escape penalisation (for example when augmenting a pre-existing panel).
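To make the roles of lambda, free_genes and K_method more concrete, the Python sketch below computes a weighted group-lasso penalty and a representative biomarker value. It is an illustration under stated assumptions, not the package's R code: it assumes each gene forms one group of coefficients (one coefficient per mutation type), that a per-gene weight plays a role similar to the returned vector p, and that free genes are handled by dropping their group from the penalty. The functions group_lasso_penalty and representative_value, and the gene names, are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set


def group_lasso_penalty(
    beta: Dict[str, List[float]],
    weights: Dict[str, float],
    lam: float,
    free_genes: Optional[Set[str]] = None,
) -> float:
    """Return lam * sum_g w_g * ||beta_g||_2 over the penalised gene groups.

    Assumption (illustrative only): each gene is one group; genes in
    free_genes are skipped, which is equivalent to a group weight of zero.
    """
    free = free_genes or set()
    total = 0.0
    for gene, coefs in beta.items():
        if gene in free:
            continue  # free genes escape penalisation
        group_norm = math.sqrt(sum(c * c for c in coefs))  # Euclidean norm of the group
        total += weights[gene] * group_norm
    return lam * total


def representative_value(
    values: Sequence[float],
    k_method: Callable[[Sequence[float]], float] = max,
) -> float:
    """Reduce training biomarker values to one number, defaulting to max (like K_method)."""
    return k_method(values)


# Hypothetical coefficients: one group of two mutation-type coefficients per gene.
beta = {"GENE_A": [3.0, 4.0], "GENE_B": [0.0, 0.0], "GENE_C": [1.0, 0.0]}
weights = {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 2.0}

print(group_lasso_penalty(beta, weights, lam=0.5))                         # 3.5
print(group_lasso_penalty(beta, weights, lam=0.5, free_genes={"GENE_A"}))  # 1.0
print(representative_value([2.0, 7.5, 3.1]))                              # 7.5
```

With lam = 0.5 the penalty is 0.5 * (1 * 5 + 1 * 0 + 2 * 1) = 3.5; exempting GENE_A removes its contribution and leaves 1.0. Passing another function as k_method, such as min, mirrors supplying an alternative K_method.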
Returns
A list of six elements:
fit: Output of call to gglasso.
panel_genes: A matrix in which each row corresponds to a gene and each column to one iteration of the group lasso (that is, one penalty value), with boolean entries indicating whether that gene was selected for the panel in that iteration.
panel_lengths: A vector giving total panel length for each gglasso iteration.
p: The vector of weights used in the optimisation procedure.
K: The bias penalty factor used in the optimisation procedure.
names: Gene and mutation type information as used when fitting the generative model.
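The relationship between panel_genes, gene_lengths and panel_lengths can be sketched as follows. This Python version is illustrative only: it represents panel_genes as a dictionary mapping each gene to one boolean per iteration (in place of the R matrix) and gene_lengths as a dictionary standing in for the max_cds column. The function name, gene names and lengths are hypothetical.

```python
from typing import Dict, List


def panel_lengths(
    panel_genes: Dict[str, List[bool]],
    gene_lengths: Dict[str, int],
) -> List[int]:
    """Sum the lengths of the selected genes separately for each iteration."""
    if not panel_genes:
        return []
    # Every gene has one boolean per gglasso iteration (column).
    n_iter = len(next(iter(panel_genes.values())))
    totals = [0] * n_iter
    for gene, selected in panel_genes.items():
        for j, chosen in enumerate(selected):
            if chosen:
                totals[j] += gene_lengths[gene]
    return totals


# Hypothetical example: three genes, three penalty values.
panel_genes = {
    "GENE_A": [False, True, True],
    "GENE_B": [False, False, True],
    "GENE_C": [False, True, True],
}
gene_lengths = {"GENE_A": 1000, "GENE_B": 500, "GENE_C": 2000}
print(panel_lengths(panel_genes, gene_lengths))  # [0, 3000, 3500]
```

In this toy example, later columns select more genes and therefore give longer panels.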