Fits a generative model to a mutation dataset. The function requires a gene_lengths dataframe (for examples of the correct format, see the pre-loaded datasets example_maf_data$gene_lengths and ensembl_gene_lengths) and a mutation dataset. The mutation dataset is best supplied through the 'table' argument, constructed via the function get_mutation_tables().
gene_lengths: (dataframe) A table with two columns: Hugo_Symbol and max_cds, providing the lengths of the genes to be modelled.
matrix: (Matrix::sparseMatrix) A mutation matrix, such as produced by the function get_table_from_maf().
sample_list: (character) The set of samples to be modelled.
gene_list: (character) The set of genes to be modelled.
mut_types_list: (character) The set of mutation types to be modelled.
col_names: (character) The column names of the 'matrix' parameter.
table: (list) Optional parameter combining matrix, sample_list, gene_list, mut_types_list, and col_names, as produced by the function get_mutation_tables().
nlambda: (numeric) The length of the vector of penalty weights, passed to the function glmnet::glmnet().
n_folds: (numeric) The number of cross-validation folds to employ.
maxit: (numeric) The maximum number of passes over the data, passed to the function glmnet::glmnet().
seed_id: (numeric) Input value for the function set.seed().
progress: (logical) Whether to show progress bars and text.
alt_model_type: (character) Used to call an alternative generative model type such as "US" (no sample-dependent parameters) or "UI" (no gene/variant-type interactions).
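
As a rough illustration of how these arguments fit together, the sketch below builds a 'table' input and passes it to the function together with a gene_lengths dataframe. It rests on several assumptions not stated above: that the function documented here is exported as fit_gen_model(), that example_maf_data contains a 'maf' element accepted by get_mutation_tables(), and that get_mutation_tables() returns a list with a 'train' element. The tuning values are illustrative rather than documented defaults.

```r
gen_model <- NULL

if (requireNamespace("ICBioMark", quietly = TRUE)) {
  # Assumption: example_maf_data$maf holds the annotated mutations, and
  # get_mutation_tables() returns a list whose 'train' element has the
  # structure expected by the 'table' argument.
  tables <- ICBioMark::get_mutation_tables(ICBioMark::example_maf_data$maf)

  # Assumption: the function described on this page is named fit_gen_model().
  gen_model <- ICBioMark::fit_gen_model(
    gene_lengths = ICBioMark::example_maf_data$gene_lengths,
    table        = tables$train,
    nlambda      = 50,     # illustrative: a shorter penalty path
    n_folds      = 5,      # illustrative: fewer cross-validation folds
    seed_id      = 1234,   # passed on to set.seed() for reproducible folds
    progress     = FALSE
  )

  # Inspect the components of the returned list.
  print(names(gen_model))
}
```

Supplying 'table' in this way avoids having to pass matrix, sample_list, gene_list, mut_types_list and col_names individually.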
Returns
A list comprising four objects:
An object 'fit', a fitted glmnet model.
A table 'dev', giving average deviances for each regularisation penalty factor and cross-validation fold.
An integer 's_min', the index of the regularisation penalty minimising cross-validation deviance.
A list 'names', containing the sample, gene, and mutation type information of the training data.
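
The relationship between 'dev' and 's_min' can be sketched with a small stand-in table. This is a minimal base R illustration, not the package's internal code: the deviance values are invented, and the layout (one row per penalty, one column per fold) is an assumption about how 'dev' is organised.

```r
# Hypothetical deviance table: 4 penalty values (rows) by 3 folds (columns).
dev <- rbind(
  c(3.0, 3.2, 3.1),
  c(2.1, 2.4, 2.0),
  c(1.8, 1.9, 2.0),
  c(2.2, 2.1, 2.5)
)

# Average the deviance across folds for each penalty value.
mean_dev <- rowMeans(dev)

# Index of the penalty with the smallest mean cross-validation deviance.
s_min <- which.min(mean_dev)
s_min
```

With a real result, the selected penalty would then be fit$lambda[s_min], which can be passed as the 's' argument of coef() or predict() on the fitted glmnet model.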