fit_gen_model function

Fit Generative Model

Fits a generative model to a mutation dataset. The function requires a gene_lengths dataframe (for examples of the correct format, see the pre-loaded datasets example_maf_data$gene_lengths and ensembl_gene_lengths) and a mutation dataset. The mutation dataset is best supplied through the 'table' argument, constructed via the function get_mutation_tables().

fit_gen_model(
  gene_lengths,
  matrix = NULL,
  sample_list = NULL,
  gene_list = NULL,
  mut_types_list = NULL,
  col_names = NULL,
  table = NULL,
  nlambda = 100,
  n_folds = 10,
  maxit = 1e+09,
  seed_id = 1234,
  progress = FALSE,
  alt_model_type = NULL
)

Arguments

  • gene_lengths: (dataframe) A table with two columns: Hugo_Symbol and max_cds, providing the lengths of the genes to be modelled.
  • matrix: (Matrix::sparseMatrix) A mutation matrix, such as produced by the function get_table_from_maf().
  • sample_list: (character) The set of samples to be modelled.
  • gene_list: (character) The set of genes to be modelled.
  • mut_types_list: (character) The set of mutation types to be modelled.
  • col_names: (character) The column names of the 'matrix' parameter.
  • table: (list) Optional parameter combining matrix, sample_list, gene_list, mut_types_list and col_names, as produced by the function get_mutation_tables().
  • nlambda: (numeric) The length of the vector of penalty weights, passed to the function glmnet::glmnet().
  • n_folds: (numeric) The number of cross-validation folds to employ.
  • maxit: (numeric) The maximum number of passes over the data, passed to the function glmnet::glmnet().
  • seed_id: (numeric) Input value for the function set.seed().
  • progress: (logical) Show progress bars and text.
  • alt_model_type: (character) Used to call an alternative generative model type such as "US" (no sample-dependent parameters) or "UI" (no gene/variant-type interactions).
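
As a rough illustration of the gene_lengths format described above, the base R sketch below builds a small two-column dataframe and checks it. The gene symbols and lengths are made-up values, and check_gene_lengths() is a hypothetical helper that is not part of the package.

```r
# Made-up gene symbols and lengths, for illustration only.
gene_lengths <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C"),
  max_cds = c(1200, 3450, 800),
  stringsAsFactors = FALSE
)

# Hypothetical helper: checks for the two columns named in the
# 'gene_lengths' argument description and that max_cds is numeric.
check_gene_lengths <- function(x) {
  is.data.frame(x) &&
    all(c("Hugo_Symbol", "max_cds") %in% names(x)) &&
    is.numeric(x$max_cds)
}

print(check_gene_lengths(gene_lengths))
```

A dataframe that passes this check has the column layout the argument description asks for; whether its genes match those in the mutation dataset is a separate question that this sketch does not address.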

Returns

A list comprising four objects:

  • An object 'fit', a fitted glmnet model.
  • A table 'dev', giving average deviances for each regularisation penalty factor and cross-validation fold.
  • An integer 's_min', the index of the regularisation penalty minimising cross-validation deviance.
  • A list 'names', containing the sample, gene, and mutation type information of the training data.
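
The returned list can be inspected element by element. The sketch below uses a hypothetical helper, summarise_gen_model(), and a mock object with made-up values in place of a real fit. It assumes that 's_min' indexes the penalty sequence that glmnet stores in fit$lambda; the field names inside 'names' are also illustrative.

```r
# Hypothetical helper summarising the list returned by fit_gen_model().
summarise_gen_model <- function(gen_model) {
  # The four elements listed under 'Returns'.
  expected <- c("fit", "dev", "s_min", "names")
  stopifnot(all(expected %in% names(gen_model)))

  list(
    s_min = gen_model$s_min,
    # Assumption: s_min indexes the glmnet penalty sequence in fit$lambda.
    lambda_min = gen_model$fit$lambda[gen_model$s_min],
    dev_dim = dim(gen_model$dev),
    name_fields = names(gen_model$names)
  )
}

# Mock object with made-up values, standing in for a real fitted model.
mock_model <- list(
  fit = list(lambda = c(0.3, 0.2, 0.1)),
  dev = matrix(0, nrow = 3, ncol = 2),
  s_min = 2L,
  names = list(sample_list = character(0), gene_list = character(0))
)

str(summarise_gen_model(mock_model))
```

With a real fit, the same helper could be applied to the object returned by fit_gen_model() in place of mock_model.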

Examples

example_gen_model <- fit_gen_model(example_maf_data$gene_lengths, table = example_tables$train)
print(names(example_gen_model))
  • Maintainer: Jacob R. Bradley
  • License: MIT + file LICENSE
  • Last published: 2021-11-15