get_mutation_tables function

Produce Training, Validation and Test Matrices

This function (i) splits a mutation dataset into training, validation and test components, and (ii) converts each component from annotated mutation format into a sparse mutation matrix, as described in the function get_table_from_maf().

get_mutation_tables(
  maf,
  split = c(train = 0.7, val = 0.15, test = 0.15),
  sample_list = NULL,
  gene_list = NULL,
  acceptable_genes = NULL,
  for_biomarker = "TIB",
  include_synonymous = TRUE,
  dictionary = NULL,
  seed_id = 1234
)

Arguments

  • maf: (dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.
  • split: (double) A vector of three positive values with names 'train', 'val' and 'test'. Specifies the proportions into which to split the dataset.
  • sample_list: (character) Optional parameter specifying the set of samples to include in the mutation matrices.
  • gene_list: (character) Optional parameter specifying the set of genes to include in the mutation matrices.
  • acceptable_genes: (character) Optional parameter specifying a set of acceptable genes, for example those which are in an Ensembl database.
  • for_biomarker: (character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.
  • include_synonymous: (logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrices.
  • dictionary: (character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.
  • seed_id: (numeric) Input value for the function set.seed().
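
As a rough illustration of how split and seed_id interact, the sketch below partitions a vector of sample IDs in base R. The helper split_samples() is hypothetical and is not part of the package; the rounding rule and the order of operations are assumptions, so the real function may assign samples differently.

```r
# Hypothetical helper for illustration only; NOT the package's implementation.
# It shuffles the sample IDs reproducibly, then cuts them into three groups
# in the proportions given by `split`.
split_samples <- function(samples,
                          split = c(train = 0.7, val = 0.15, test = 0.15),
                          seed_id = 1234) {
  stopifnot(
    length(split) == 3,
    all(split > 0),
    setequal(names(split), c("train", "val", "test"))
  )

  set.seed(seed_id)            # fixes the shuffle (note: changes global RNG state)
  shuffled <- sample(samples)  # random permutation of the sample IDs
  n <- length(shuffled)

  props <- split / sum(split)  # assumption: normalise so the proportions sum to 1
  n_train <- min(round(props[["train"]] * n), n)
  n_val <- min(round(props[["val"]] * n), n - n_train)
  n_test <- n - n_train - n_val  # the test set takes whatever remains

  list(
    train = shuffled[seq_len(n_train)],
    val = shuffled[n_train + seq_len(n_val)],
    test = tail(shuffled, n_test)
  )
}

parts <- split_samples(paste0("SAMPLE_", 1:100))
print(lengths(parts))
```

Calling the helper twice with the same seed_id yields the same partition, which is the practical reason for exposing a seed argument.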

Returns

A list of three items with names 'train', 'val' and 'test'. Each element contains a sparse mutation matrix for the samples in that branch, alongside the other information described in the output of the function get_table_from_maf().
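
To give a feel for what a sparse mutation matrix is, the sketch below builds one from a tiny, made-up annotated table using the Matrix package. The toy data and the layout (one row per sample, one column per gene, ignoring mutation type) are assumptions made for illustration; the actual column layout is the one documented for get_table_from_maf().

```r
library(Matrix)

# Made-up toy data with the three columns the `maf` argument requires.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53", "EGFR"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Nonsense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

samples <- unique(toy_maf$Tumor_Sample_Barcode)
genes <- unique(toy_maf$Hugo_Symbol)

# One row per sample, one column per gene; each entry counts the mutations
# observed for that (sample, gene) pair. Only non-zero entries are stored.
mutation_matrix <- sparseMatrix(
  i = match(toy_maf$Tumor_Sample_Barcode, samples),
  j = match(toy_maf$Hugo_Symbol, genes),
  x = rep(1, nrow(toy_maf)),
  dims = c(length(samples), length(genes)),
  dimnames = list(samples, genes)
)

print(mutation_matrix)
```

Most sample and gene pairs carry no mutation, so a sparse representation keeps memory use proportional to the number of observed mutations rather than to samples times genes.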

Examples

tables <- get_mutation_tables(example_maf_data$maf, sample_list = paste0("SAMPLE_", 1:100))
print(names(tables))
print(names(tables$train))
  • Maintainer: Jacob R. Bradley
  • License: MIT + file LICENSE
  • Last published: 2021-11-15
