get_table_from_maf function

Produce a Mutation Matrix from a MAF

Given a mutation annotation dataset with columns for sample barcode, gene name and mutation type, this function reformulates it as a mutation matrix, with rows denoting samples, columns denoting gene/mutation type combinations, and individual entries giving the number of mutations observed. The result is likely to be very sparse, so it is stored as a sparse matrix for efficiency.
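The reshaping described above can be illustrated with a minimal sketch. This is not the package's implementation: the toy data, the `dictionary` named vector and the variable names are assumptions made for illustration, and the sketch relies on `sparseMatrix()` from the Matrix package, which adds up the values of repeated (row, column) pairs.

```r
library(Matrix)  # provides sparseMatrix()

# Toy MAF-like table containing only the three required columns (made-up values)
maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2", "S2"),
  Hugo_Symbol            = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation", "Silent",
                             "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary grouping variant classes into mutation types
# (assumed labels; the real one comes from get_mutation_dictionary())
dictionary <- c(Missense_Mutation = "NS", Nonsense_Mutation = "NS", Silent = "S")
mut_type   <- unname(dictionary[maf$Variant_Classification])

# One column per gene/mutation type combination, named '<gene>_<type>'
record_cols <- paste(maf$Hugo_Symbol, mut_type, sep = "_")
samples     <- unique(maf$Tumor_Sample_Barcode)
col_names   <- unique(record_cols)

# Each mutation record contributes 1 to its (sample, column) cell;
# sparseMatrix() sums repeated (i, j) pairs, which yields the counts
mut_matrix <- sparseMatrix(
  i = match(maf$Tumor_Sample_Barcode, samples),
  j = match(record_cols, col_names),
  x = rep(1, nrow(maf)),
  dims = c(length(samples), length(col_names)),
  dimnames = list(samples, col_names)
)

print(mut_matrix)
```

Only non-zero counts are stored, which is why a sparse representation suits data in which most samples carry no mutation in most genes.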

get_table_from_maf(
  maf,
  sample_list = NULL,
  gene_list = NULL,
  acceptable_genes = NULL,
  for_biomarker = "TIB",
  include_synonymous = TRUE,
  dictionary = NULL
)

Arguments

  • maf: (dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.
  • sample_list: (character) Optional parameter specifying the set of samples to include in the mutation matrix.
  • gene_list: (character) Optional parameter specifying the set of genes to include in the mutation matrix.
  • acceptable_genes: (character) Optional parameter specifying a set of acceptable genes, for example those which are in an Ensembl database.
  • for_biomarker: (character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.
  • include_synonymous: (logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrix.
  • dictionary: (character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.

Returns

A list with the following entries:

  • matrix: A mutation matrix, stored as a sparse matrix, giving the number of mutations observed for each sample and each gene/mutation type combination.
  • sample_list: A vector of characters specifying the samples included in the matrix: the rows of the mutation matrix correspond to each of these.
  • gene_list: A vector of characters specifying the genes included in the matrix.
  • mut_types_list: A vector of characters specifying the mutation types (as grouped into an appropriate dictionary) to be included in the matrix.
  • col_names: A vector of characters identifying the columns of the mutation matrix. Each entry consists of two parts separated by the character '_', the first identifying the gene in question and the second identifying the mutation type, e.g. 'GENE1_NS', where 'GENE1' is an element of gene_list and 'NS' is an element of the dictionary vector.
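Because each col_names entry joins a gene and a mutation type with '_', the two parts can be recovered with base R. The sketch below uses a made-up col_names vector and splits on the final underscore, which assumes that mutation type labels such as 'NS' contain no underscore themselves.

```r
# Made-up column names in the '<gene>_<mutation type>' format
col_names <- c("GENE1_NS", "GENE1_S", "GENE2_NS")

# Everything before the final underscore is taken as the gene ...
gene <- sub("_[^_]*$", "", col_names)
# ... and everything after it as the mutation type
mut_type <- sub("^.*_", "", col_names)

print(data.frame(col_names, gene, mut_type))
```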

Examples

# We use the preloaded maf file example_maf_data
# Now we make a mutation matrix
table <- get_table_from_maf(example_maf_data$maf, sample_list = paste0("SAMPLE_", 1:100))

print(names(table))
print(table$matrix[1:10,1:10])
print(table$col_names[1:10])

  • Maintainer: Jacob R. Bradley
  • License: MIT + file LICENSE
  • Last published: 2021-11-15
