Given a mutation annotation dataset with columns for sample barcode, gene name and mutation type, this function reformulates it as a mutation matrix, with rows denoting samples, columns denoting gene/mutation type combinations, and individual entries giving the number of mutations observed. The matrix is likely to be very sparse, so it is stored as a sparse matrix for efficiency.
maf: (dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.
sample_list: (character) Optional parameter specifying the set of samples to include in the mutation matrix.
gene_list: (character) Optional parameter specifying the set of genes to include in the mutation matrix.
acceptable_genes: (character) Optional parameter specifying a set of acceptable genes, for example those which are in an Ensembl database.
for_biomarker: (character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.
include_synonymous: (logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrix.
dictionary: (character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.
Returns
A list with the following entries:
matrix: A mutation matrix, stored as a sparse matrix, giving the number of mutations observed for each combination of sample, gene and mutation type.
sample_list: A vector of characters specifying the samples included in the matrix: the rows of the mutation matrix correspond to each of these.
gene_list: A vector of characters specifying the genes included in the matrix.
mut_types_list: A vector of characters specifying the mutation types (as grouped into an appropriate dictionary) to be included in the matrix.
col_names: A vector of characters identifying the columns of the mutation matrix. Each entry consists of two parts separated by the character '_', the first identifying the gene in question and the second identifying the mutation type. E.g. 'GENE1_NS', where 'GENE1' is an element of gene_list, and 'NS' is an element of the dictionary vector.
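As a small illustration of the col_names format, the sketch below splits such labels back into their gene and mutation type parts. The labels are made-up values, and the code assumes that mutation type labels never contain '_' themselves (it splits at the last '_', so gene names containing '_' are preserved). It is not part of the package.

```r
# Made-up column labels in the 'GENE_MUTTYPE' format described above
example_col_names <- c("GENE1_NS", "GENE1_S", "GENE2_NS", "GENE2_S")

# Everything before the last "_" is treated as the gene name
parsed_genes <- sub("_[^_]*$", "", example_col_names)

# Everything after the last "_" is treated as the mutation type
# (assumption: mutation type labels contain no "_")
parsed_mut_types <- sub("^.*_", "", example_col_names)

print(data.frame(col_name = example_col_names,
                 gene = parsed_genes,
                 mut_type = parsed_mut_types))
```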
Examples
# We use the preloaded maf file example_maf_data
# Now we make a mutation matrix
table <- get_table_from_maf(example_maf_data$maf, sample_list = paste0("SAMPLE_", 1:100))
print(names(table))
print(table$matrix[1:10, 1:10])
print(table$col_names[1:10])
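To make the reshaping concrete, here is a minimal sketch of how a table of annotated mutations can be turned into a sparse count matrix using the Matrix package. It is an illustration under stated assumptions, not the function's actual implementation: the toy data, the object names (toy_maf, toy_matrix and so on) and the gene-major column ordering are all hypothetical, and the mutation types are taken to be already grouped into 'NS' and 'S'.

```r
library(Matrix)  # provides sparseMatrix()

# Toy annotation table (made-up values) with the same column names as 'maf'
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2", "S2"),
  Hugo_Symbol            = c("GENE1", "GENE1", "GENE1", "GENE2", "GENE2"),
  Variant_Classification = c("NS", "NS", "S", "NS", "NS"),
  stringsAsFactors = FALSE
)

toy_samples   <- c("S1", "S2", "S3")  # S3 has no mutations, so its row is all zero
toy_genes     <- c("GENE1", "GENE2")
toy_mut_types <- c("NS", "S")

# Column labels: gene, then mutation type, joined by "_" (ordering is an assumption)
toy_col_names <- paste(rep(toy_genes, each = length(toy_mut_types)),
                       toy_mut_types, sep = "_")

# Row and column index of each observed mutation
i <- match(toy_maf$Tumor_Sample_Barcode, toy_samples)
j <- match(paste(toy_maf$Hugo_Symbol, toy_maf$Variant_Classification, sep = "_"),
           toy_col_names)

# sparseMatrix() sums x over repeated (i, j) pairs, which yields mutation counts
toy_matrix <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                           dims = c(length(toy_samples), length(toy_col_names)),
                           dimnames = list(toy_samples, toy_col_names))

print(toy_matrix)
```

Only the non-zero counts are stored, which is why the sparse representation pays off when most sample/gene/mutation type combinations are never observed.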