This function i) splits a mutation dataset into training, validation and testing components, and ii) converts each component from annotated mutation format to a sparse mutation matrix, as described in the function get_table_from_maf().
maf: (dataframe) A table of annotated mutations containing the columns 'Tumor_Sample_Barcode', 'Hugo_Symbol', and 'Variant_Classification'.
split: (double) A vector of three positive values with names 'train', 'val' and 'test'. Specifies the proportions into which to split the dataset.
sample_list: (character) Optional parameter specifying the set of samples to include in the mutation matrices.
gene_list: (character) Optional parameter specifying the set of genes to include in the mutation matrices.
acceptable_genes: (character) Optional parameter specifying a set of acceptable genes, for example those which are in an Ensembl database.
for_biomarker: (character) Used for defining a dictionary of mutations. See the function get_mutation_dictionary() for details.
include_synonymous: (logical) Optional parameter specifying whether to include synonymous mutations in the mutation matrices.
dictionary: (character) Optional parameter directly specifying the mutation dictionary to use. See the function get_mutation_dictionary() for details.
seed_id: (numeric) Input value for the function set.seed().
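To illustrate how the split and seed_id arguments interact, here is a minimal base-R sketch of a seeded train/val/test split at the sample level. It is an assumption about the general approach, not the function's actual implementation: the toy maf table, the rescaling of split to sum to one, and the rounding of group sizes are all illustrative choices.

```r
# Hypothetical toy table with the three required columns (not real data).
maf <- data.frame(
  Tumor_Sample_Barcode = rep(paste0("S", 1:10), each = 2),
  Hugo_Symbol = rep(c("TP53", "KRAS"), times = 10),
  Variant_Classification = "Missense_Mutation",
  stringsAsFactors = FALSE
)

split <- c(train = 0.7, val = 0.15, test = 0.15)
seed_id <- 1234

# Fix the random number generator so the split is reproducible.
set.seed(seed_id)

# Assumption: splitting is done over unique samples, not over mutation rows.
samples <- unique(maf$Tumor_Sample_Barcode)
shuffled <- sample(samples)

# Assumption: proportions are rescaled to sum to one, and any remainder
# left after rounding down goes to the test group.
props <- split / sum(split)
n <- length(shuffled)
n_train <- floor(props[["train"]] * n)
n_val <- floor(props[["val"]] * n)
n_test <- n - n_train - n_val

groups <- list(
  train = shuffled[seq_len(n_train)],
  val = shuffled[n_train + seq_len(n_val)],
  test = shuffled[n_train + n_val + seq_len(n_test)]
)

# One annotated mutation table per branch.
maf_split <- lapply(groups, function(ids) {
  maf[maf$Tumor_Sample_Barcode %in% ids, ]
})
```

Because every sample is assigned to exactly one group, no sample contributes mutations to more than one branch, and rerunning with the same seed_id reproduces the same partition.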
Returns
A list of three items with names 'train', 'val' and 'test'. Each element contains a sparse mutation matrix for the samples in that branch, alongside other information as described in the output of the function get_table_from_maf().
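As a rough picture of what a sparse mutation matrix looks like, the following sketch builds one from a toy annotated table using the Matrix package. The layout (one row per sample, one column per gene, entries counting mutations) is an assumption made for illustration; the actual column structure and accompanying elements are those documented for get_table_from_maf().

```r
library(Matrix)

# Hypothetical toy table with the three required columns (not real data).
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53", "EGFR", "EGFR"),
  Variant_Classification = c("Missense_Mutation", "Silent",
                             "Nonsense_Mutation", "Missense_Mutation",
                             "Missense_Mutation"),
  stringsAsFactors = FALSE
)

sample_list <- unique(maf$Tumor_Sample_Barcode)
gene_list <- unique(maf$Hugo_Symbol)

# Assumed layout: rows index samples, columns index genes.
# Repeated (sample, gene) pairs are summed, giving mutation counts.
mutation_matrix <- sparseMatrix(
  i = match(maf$Tumor_Sample_Barcode, sample_list),
  j = match(maf$Hugo_Symbol, gene_list),
  x = rep(1, nrow(maf)),
  dims = c(length(sample_list), length(gene_list)),
  dimnames = list(sample_list, gene_list)
)

print(mutation_matrix)
```

Only non-zero entries are stored, which keeps memory use manageable when most sample and gene pairs carry no mutation.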