Nucleotide sequence encoding with the distribution of trinucleotides.
Each nucleotide sequence is encoded into a numeric vector of the same length, based on the distribution of nucleotides over the sequence. Encoding does not require two classes of dataset; each sequence is encoded independently. This encoding scheme was introduced by Wei et al. (2013) for the prediction of human donor and acceptor splice sites, along with the MM1.Feature.
Usage
Density.Feature(test_seq)
Arguments
test_seq: Sequence dataset to be encoded, must be an object of class DNAStringSet.
Details
An object of class DNAStringSet can be obtained by reading FASTA sequences with the function readDNAStringSet, available in the Biostrings package of Bioconductor.
Returns
A numeric matrix of order m × n, where m is the number of sequences in test_seq and n is the length of each sequence.
References
Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.
Author(s)
Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA
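Examples
The following is a minimal usage sketch, not an official example from the package. It assumes that the Biostrings and EncDNA packages are installed. The three sequences and the FASTA file name are hypothetical and were made up for illustration.

```r
# Assumes Biostrings (Bioconductor) and EncDNA are installed.
library(Biostrings)
library(EncDNA)

# Three hypothetical sequences of equal length (8 nt), made up for illustration.
test_seq <- DNAStringSet(c("ACGTACGT", "TTGACCAG", "GGCATCGA"))

# Alternatively, read the sequences from a FASTA file (hypothetical path):
# test_seq <- readDNAStringSet("sequences.fasta")

# Encode each sequence independently.
enc <- Density.Feature(test_seq)

# Per the Returns section, the result should have one row per sequence
# and one column per sequence position.
dim(enc)
```

The sequences here share the same length because the Returns section describes a single sequence length n for the whole matrix.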