Trint.Dist.Feature() R function from [EncDNA]

Tri-nucleotide distribution-based encoding of nucleotide sequences.

This encoding scheme was first adopted by Wei et al. (2013), together with MM1 features, for the prediction of splice sites. In this technique, the distribution of trinucleotides is considered independently for the exon and intron regions of splice site motifs.


Usage

Trint.Dist.Feature(test_seq)

Arguments

test_seq: Sequence dataset to be transformed into numeric feature vectors. It must be an object of class DNAStringSet containing at least two sequences.
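
As a minimal sketch of preparing such an input, assuming the Biostrings package (which defines the DNAStringSet class) is installed, plain character strings can be converted as shown below. The sequences are made up for illustration; the length and layout that Trint.Dist.Feature expects around the splice site are not specified on this page.

```r
library(Biostrings)

# Made-up sequences, for illustration only
raw <- c(seq1 = "ATGCGTACGTTAGCGTACGATCG",
         seq2 = "TTGACCGTAGGTAAGTCCATGCA")

# Convert the character vector to a DNAStringSet
seqs <- DNAStringSet(raw)

# The argument must be a DNAStringSet with at least two sequences
stopifnot(is(seqs, "DNAStringSet"), length(seqs) >= 2)
```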

Details

This encoding scheme is independent of the positive and negative datasets; in other words, each sequence can be encoded on its own. Further, a nucleotide sequence of any length is transformed into a numeric vector of 64 values, one for each of the 64 possible trinucleotides.
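
To illustrate how a sequence of any length can map to 64 values, here is a rough base-R sketch. It is not the EncDNA implementation: the helper name trinuc_freq is hypothetical, and it assumes overlapping windows over the whole sequence and relative frequencies, without the separate exon and intron handling described above.

```r
# Hypothetical helper, not part of EncDNA: relative frequencies of the
# 64 trinucleotides, counted over overlapping windows of width 3.
trinuc_freq <- function(seqs) {
  bases <- c("A", "C", "G", "T")
  # All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT)
  grid <- expand.grid(bases, bases, bases, stringsAsFactors = FALSE)
  kmers <- paste0(grid$Var3, grid$Var2, grid$Var1)

  out <- t(vapply(seqs, function(s) {
    s <- toupper(s)
    n <- nchar(s) - 2L
    # Sequences shorter than 3 bases have no trinucleotides
    if (n < 1L) return(numeric(64))
    starts <- seq_len(n)
    windows <- substring(s, starts, starts + 2L)
    # Windows containing symbols other than A, C, G, T are ignored
    counts <- table(factor(windows, levels = kmers))
    as.numeric(counts) / max(sum(counts), 1)
  }, numeric(64)))

  dimnames(out) <- list(NULL, kmers)
  out
}

# Each input sequence becomes one row of 64 features
enc <- trinuc_freq(c("ATGATG", "CCCCC"))
dim(enc)  # 2 64
```

Because each sequence is processed by itself, the sketch shares the property noted above: no positive or negative training set is needed to compute the features.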

Returns

A numeric matrix of order $m \times 64$, where $m$ is the number of sequences in test_seq.

References

Wei, D., Zhang, H., Wei, Y. and Jiang, Q. (2013). A novel splice site prediction method using support vector machine. J Comput Inform Syst., 9(20): 8053-8060.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

Examples


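library(EncDNA)

# Load the droso example dataset shipped with the package
data(droso)
# Test sequences to encode
tst <- droso$test
# Encode each sequence as a vector of 64 trinucleotide features
enc <- Trint.Dist.Feature(test_seq=tst)
enc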


Maintainer: Prabina Kumar Meher
License: GPL (>= 2)
Last published: 2019-05-28
