Trint.Dist.Feature function

Tri-nucleotide distribution-based encoding of nucleotide sequences.

This encoding scheme was first adopted by Wei et al. (2013), together with MM1 features, for the prediction of splice sites. In this encoding technique, the distribution of trinucleotides is considered independently for the exon and intron regions of splice site motifs.

Trint.Dist.Feature(test_seq)

Arguments

  • test_seq: Sequence dataset to be transformed into numeric feature vectors. It must be an object of class DNAStringSet containing at least two sequences.

Details

This encoding scheme is independent of positive and negative datasets. In other words, each sequence can be encoded independently. Further, a nucleotide sequence of any length is transformed into a numeric vector of 64 values, one for each of the 64 possible trinucleotides.
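The idea of mapping a sequence of any length to 64 trinucleotide values can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name trint_freq is hypothetical, and the use of overlapping windows and relative frequencies (rather than raw counts) are assumptions made for illustration, so the values may differ from what Trint.Dist.Feature returns.

```r
# Hypothetical helper (not part of the package): relative frequencies of the
# 64 trinucleotides in one sequence, counted over overlapping windows.
trint_freq <- function(s) {
  bases <- c("A", "C", "G", "T")
  # Enumerate all 64 trinucleotides in a fixed order: AAA, AAC, ..., TTT
  grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
  trints <- paste0(grid$b1, grid$b2, grid$b3)

  s <- toupper(s)
  n <- nchar(s)
  if (n < 3) stop("sequence must contain at least 3 nucleotides")

  # Extract every overlapping window of width 3
  starts <- seq_len(n - 2)
  windows <- substring(s, starts, starts + 2)

  # Count each trinucleotide; windows with non-ACGT symbols are not counted
  counts <- table(factor(windows, levels = trints))
  setNames(as.numeric(counts) / length(windows), trints)
}

# Each sequence is encoded on its own, regardless of its length
seqs <- c("ATGCATGC", "GGGTTTAAACCC")
enc <- t(vapply(seqs, trint_freq, numeric(64)))
dim(enc)  # 2 rows (sequences) by 64 columns (trinucleotides)
```

Because each row depends only on its own sequence, no positive or negative training set is needed to compute the encoding, and sequences of different lengths still yield rows of equal width.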

Returns

A numeric matrix of order m × 64, where m is the number of sequences in test_seq.

References

Wei, D., Zhang, H., Wei, Y. and Jiang, Q. (2013). A novel splice site prediction method using support vector machine. J Comput Inform Syst., 9(20): 8053-8060.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

Examples

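library(EncDNA)   # assumed to be the package providing Trint.Dist.Feature and droso
data(droso)       # load the example dataset
test <- droso$test
tst <- test
# Encode every sequence as a vector of 64 trinucleotide features
enc <- Trint.Dist.Feature(test_seq=tst)
enc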
  • Maintainer: Prabina Kumar Meher
  • License: GPL (>= 2)
  • Last published: 2019-05-28