Maldoss.Feature function

Encoding of nucleic acid sequences using di-nucleotide frequency difference between positive and negative class datasets.

In Maldoss (Meher et al., 2016), the authors propose three encoding approaches: P1, P2 and P3. Of these, P1 was reported to give higher accuracy than the other two, so only the P1-based sequence encoding is described here. P1 is similar to the PN-FDTF encoding (Huang et al., 2006); the only difference is the logarithmic transformation used in Maldoss.Feature. In this encoding procedure, both positive and negative class sequences are required to transform nucleotide sequences into numeric vectors.

Maldoss.Feature(positive_class, negative_class, test_seq)

Arguments

  • positive_class: Sequence dataset of the positive class, must be an object of class DNAStringSet.
  • negative_class: Sequence dataset of the negative class, must be an object of class DNAStringSet.
  • test_seq: Sequences to be encoded into numeric vectors, must be an object of class DNAStringSet.

Details

To obtain an object of class DNAStringSet, read the FASTA sequence dataset into R with the function readDNAStringSet, available in the Biostrings package of Bioconductor (https://bioconductor.org/packages/release/bioc/html/Biostrings.html).
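As an illustration of the step above, the following is a minimal sketch of reading a FASTA file into a DNAStringSet. The file name and the two toy sequences are made up for the example, and the code assumes that Biostrings is installed; the read is skipped otherwise.

```r
# Write a small, made-up FASTA file to a temporary location (illustrative only).
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGTAG",
             ">seq2", "TTGACCGTAA"), fasta_path)

# Assumption: the Biostrings package is installed from Bioconductor.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  # readDNAStringSet returns an object of class DNAStringSet
  seqs <- Biostrings::readDNAStringSet(fasta_path, format = "fasta")
  print(class(seqs))
  print(length(seqs))  # number of sequences read
}
```

An object read this way can then be passed as positive_class, negative_class or test_seq.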

Returns

A numeric matrix of order m × (n-1), where m is the number of sequences in test_seq and n is the length of each sequence.
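To make the m × (n-1) shape concrete, here is a rough base R sketch of a position-wise di-nucleotide frequency difference encoding. It is not the package's implementation: the helper names dinuc_freq and encode_freq_diff are hypothetical, the sequences are toy data, plain character vectors stand in for DNAStringSet objects, and the logarithmic transformation that distinguishes Maldoss.Feature is deliberately left out because its exact form is not given on this page.

```r
# Hypothetical helper: frequency of each of the 16 di-nucleotides at every
# pair of adjacent positions. Assumes all sequences have the same length n.
dinuc_freq <- function(seqs) {
  n <- nchar(seqs[1])
  bases <- c("A", "C", "G", "T")
  dinucs <- as.vector(outer(bases, bases, paste0))
  freq <- vapply(seq_len(n - 1), function(j) {
    pairs <- factor(substr(seqs, j, j + 1), levels = dinucs)
    as.vector(table(pairs)) / length(seqs)
  }, numeric(16))
  rownames(freq) <- dinucs  # 16 rows, n-1 columns
  freq
}

# Hypothetical encoder: each test sequence becomes n-1 numbers, one per
# adjacent position pair, looked up in the positive-minus-negative table.
# NOTE: no logarithmic transformation is applied in this sketch.
encode_freq_diff <- function(pos, neg, test) {
  freq_diff <- dinuc_freq(pos) - dinuc_freq(neg)
  n <- nchar(test[1])
  rows <- lapply(test, function(s) {
    d <- substring(s, 1:(n - 1), 2:n)  # di-nucleotides of this sequence
    freq_diff[cbind(match(d, rownames(freq_diff)), seq_len(n - 1))]
  })
  matrix(unlist(rows), nrow = length(test), byrow = TRUE)
}

# Toy data (made up): m = 2 test sequences of length n = 5
pos <- c("ACGTA", "ACGTT", "ACGAA")
neg <- c("TTGCA", "TCGCA", "TTGAA")
tst <- c("ACGTA", "TTGCA")
enc <- encode_freq_diff(pos, neg, tst)
dim(enc)  # 2 rows, 4 columns, i.e. m x (n-1)
```

Each row of enc corresponds to one test sequence and each column to one pair of adjacent positions, which is why a sequence of length n yields n-1 values.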

References

  1. Meher, P.K., Sahu, T.K. and Rao, A.R. (2016). Prediction of donor splice sites using random forest with a new sequence encoding approach. BioData Mining, 9.
  2. Huang, J., Li, T., Chen, K. and Wu, J. (2006). An approach of encoding for prediction of splice sites using SVM. Biochimie, 88(7): 923-929.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

See Also

PN.Fdtf.Feature, MM1.Feature, WAM.Feature

Examples

data(droso)
positive <- droso$positive
negative <- droso$negative
test <- droso$test
pos <- positive[1:200]
neg <- negative[1:200]
tst <- test
enc <- Maldoss.Feature(positive_class=pos, negative_class=neg, test_seq=tst)
enc
  • Maintainer: Prabina Kumar Meher
  • License: GPL (>= 2)
  • Last published: 2019-05-28
