Sparse.Feature function

Nucleotide sequence encoding with 0 and 1.

In this encoding approach, A, T, G and C are encoded as (1,1,1), (1,0,0), (0,1,0) and (0,0,1), respectively. This scheme was introduced by Bari et al. (2014). Alternatively, each nucleotide can be encoded with four bits, i.e., A as (1,0,0,0), T as (0,1,0,0), G as (0,0,1,0) and C as (0,0,0,1), as followed in Meher et al. (2016). The four-bit scheme is the one consistent with the 4n-length output described under Returns.
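The two lookup tables described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `encode_bits` and the tables `four_bit` and `three_bit` are hypothetical names, the input is assumed to be a plain character string rather than a DNAStringSet, and only the letters A, T, G and C are accepted.

```r
# Hypothetical lookup tables; rows follow the bit patterns given in the text
four_bit <- diag(4)
rownames(four_bit) <- c("A", "T", "G", "C")

three_bit <- rbind(A = c(1, 1, 1),
                   T = c(1, 0, 0),
                   G = c(0, 1, 0),
                   C = c(0, 0, 1))

# Hypothetical helper: replace each nucleotide by its row of `lut`
# and concatenate the rows in sequence order
encode_bits <- function(s, lut) {
  chars <- strsplit(toupper(s), "")[[1]]
  if (!all(chars %in% rownames(lut))) {
    stop("only A, T, G and C are supported")
  }
  # lut[chars, ] is an n x k matrix; transposing and flattening
  # column-wise yields nucleotide 1's bits, then nucleotide 2's, ...
  as.vector(t(lut[chars, , drop = FALSE]))
}

encode_bits("ATGC", four_bit)   # 16 values (4 per nucleotide)
encode_bits("ATGC", three_bit)  # 12 values (3 per nucleotide)
```

Passing `four_bit` gives four values per nucleotide and `three_bit` gives three. A sequence held in a DNAStringSet would first need converting to character (for example with `as.character()` from Biostrings) before being passed to this sketch.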

Usage

Sparse.Feature(test_seq)

Arguments

  • test_seq: Sequence dataset to be encoded as a numeric vector of 0s and 1s; must be an object of class DNAStringSet.

Details

Each sequence is encoded independently, so positive and negative class datasets are not required.
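Because no class labels are involved, a set of sequences can be encoded one sequence at a time and the results stacked. The sketch below assumes equal-length sequences given as a character vector and returns a matrix with one row per sequence and 4n columns. The helper names `encode_four_bit` and `encode_set` are hypothetical, and the exact return shape of Sparse.Feature for multi-sequence input is not specified here.

```r
# Hypothetical four-bit encoder for a single sequence (A, T, G, C order as documented)
encode_four_bit <- function(s) {
  lut <- diag(4)
  rownames(lut) <- c("A", "T", "G", "C")
  as.vector(t(lut[strsplit(toupper(s), "")[[1]], , drop = FALSE]))
}

# Encode each sequence on its own; the result for one sequence never
# depends on the other sequences or on any class label
encode_set <- function(seqs) {
  n <- nchar(seqs[1])
  if (any(nchar(seqs) != n)) stop("sequences must have equal length")
  enc <- vapply(seqs, encode_four_bit, numeric(4 * n), USE.NAMES = FALSE)
  t(enc)  # one row per sequence, 4 * n columns
}

m <- encode_set(c("ATGC", "GGGA", "CTAA"))
dim(m)  # 3 rows, 16 columns
```

Encoding "GGGA" on its own gives exactly the same row as it does inside the three-sequence set, which is what independence from the rest of the dataset means in practice.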

Returns

A vector of length 4n for a sequence of n nucleotides in test_seq.

References

  1. Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.
  2. Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

Note

For longer sequences, a high-dimensional feature vector is generated, since each nucleotide contributes four features.
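Although the vector grows as 4n, exactly one bit in each group of four is 1, so only n of the 4n entries are non-zero. A minimal sketch, assuming the four-bit scheme and plain character input, stores just the positions of the 1s. The helper names `one_positions` and `to_dense` are hypothetical, and this is not how the package stores its output.

```r
# Hypothetical helper: index of the single 1 in each four-bit group.
# Nucleotide i with code k (A = 1, T = 2, G = 3, C = 4) sets position (i - 1) * 4 + k
one_positions <- function(s) {
  codes <- match(strsplit(toupper(s), "")[[1]], c("A", "T", "G", "C"))
  if (anyNA(codes)) stop("only A, T, G and C are supported")
  (seq_along(codes) - 1L) * 4L + codes
}

# Rebuild the full 0/1 vector of length 4n from the stored positions
to_dense <- function(pos, n) {
  v <- numeric(4 * n)
  v[pos] <- 1
  v
}

s <- "ATGCA"
pos <- one_positions(s)            # n = 5 stored integers
dense <- to_dense(pos, nchar(s))   # 4n = 20 values, 5 of them equal to 1
```

For a 1,000-nucleotide sequence, this keeps 1,000 integers instead of 4,000 dense values.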

Examples

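data(droso)                            # load the example dataset shipped with the package
test <- droso$test                     # test sequences to be encoded
tst <- test
enc <- Sparse.Feature(test_seq = tst)  # encode the sequences with 0 and 1
enc
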
  • Maintainer: Prabina Kumar Meher
  • License: GPL (>= 2)
  • Last published: 2019-05-28
