Sparse.Feature() R function from the EncDNA package

Nucleotide sequence encoding with 0 and 1.

In this encoding approach, A, T, G and C are encoded as (1,1,1), (1,0,0), (0,1,0) and (0,0,1), respectively, as introduced by Bari et al. (2014). Alternatively, each nucleotide can be encoded with four bits, i.e., A as (1,0,0,0), T as (0,1,0,0), G as (0,0,1,0) and C as (0,0,0,1), as in Meher et al. (2016).
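The four-bit scheme can be illustrated with a minimal base R sketch. The names `four_bit_codes` and `encode_four_bit` are hypothetical and are not part of EncDNA; the sketch works on a plain character string rather than a DNAStringSet and only shows the mapping described above, not the package's internal implementation.

```r
# Hypothetical helper, not part of EncDNA: four-bit (one-hot) encoding
# of a single DNA string, using the A/T/G/C codes listed above.
four_bit_codes <- list(
  A = c(1, 0, 0, 0),
  T = c(0, 1, 0, 0),
  G = c(0, 0, 1, 0),
  C = c(0, 0, 0, 1)
)

encode_four_bit <- function(seq_string) {
  # Split the string into single upper-case characters.
  bases <- strsplit(toupper(seq_string), "")[[1]]
  unknown <- setdiff(bases, names(four_bit_codes))
  if (length(unknown) > 0) {
    stop("Unsupported character(s): ", paste(unknown, collapse = ", "))
  }
  # Concatenate the four bits of each base: a vector of length 4 * n.
  unlist(four_bit_codes[bases], use.names = FALSE)
}

encode_four_bit("ATGC")
# 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
```

Each position contributes exactly one 1 and three 0s, which is why the resulting feature vector is sparse.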


Usage

Sparse.Feature(test_seq)

Arguments

test_seq: Sequence dataset to be encoded into numeric vectors of 0s and 1s; must be an object of class DNAStringSet.

Details

Each sequence is encoded independently, without requiring positive-class and negative-class datasets.

Returns

A vector of length $4n$ for each sequence of $n$ nucleotides in test_seq.

References

Bari, A.T.M.G., Reaz, M.R. and Jeong, B.S. (2014). Effective DNA encoding for splice site prediction using SVM. MATCH Commun. Math. Comput. Chem., 71: 241-258.
Meher, P.K., Sahu, T.K., Rao, A.R. and Wahi, S.D. (2016). A computational approach for prediction of donor splice sites with improved accuracy. Journal of Theoretical Biology, 404: 285-294.

Author(s)

Prabina Kumar Meher, Indian Agricultural Statistics Research Institute, New Delhi-110012, INDIA

Note

Longer sequences produce higher-dimensional feature vectors.
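As a rough illustration of how the dimension grows under the four-bit encoding (four features per nucleotide), the short sketch below tabulates the feature count for a few sequence lengths. The lengths are arbitrary illustrative values, not taken from the EncDNA datasets.

```r
# Feature count under the four-bit encoding: 4 features per nucleotide.
seq_lengths <- c(9, 50, 200, 1000)  # illustrative lengths (assumed)
n_features <- 4 * seq_lengths

data.frame(seq_length = seq_lengths, n_features = n_features)
```

The growth is linear in the sequence length, so long sequence windows may call for feature selection or a classifier that copes well with many sparse inputs.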

Examples


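library(EncDNA)      # provides Sparse.Feature() and the droso dataset
data(droso)
test <- droso$test   # test sequences from the droso dataset
# Encode every test sequence into 0/1 features
enc <- Sparse.Feature(test_seq = test)
enc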

EncDNA package

Maintainer: Prabina Kumar Meher
License: GPL (>= 2)
Last published: 2019-05-28
