Takes a string of categories (e.g., a DNA sequence read from a FASTA file) and converts it to a sequence of indicator vectors for use with the Spectral Envelope (specenv).
dna2vector(data, alphabet = NULL)
Arguments
data: A single string.
alphabet: The set of category labels to use. If NULL (the default), alphabet = c("A", "C", "G", "T") is used.
Details
Takes a string of categories and converts it to a matrix of indicators. The result can then be passed to the function specenv, which calculates the Spectral Envelope of the sequence (or subsequence). Many different types of sequences can be used, including FASTA and GenBank, as long as the data is a string of categories.
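To make the idea of an indicator matrix concrete, here is a minimal base-R sketch. The function name to_indicators is hypothetical and this is not the astsa implementation of dna2vector; stripping whitespace, upper-casing the input, and giving characters outside the alphabet an all-zero row are assumptions of the sketch only.

```r
# Hypothetical helper (NOT part of astsa): build a 0/1 indicator matrix
# from a string of single-character categories.
to_indicators <- function(data, alphabet = c("A", "C", "G", "T")) {
  # Assumption: drop spaces and line breaks, and treat lower case as upper case
  chars <- toupper(strsplit(gsub("[[:space:]]", "", data), "")[[1]])
  # One row per position, one column per category; 1 marks the observed category.
  # Characters not in the alphabet (e.g., "N") get a row of zeros.
  m <- outer(chars, alphabet, "==") * 1L
  colnames(m) <- alphabet
  m
}

x <- to_indicators("ACGTTA")
x   # 6 x 4 matrix; each row has a single 1 in the column of its base
```

Each row of the result is the indicator vector for one position in the sequence, which is the form of input the spectral envelope works with.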
The indicator vectors (as a matrix) are returned invisibly, so that if the result is not assigned to an object, the entire sequence is not printed and scrolled across the screen. In other words, assign the result, e.g., xdata = dna2vector(data), where data is the original sequence.
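The effect of an invisible return value can be seen with base R alone. The function make_big below is hypothetical and only mimics the behavior described above; it is not part of astsa.

```r
# Hypothetical function illustrating an invisible return value
make_big <- function(n) {
  out <- matrix(0L, nrow = n, ncol = 4)
  invisible(out)   # the value is returned, but not auto-printed at the console
}

make_big(1e5)            # nothing is printed
xdata <- make_big(1e5)   # result is stored in xdata; nothing is printed
dim(xdata)               # returns c(100000, 4)
(make_big(3))            # wrapping the call in parentheses forces printing
```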
As an example, if the DNA sequence is in a FASTA file, say sequence.fasta, first remove the header (first) line, which will look like >V01555.2 .... The following code then reads the data into the session, creates the indicator sequence, and saves it as a compressed R data file:
library(astsa)                                        # provides dna2vector()
fileName <- 'sequence.fasta'                          # name of FASTA file (header line removed)
data <- readChar(fileName, file.info(fileName)$size)  # read the whole file as one string
myseq <- dna2vector(data)                             # convert it to indicators
##== to compress and save the data ==##
save(myseq, file='myseq.rda')
##== and then load it when needed ==##
load('myseq.rda')
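Instead of deleting the header line by hand, it can also be dropped in R. The following is a minimal base-R sketch, assuming a single-record FASTA file; the temporary example file and the names lines, seq_lines, and seq_string are hypothetical. With several records, all sequences would be concatenated into one string.

```r
# Sketch: read a FASTA file, drop header lines (those starting with ">"),
# and join the remaining lines into one string with no line breaks.
# A small example file is written to a temporary path so the sketch runs as-is.
fileName <- tempfile(fileext = ".fasta")
writeLines(c(">example header", "ACGTAC", "GTTA"), fileName)

lines <- readLines(fileName)
seq_lines <- lines[!startsWith(lines, ">")]     # keep only sequence lines
seq_string <- paste(seq_lines, collapse = "")   # one string

seq_string   # "ACGTACGTTA"
# myseq <- dna2vector(seq_string)   # requires library(astsa)
```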
Returns
matrix of indicator vectors; returned invisibly
References
You can find demonstrations of astsa capabilities at FUN WITH ASTSA.