This function calculates the full (log) probability or odds of a sequence given a hidden Markov model or profile HMM using the forward dynamic programming algorithm.
forward(x, y,...)## S3 method for class 'PHMM'forward( x, y, qe =NULL, logspace ="autodetect", odds =TRUE, windowspace ="all", DI =FALSE, ID =FALSE, cpp =TRUE,...)## S3 method for class 'HMM'forward(x, y, logspace ="autodetect", cpp =TRUE,...)
Arguments
x: an object of class PHMM or HMM.
y: a vector of mode "character" or "raw" (a "DNAbin" or "AAbin" object) representing a single sequence hypothetically emitted by the model in x.
...: additional arguments to be passed between methods.
qe: an optional named vector of background residue frequencies (only applicable if x is a PHMM). If qe = NULL the function looks for a qe vector as an attribute of the PHMM. If these are not available equal background residue frequencies are assumed.
logspace: logical indicating whether the emission and transition probabilities of x are logged. If logspace = "autodetect"
(default setting), the function will automatically detect if the probabilities are logged, returning an error if inconsistencies are found. Note that choosing the latter option increases the computational overhead; therefore specifying TRUE or FALSE can reduce the running time.
odds: logical, indicates whether the returned scores should be odds ratios (TRUE) or full logged probabilities (FALSE).
windowspace: a two-element integer vector providing the search space for dynamic programming (see Wilbur & Lipman 1983 for details). The first element should be negative, and represent the lowermost diagonal of the dynammic programming array, and the second element should be positive, representing the leftmost diagonal. Alternatively, if the the character string "all" is passed (the default setting) the entire dynamic programming array will be computed.
DI: logical indicating whether delete-insert transitions should be allowed in the profile hidden Markov model (if applicable). Defaults to FALSE.
ID: logical indicating whether insert-delete transitions should be allowed in the profile hidden Markov model (if applicable). Defaults to FALSE.
cpp: logical, indicates whether the dynamic programming matrix should be filled using compiled C++ functions (default; many times faster). The FALSE option is primarily retained for bug fixing and experimentation.
Returns
an object of class "DPA", which is a list containing the score and dynamic programming array.
Details
This function is a wrapper for a compiled C++ function that recursively fills a dynamic programming matrix with logged probabilities, and calculates the full (logged) probability of a sequence given a HMM or PHMM.
For a thorough explanation of the backward, forward and Viterbi algorithms, see Durbin et al (1998) chapters 3.2 (HMMs) and 5.4 (PHMMs).
Examples
## Forward algorithm for standard HMMs:## The dishonest casino example from Durbin et al (1998) chapter 3.2 states <- c("Begin","Fair","Loaded") residues <- paste(1:6)### Define the transition probability matrix A <- matrix(c(0,0,0,0.99,0.95,0.1,0.01,0.05,0.9), nrow =3) dimnames(A)<- list(from = states, to = states)### Define the emission probability matrix E <- matrix(c(rep(1/6,6), rep(1/10,5),1/2), nrow =2, byrow =TRUE) dimnames(E)<- list(states = states[-1], residues = residues)### Build and plot the HMM object x <- structure(list(A = A, E = E), class ="HMM") plot(x, main ="Dishonest casino HMM")### Find full probability of the sequence given the model data(casino) forward(x, casino)##### Forward algorithm for profile HMMs:## Small globin alignment data from Durbin et al (1998) Figure 5.3 data(globins)### Derive a profile HMM from the alignment globins.PHMM <- derivePHMM(globins, residues ="AMINO", seqweights =NULL) plot(globins.PHMM, main ="Profile hidden Markov model for globins")### Simulate a random sequence from the model suppressWarnings(RNGversion("3.5.0")) set.seed(999) simulation <- generate(globins.PHMM, size =20) simulation ## "F" "S" "A" "N" "N" "D" "W" "E"### Calculate the full (log) probability of the sequence given the model x <- forward(globins.PHMM, simulation, odds =FALSE) x # -23.0586### Show the dynammic programming array x$array
References
Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, United Kingdom.
Wilbur WJ, Lipman DJ (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc Natl Acad Sci USA, 80 , 726-730.