N-Gram Analysis of Biological Sequences
Add 1-grams
Coerce feature_test object to a data frame
Binarize
biogram - analysis of biological sequences using n-grams
Calculate value of criterion
Calculate Chi-squared-based measure
Calculate encoding distance
Calculate IG for single feature
Calculate KL divergence of features
Calculate partition index
Compute similarity index
Check chosen criterion
Clustering of sequences based on regular expression
Code n-grams
Construct and filter n-grams
Detect and count multiple n-grams in sequences
Count n-grams in sequences
Count specified n-grams
Count total number of n-grams
Create encoding
Create feature according to given contingency matrix
Get all possible n-grams
criterion_distribution class
Categorize tested features
Decode n-grams
Degenerate protein sequence
Degenerate n-grams
Compute criterion distribution
Convert encoding to data frame
2d cross-tabulation
feature_test class
Convert encoding from full to simple format
Gap n-grams
Generate sequence
Generate single region
Generate single unigram
Generate unigrams
Get indices of n-grams
Validate n-gram
Convert letters to numbers
Convert list of sequences to matrix
Convert numbers to letters
Convert n-grams to data frame
Plot criterion distribution
Position n-grams
Print tested features
Read FASTA files
Regenerate n-grams
regional_param class
Extract n-grams from sequence
Convert encoding from simple to full format
Summarize tested features
Tabulate n-grams
Permutation test for feature selection
Validate encoding
Write encodings to a file
Write FASTA files
Tools for extraction and analysis of various n-grams (k-mers) derived from biological sequences (proteins or nucleic acids). Contains QuiPT (quick permutation test) for fast feature-filtering of the n-gram data.
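
As a rough illustration of what "extraction of n-grams" means here, the sketch below counts contiguous n-grams (k-mers) in a few sequences and then binarizes the counts into presence/absence features. It is written in plain Python and does not use or reproduce the biogram API: the function names count_ngrams, ngram_table and binarize are hypothetical, and gapped or position-specific n-grams are not covered.

```python
from collections import Counter


def count_ngrams(seq, n):
    """Count contiguous n-grams (k-mers) in a single sequence."""
    # Slide a window of width n over the sequence, one position at a time.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_table(seqs, n):
    """Return (columns, rows): sorted n-gram names and one count row per sequence."""
    counts = [count_ngrams(s, n) for s in seqs]
    # Columns are every n-gram observed in at least one sequence.
    columns = sorted(set().union(*counts))
    # A Counter returns 0 for n-grams that a sequence does not contain.
    rows = [[c[g] for g in columns] for c in counts]
    return columns, rows


def binarize(rows):
    """Turn counts into presence (1) / absence (0) indicators."""
    return [[1 if v > 0 else 0 for v in row] for row in rows]


# Hypothetical toy input: two short nucleotide sequences.
seqs = ["ACGTAC", "ACACAC"]
columns, rows = ngram_table(seqs, 2)
binary_rows = binarize(rows)
```

Binary presence/absence tables of this kind are the sort of input a permutation-based filter such as QuiPT would operate on; the filtering step itself is not sketched here.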