wordpredictor: Develop Text Prediction Models Based on N-Grams

The package provides the following components:

- Generates n-grams from text files
- Generates transition probabilities for n-grams
- Generates data samples from text files
- Allows managing the test environment
- Represents n-gram models
- Evaluates the performance of n-gram models
- Generates n-gram models from a text file
- Allows predicting text and calculating word probabilities and perplexity
- Provides a base class for all other classes
- Analyzes input text files and n-gram token files
- Provides data cleaning functionality
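The core idea behind these components — extracting n-grams from text and converting their counts into transition probabilities for prediction — can be sketched in a few lines of base R. The function below is purely illustrative and is not part of the package's API; it computes bigram maximum-likelihood probabilities P(w2 | w1) = count(w1 w2) / count(w1).

```r
# Illustrative sketch (not the package's API): bigram transition
# probabilities estimated from a single cleaned text string.
make_bigram_probs <- function(text) {
  words <- tolower(unlist(strsplit(text, "\\s+")))
  # Pair each word with its successor to form bigrams
  bigrams <- paste(head(words, -1), tail(words, -1))
  counts <- table(bigrams)
  # Occurrence count of each bigram prefix (the first word)
  prefixes <- table(head(words, -1))
  first <- sub(" .*", "", names(counts))
  # P(w2 | w1) = count(w1 w2) / count(w1)
  probs <- as.numeric(counts) / as.numeric(prefixes[first])
  setNames(probs, names(counts))
}

p <- make_bigram_probs("the cat sat on the mat")
p["the cat"]  # 0.5: "the" is followed by "cat" in one of its two occurrences
```

Predicting the next word then amounts to picking the bigram with the highest probability among those whose prefix matches the last word typed; the package generalizes this to higher-order n-grams with back-off.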
A framework for developing n-gram models for text prediction. It provides data cleaning, data sampling, token extraction from text, model generation, model evaluation and word prediction. For background on how n-gram models work, we referred to "Speech and Language Processing" <https://web.archive.org/web/20240919222934/https://web.stanford.edu/~jurafsky/slp3/3.pdf>. For optimizing R code and using R6 classes, we referred to "Advanced R" <https://adv-r.hadley.nz/r6.html>. For writing R extensions, we referred to "R Packages" <https://r-pkgs.org/index.html>.
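One of the evaluation quantities mentioned above is perplexity: the geometric-mean inverse probability the model assigns to a test text, so lower values mean the model predicts the text better. A minimal sketch of the computation (illustrative only, not the package's API):

```r
# Illustrative sketch (not the package's API): perplexity from the
# per-word probabilities a model assigned to a test sequence.
perplexity <- function(word_probs) {
  # exp of the negative mean log-probability, i.e. the
  # geometric-mean inverse probability of the sequence
  exp(-mean(log(word_probs)))
}

perplexity(c(0.25, 0.25, 0.25, 0.25))  # 4: a uniform 4-way guess
```

A model that always assigned probability 1 would reach the minimum perplexity of 1; random guessing over a vocabulary of size V gives perplexity V.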