A Fast and Flexible Pipeline for Text Classification
Transform New Text into a Document-Feature Matrix
Train a Bag-of-Words Model
Train a Regularized Logistic Regression Model using glmnet
Run a Full Text Classification Pipeline on Preprocessed Text
Preprocess a Vector of Text Documents
Predict Sentiment on New Data Using a Saved Pipeline Artifact
Train a Random Forest Model using Range...
Train a Gradient Boosting Model using XGBoost
A high-level wrapper that reduces text classification to three steps: preprocessing, model training, and prediction. It provides a single interface to several algorithms (including 'glmnet', 'ranger', and 'xgboost') and vectorization methods (Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF)), so users can go from raw text to a trained sentiment model in two function calls. In the third step, the resulting model artifact applies the same preprocessing to new data automatically, which keeps predictions consistent with training.
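The idea of an artifact that carries its own preprocessing can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helpers `tokenize()`, `fit_bow()`, and `transform_bow()` are hypothetical names invented for this example, and the tokenizer is deliberately simplistic. The sketch only shows how storing the training vocabulary lets new text be mapped onto the same document-feature matrix columns.

```r
# Hypothetical helpers for illustration only; these are NOT the package's functions.

# Lowercase, replace anything that is not a-z or a space, split on whitespace.
tokenize <- function(docs) {
  cleaned <- gsub("[^a-z ]+", " ", tolower(docs))
  lapply(strsplit(trimws(cleaned), " +"), function(tok) tok[nzchar(tok)])
}

# Learn a vocabulary from the training text and keep it in a small "artifact".
fit_bow <- function(docs) {
  list(vocab = sort(unique(unlist(tokenize(docs)))))
}

# Map any text onto the stored vocabulary; words unseen in training are dropped.
transform_bow <- function(artifact, docs) {
  tokens <- tokenize(docs)
  dtm <- matrix(0,
                nrow = length(docs),
                ncol = length(artifact$vocab),
                dimnames = list(NULL, artifact$vocab))
  for (i in seq_along(tokens)) {
    counts <- table(factor(tokens[[i]], levels = artifact$vocab))
    dtm[i, ] <- as.numeric(counts)
  }
  dtm
}

train_docs <- c("Great movie, loved it", "Terrible plot and bad acting")
new_docs   <- c("Loved the acting", "Completely unseen words")

artifact <- fit_bow(train_docs)
x_train  <- transform_bow(artifact, train_docs)
x_new    <- transform_bow(artifact, new_docs)

# Training and new data share the same columns in the same order.
print(identical(colnames(x_train), colnames(x_new)))
```

Because the vocabulary is fixed at training time, a model fitted on `x_train` can score `x_new` without any column realignment, which is the consistency property the description above refers to.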