User-Friendly R Package for Supervised Machine Learning Pipelines
Throw error if required packages are not installed.
Calculate a bootstrap confidence interval for the performance on a sin...
Get the lower and upper bounds for an empirical confidence interval
Calculate balanced precision given actual and baseline precision
Calculate the fraction of positives, i.e. baseline precision for a PRC...
Generic function to calculate mean performance curves for multiple mod...
Calculate performance for a single split from rsample::bootstraps()
Get performance metrics for test data
Calculate the p-value for a permutation test
Change columns to numeric if possible
Check all params that don't return a value
Check if any features are categorical
Check that corr_thresh is either NULL or a number between 0 and 1
Check that the dataset is not empty and has more than 1 column.
Check features
Check the validity of the group_partitions list
Check grouping vector
Check that kfold is an integer of reasonable size
Check if the method is supported. If not, throws error.
Check ntree
Check that outcome column exists. Pick outcome column if not specified...
Check that the outcome variable is valid. Pick outcome value if necess...
Check whether package(s) are installed
Check perf_metric_function is NULL or a function
Check perf_metric_name is NULL or a character string
Check that permute is a logical
Check remove_var
Check that the seed is either NA or a number
Check that the training fraction is between 0 and 1
Check the validity of the training indices
Cluster a matrix of correlated features
Collapse correlated features
Combine hyperparameter performance metrics for multiple train/test spl...
Perform permutation tests to compare the performance metric across all...
Split into train and test set while splitting by groups. When `group_p...
Splitting into folds for cross-validation when using groups
Define cross-validation scheme and training parameters
Get permuted performance metric difference for a single feature (or gr...
Flatten correlation matrix to pairs
Identify correlated features as a binary matrix
Get dummyvars dataframe (i.e. design matrix)
Get preprocessed dataframe for continuous variables
Identify correlated features
Calculate the difference in the mean of the metric for two groups
Get feature importance using the permutation method
Assign features to groups
Get hyperparameter performance metrics
Split hyperparameters dataframe into named lists for each parameter
Set hyperparameters based on ML method and dataset characteristics
Get outcome type.
Select indices to partition the data into training & testing sets.
Get default performance metric function
Get default performance metric name
Get model performance metrics as a one-row tibble
Get seeds for caret::trainControl()
Generate the tuning grid for tuning hyperparameters
Group correlated features
Check whether a numeric vector contains whole numbers.
Whether groups can be kept together in partitions during cross-validat...
mikropml: User-Friendly R Package for Robust Machine Learning Pipeline...
Mutate all columns with utils::type.convert().
Update progress if the progress bar is not NULL.
Calculate a permuted p-value comparing two models
Plot ROC and PRC curves
Plot hyperparameter performance metrics
Plot performance metrics for multiple ML runs with different parameter...
Preprocess data prior to running machine learning
Process categorical features
Preprocess continuous features
Process features with no variation
Call sort() with method = 'radix'
Randomize feature order to eliminate any position-dependent effects
caret contr.ltfr
Remove columns appearing in only `threshold` row(s) or fewer.
Replace spaces in all elements of a character vector with underscores
Remove missing outcome values
Run the machine learning pipeline
Use future apply if available
Calculate and summarize performance for ROC and PRC plots
Set hyperparameters for regression models for use with glmnet
Set hyperparameters for random forest models
Set hyperparameters for decision tree models
Set hyperparameters for SVM with radial kernel
Set hyperparameters for SVM with radial kernel
Get plot layers shared by plot_mean_roc and plot_mean_prc
Shuffle the rows in a column
Split dataset into outcome and features
Tidy the performance dataframe
Train model using caret::train().
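Two of the helpers listed above, the empirical confidence interval bounds and the permutation-test p-value, are simple enough to illustrate in base R. The sketch below is not the package's code: the function names `empirical_ci` and `permutation_p_value` are hypothetical, and the add-one correction in the p-value is an assumption chosen so the estimate is never exactly zero.

```r
# Hypothetical helper: lower and upper bounds of an empirical confidence
# interval, taken as quantiles of a vector of resampled metric values.
empirical_ci <- function(x, alpha = 0.05) {
  bounds <- stats::quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(lower = bounds[1], upper = bounds[2])
}

# Hypothetical helper: one-sided permutation p-value, i.e. the share of
# permuted statistics at least as large as the observed one.
# The +1 terms are an assumed correction, not taken from the package.
permutation_p_value <- function(observed, permuted) {
  (sum(permuted >= observed) + 1) / (length(permuted) + 1)
}

# Example with made-up values
set.seed(1)
boot_auc <- runif(1000, min = 0.6, max = 0.8)
empirical_ci(boot_auc)
permutation_p_value(observed = 0.75, permuted = boot_auc)
```

Quantile-based bounds make no distributional assumption about the metric, which is presumably why an empirical interval is used for bootstrap results.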
An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
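A minimal usage sketch of the pipeline described above. It assumes the installed version of mikropml exports `run_ml()` and the small example dataset `otu_mini_bin`, as shown in the package documentation. The wrapper name `run_pipeline_sketch` is hypothetical, and `cv_times = 5` is chosen here only to keep the run short.

```r
# Hypothetical wrapper: run the pipeline once and return the performance
# table, or NULL when mikropml is not installed.
run_pipeline_sketch <- function() {
  if (!requireNamespace("mikropml", quietly = TRUE)) {
    message("mikropml is not installed; skipping the example.")
    return(NULL)
  }
  results <- mikropml::run_ml(
    mikropml::otu_mini_bin, # small example dataset assumed to ship with the package
    "glmnet",               # regularized regression via glmnet
    kfold = 5,              # number of cross-validation folds
    cv_times = 5,           # few CV repeats, for speed only
    seed = 2019             # fixed seed for reproducibility
  )
  results$performance       # performance metrics for the held-out test data
}

perf <- run_pipeline_sketch()
print(perf)
```

When the data need cleaning first, the package's preprocessing step (see "Preprocess data prior to running machine learning" in the index) would normally run before the call to `run_ml()`.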