mikropml 1.6.1 package

User-Friendly R Package for Supervised Machine Learning Pipelines

abort_packages_not_installed

Throw error if required packages are not installed.

bootstrap_performance

Calculate a bootstrap confidence interval for the performance on a sin...

bounds

Get the lower and upper bounds for an empirical confidence interval
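
A minimal Python sketch of an empirical (percentile) confidence interval. It only illustrates the idea and is not mikropml's R implementation. It assumes the bounds are the alpha/2 and 1 - alpha/2 quantiles of the empirical distribution, with linear interpolation between order statistics (the same rule as R's default quantile type 7). The name empirical_bounds is hypothetical.

```python
import math


def empirical_bounds(values, alpha=0.05):
    """Return (lower, upper): the alpha/2 and 1 - alpha/2 quantiles of values.

    Assumption: a percentile interval with linear interpolation (R type 7).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("values must not be empty")

    def quantile(p):
        # Position on a 0..n-1 scale, interpolated between neighbours
        pos = p * (n - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    return quantile(alpha / 2), quantile(1 - alpha / 2)
```

For example, the 101 values 0 to 100 with alpha = 0.1 give bounds of roughly 5 and 95.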

calc_balanced_precision

Calculate balanced precision given actual and baseline precision
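
A hedged Python sketch of one common prior-corrected definition of balanced precision. Here prior is the baseline precision (fraction of positives), and the formula rescales precision to what it would be if both classes were equally frequent. That this is the exact formula mikropml uses is an assumption. With this definition, a classifier whose precision equals the prior scores 0.5.

```python
def balanced_precision(precision, prior):
    """Rescale precision as if positives and negatives were equally common.

    Assumed formula: p(1 - prior) / (p(1 - prior) + (1 - p) * prior).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    num = precision * (1 - prior)
    return num / (num + (1 - precision) * prior)
```

For example, a precision of 0.5 with a prior of 0.2 balances to 0.8.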

calc_baseline_precision

Calculate the fraction of positives, i.e. baseline precision for a PRC...
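
The baseline precision is the precision a classifier would get by calling every observation positive, which is simply the share of the positive class. A minimal Python sketch of that arithmetic follows. The function name and the class labels are hypothetical.

```python
def baseline_precision(outcomes, positive):
    """Fraction of observations whose outcome equals the positive class."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(1 for y in outcomes if y == positive) / len(outcomes)


# One positive out of four observations gives a baseline of 0.25
baseline_precision(["cancer", "normal", "normal", "normal"], "cancer")
```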

calc_mean_perf

Generic function to calculate mean performance curves for multiple mod...

calc_perf_bootstrap_split

Calculate performance for a single split from rsample::bootstraps()

calc_perf_metrics

Get performance metrics for test data

calc_pvalue

Calculate the p-value for a permutation test
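
A generic Python sketch of a permutation-test p-value, offered only as an illustration of the technique. It counts how many permuted test statistics are at least as extreme (in absolute value) as the observed one and adds 1 to the numerator and denominator so the p-value is never exactly 0. The two-sided comparison and the +1 correction are assumptions, not confirmed details of mikropml's calc_pvalue.

```python
def permutation_pvalue(observed, permuted):
    """Two-sided permutation p-value with the +1 correction (assumed)."""
    extreme = sum(1 for s in permuted if abs(s) >= abs(observed))
    return (extreme + 1) / (len(permuted) + 1)
```

For example, an observed difference of 0.5 against permuted values [0.1, -0.2, 0.6, 0.05] has one value at least as extreme, so p = 2/5 = 0.4.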

change_to_num

Change columns to numeric if possible

check_all

Check all params that don't return a value

check_cat_feats

Check if any features are categorical

check_corr_thresh

Check that corr_thresh is either NULL or a number between 0 and 1

check_dataset

Check that the dataset is not empty and has more than 1 column.

check_features

Check features

check_group_partitions

Check the validity of the group_partitions list

check_groups

Check grouping vector

check_kfold

Check that kfold is an integer of reasonable size

check_method

Check if the method is supported. If not, throw an error.

check_ntree

Check ntree

check_outcome_column

Check that outcome column exists. Pick outcome column if not specified...

check_outcome_value

Check that the outcome variable is valid. Pick outcome value if necess...

check_packages_installed

Check whether package(s) are installed

check_perf_metric_function

Check that perf_metric_function is NULL or a function

check_perf_metric_name

Check that perf_metric_name is NULL or a character string

check_permute

Check that permute is a logical

check_remove_var

Check remove_var

check_seed

Check that the seed is either NA or a number

check_training_frac

Check that the training fraction is between 0 and 1

check_training_indices

Check the validity of the training indices

cluster_corr_mat

Cluster a matrix of correlated features

collapse_correlated_features

Collapse correlated features

combine_hp_performance

Combine hyperparameter performance metrics for multiple train/test spl...

compare_models

Perform permutation tests to compare the performance metric across all...

create_grouped_data_partition

Split into train and test set while splitting by groups. When `group_p...

create_grouped_k_multifolds

Split data into folds for cross-validation when using groups

define_cv

Define cross-validation scheme and training parameters

find_permuted_perf_metric

Get permuted performance metric difference for a single feature (or gr...

flatten_corr_mat

Flatten correlation matrix to pairs
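
A minimal Python sketch of flattening a symmetric correlation matrix into pairs. It keeps each unordered pair once (upper triangle, diagonal skipped). The output shape of (feature1, feature2, correlation) tuples is an assumption for illustration and is not mikropml's R return type.

```python
def flatten_corr_mat(names, corr):
    """Return (feature1, feature2, corr) for each unordered pair of features."""
    pairs = []
    for i in range(len(names)):
        # j > i keeps the upper triangle only, so each pair appears once
        for j in range(i + 1, len(names)):
            pairs.append((names[i], names[j], corr[i][j]))
    return pairs
```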

get_binary_corr_mat

Identify correlated features as a binary matrix
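
A hedged Python sketch of turning a correlation matrix into a binary "is correlated" matrix using a threshold such as corr_thresh. Three details are assumptions that mikropml may handle differently: using the absolute correlation, comparing with >=, and zeroing the diagonal so a feature is not marked as correlated with itself.

```python
def binary_corr_mat(corr, corr_thresh):
    """1 where |corr[i][j]| >= corr_thresh for i != j, otherwise 0."""
    n = len(corr)
    return [
        [1 if i != j and abs(corr[i][j]) >= corr_thresh else 0 for j in range(n)]
        for i in range(n)
    ]
```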

get_caret_dummyvars_df

Get dummyvars dataframe (i.e. design matrix)

get_caret_processed_df

Get preprocessed dataframe for continuous variables

get_corr_feats

Identify correlated features

get_difference

Calculate the difference in the mean of the metric for two groups
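
A minimal Python sketch of the statistic described above: the mean of a metric in one group minus its mean in another. Rows are plain dicts here, and the function signature, the column names, and the sign convention (group_a minus group_b) are hypothetical.

```python
from statistics import mean


def get_difference(rows, metric, group_col, group_a, group_b):
    """mean(metric in group_a) - mean(metric in group_b)."""
    a = [r[metric] for r in rows if r[group_col] == group_a]
    b = [r[metric] for r in rows if r[group_col] == group_b]
    return mean(a) - mean(b)
```

For example, AUCs of 0.8 and 0.9 for one method against 0.7 and 0.6 for another give a difference of about 0.2.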

get_feature_importance

Get feature importance using the permutation method
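
A generic Python sketch of permutation feature importance, shown only to illustrate the method. It shuffles one feature column many times, re-scores the fitted model on each shuffled copy, and reports the baseline metric minus the mean permuted metric, so larger values mean a more important feature. The model is any predict function, and every name and default here (including n_perm and seed) is hypothetical rather than mikropml's API.

```python
import random


def permutation_importance(predict, X, y, metric, col, n_perm=100, seed=2019):
    """Baseline metric minus mean metric after shuffling column `col` of X."""
    rng = random.Random(seed)
    base = metric(y, predict(X))
    scores = []
    for _ in range(n_perm):
        shuffled = [row[:] for row in X]  # copy rows so X itself is untouched
        values = [row[col] for row in shuffled]
        rng.shuffle(values)
        for row, v in zip(shuffled, values):
            row[col] = v
        scores.append(metric(y, predict(shuffled)))
    return base - sum(scores) / len(scores)
```

A feature the model never uses gets an importance of exactly 0, because shuffling it cannot change the predictions.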

get_groups_from_clusters

Assign features to groups

get_hp_performance

Get hyperparameter performance metrics

get_hyperparams_from_df

Split hyperparameters dataframe into named lists for each parameter

get_hyperparams_list

Set hyperparameters based on ML method and dataset characteristics

get_outcome_type

Get outcome type.

get_partition_indices

Select indices to partition the data into training & testing sets.

get_perf_metric_fn

Get default performance metric function

get_perf_metric_name

Get default performance metric name

get_performance_tbl

Get model performance metrics as a one-row tibble

get_seeds_trainControl

Get seeds for caret::trainControl()

get_tuning_grid

Generate the tuning grid for tuning hyperparameters

group_correlated_features

Group correlated features

is_whole_number

Check whether a numeric vector contains whole numbers.
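
A minimal Python sketch of an element-wise whole-number check. It treats a value as whole if it lies within a small tolerance of the nearest integer, which avoids false negatives from floating-point error. The 1e-8 default tolerance and the function name are assumptions.

```python
def is_whole_number(values, tol=1e-8):
    """For each value, True if it is within tol of an integer."""
    return [abs(v - round(v)) < tol for v in values]


is_whole_number([1, 2.0, 2.5, -3])
```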

keep_groups_in_cv_partitions

Whether groups can be kept together in partitions during cross-validat...

mikropml-package

mikropml: User-Friendly R Package for Robust Machine Learning Pipeline...

mutate_all_types

Mutate all columns with utils::type.convert()

pbtick

Update progress if the progress bar is not NULL.

permute_p_value

Calculate a permuted p-value comparing two models

plot_curves

Plot ROC and PRC curves

plot_hp_performance

Plot hyperparameter performance metrics

plot_model_performance

Plot performance metrics for multiple ML runs with different parameter...

preprocess_data

Preprocess data prior to running machine learning

process_cat_feats

Process categorical features

process_cont_feats

Preprocess continuous features

process_novar_feats

Process features with no variation

radix_sort

Call sort() with method = 'radix'

randomize_feature_order

Randomize feature order to eliminate any position-dependent effects

reexports

Re-exported from caret: contr.ltfr

remove_singleton_columns

Remove columns appearing in only `threshold` row(s) or fewer.
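
A hedged Python sketch of dropping near-empty feature columns. It assumes a feature "appears" in a row when its value is non-zero, and keeps only columns with non-zero values in more than threshold rows. The list-of-lists input and the (kept rows, removed column indices) return value are hypothetical choices for illustration.

```python
def remove_singleton_columns(rows, threshold=1):
    """Drop columns that are non-zero in `threshold` rows or fewer."""
    if not rows:
        return rows, []
    n_cols = len(rows[0])
    # Number of rows in which each column has a non-zero value
    counts = [sum(1 for r in rows if r[j] != 0) for j in range(n_cols)]
    keep = [j for j in range(n_cols) if counts[j] > threshold]
    removed = [j for j in range(n_cols) if counts[j] <= threshold]
    return [[r[j] for j in keep] for r in rows], removed
```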

replace_spaces

Replace spaces in all elements of a character vector with underscores

rm_missing_outcome

Remove missing outcome values

run_ml

Run the machine learning pipeline

select_apply

Use future.apply if available

sensspec

Calculate and summarize performance for ROC and PRC plots

set_hparams_glmnet

Set hyperparameters for regression models for use with glmnet

set_hparams_rf

Set hyperparameters for random forest models

set_hparams_rpart2

Set hyperparameters for decision tree models

set_hparams_svmRadial

Set hyperparameters for SVM with radial kernel

set_hparams_xgbTree

Set hyperparameters for XGBoost tree models (caret method xgbTree)

shared_ggprotos

Get plot layers shared by plot_mean_roc and plot_mean_prc

shuffle_group

Shuffle the rows in a column

split_outcome_features

Split dataset into outcome and features

tidy_perf_data

Tidy the performance dataframe

train_model

Train model using caret::train().

An interface to build machine learning models for classification and regression problems. 'mikropml' implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.

  • Maintainer: Kelly Sovacool
  • License: MIT + file LICENSE
  • Last published: 2023-08-21