In modelling, a baseline is a result that is meaningful to compare our models' results to. For instance, in classification, we usually want our results to be better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have a 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.
While this expected value is often fairly straightforward to find analytically, it only represents what we can expect on average. In reality, it's possible to get far better results than that by guessing. ‘baseline()’ (binomial, multinomial) finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors. If random guessing frequently obtains an accuracy of 40%, perhaps our model should have better performance than this, before we declare it better than guessing.
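As a rough illustration of this spread (a standalone R sketch, not part of ‘baseline()’ itself), we can simulate random guessing with three classes:
# Simulate the accuracy of random guessing with 3 classes and 30 observations
set.seed(1)
targets <- sample(c("a", "b", "c"), size = 30, replace = TRUE)
accuracies <- replicate(100, {
  guesses <- sample(c("a", "b", "c"), size = 30, replace = TRUE)
  mean(guesses == targets)
})
summary(accuracies)  # centered near 1/3, but the maximum can be well above it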
How
When family is binomial: evaluates n sets of random predictions against the dependent variable, along with a set of all 0 predictions and a set of all 1 predictions (see the sketch after this list). See also baseline_binomial().
When family is multinomial: creates one-vs-all (binomial) baseline evaluations for n sets of random predictions against the dependent variable, along with sets of "all class x,y,z,..." predictions. See also baseline_multinomial().
When family is gaussian: fits baseline models (y ~ 1) on n random subsets of train_data and evaluates each model on test_data. Also evaluates a model fitted on all rows in train_data. See also baseline_gaussian().
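A conceptual sketch of the binomial case (illustrative only; the actual implementation generates random probabilities and applies the cutoff):
# Sketch: n sets of random 0/1 predictions plus the all-0 and all-1 sets,
# each scored against the dependent variable (here just with accuracy)
targets <- c(0, 1, 1, 0, 1, 0, 1, 1)  # stand-in for the dependent column
n <- 100
random_accuracies <- replicate(n, mean(as.integer(runif(length(targets)) > 0.5) == targets))
all_0_accuracy <- mean(0 == targets)
all_1_accuracy <- mean(1 == targets)
summary(random_accuracies)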
Wrapper functions
Consider using one of the wrappers, as they are simpler to use and understand: ‘baseline_gaussian()’, ‘baseline_multinomial()’, and ‘baseline_binomial()’.
dependent_col: Name of dependent variable in the supplied test and training sets.
family: Name of family. (Character)
Currently supports "gaussian", "binomial" and "multinomial".
train_data: data.frame. Only used when family is "gaussian".
n: Number of random samplings to perform. (Default is 100)
For gaussian: The number of random samplings of train_data to fit baseline models on.
For binomial and multinomial: The number of sets of random predictions to evaluate.
metrics: list for enabling/disabling metrics.
E.g. list("RMSE" = FALSE) would remove RMSE from the regression results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the classification results. Default values (TRUE/FALSE) will be used for the remaining available metrics.
You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling individual metrics, which is why, for instance, list("all" = FALSE, "RMSE" = TRUE) would return only the RMSE metric.
The list can be created with gaussian_metrics(), binomial_metrics(), or multinomial_metrics().
Also accepts the string "all".
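For instance, using the data set names from the Examples section below, we might want only the RMSE from a Gaussian baseline:
# Disable all metrics, then re-enable RMSE
baseline(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  family = "gaussian",
  metrics = list("all" = FALSE, "RMSE" = TRUE)
)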
positive: Level from dependent variable to predict. Either as character (preferable) or level index (1 or 2 - alphabetically).
E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as alphabetically, "dog" comes after "cat".
Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.
Used when calculating confusion matrix metrics and creating ROC curves.
N.B. Only affects evaluation metrics, not the returned predictions.
N.B. Binomial only. (Character or Integer)
cutoff: Threshold for predicted classes. (Numeric)
N.B. Binomial only
random_generator_fn: Function for generating random numbers when family is "multinomial". The softmax function is applied to the generated numbers to transform them to probabilities.
The first argument must be the number of random numbers to generate, as no other arguments are supplied.
To test the effect of using different functions, see multiclass_probability_tibble().
N.B. Multinomial only
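For example, a hypothetical generator (the name rwide and its exact effect are illustrative, not part of the package):
# Hypothetical generator: a larger spread before softmax gives more "certain" probabilities
rwide <- function(n) {
  rnorm(n, mean = 0, sd = 5)
}
# Could then be passed as random_generator_fn = rwide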
random_effects: Random effects structure for the Gaussian baseline model. (Character)
E.g. with "(1|ID)", the model becomes "y ~ 1 + (1|ID)".
N.B. Gaussian only
min_training_rows: Minimum number of rows in the random subsets of train_data.
Gaussian only. (Integer)
min_training_rows_left_out: Minimum number of rows left out of the random subsets of train_data. Gaussian only. (Integer)
Gaussian Results
See the additional metrics (disabled by default) at ?gaussian_metrics.
The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_rows is the evaluation when the baseline model is trained on all rows in train_data.
The Training Rows column contains the aggregated number of rows used from train_data, when fitting the baseline models.
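As a usage sketch (assuming the returned list stores these summarized metrics in an element named summarized_metrics, analogous to the summarized_class_level_results element used in the multinomial example further down), the All_rows evaluation could be inspected with:
# Fit and evaluate a Gaussian baseline, then pull out the All_rows row
gb <- baseline(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  n = 10,
  family = "gaussian"
)
gb$summarized_metrics %>%
  dplyr::filter(Measure == "All_rows")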
Binomial Results
The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_0 is the evaluation when all predictions are 0. The row where Measure == All_1 is the evaluation when all predictions are 1.
A nested tibble with the confusion matrix. The Pos_ columns tell you whether a row is a True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN), depending on which level is the "positive" class, i.e. the level you wish to predict.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Multinomial Results
Based on the generated test set predictions, one-vs-all (binomial) evaluations are performed and aggregated to get the same metrics as in the binomial results (excluding MCC, AUC, Lower CI and Upper CI), with the addition of Overall Accuracy and multiclass
MCC in the summarized results. It is possible to enable multiclass AUC as well, which has been disabled by default as it is slow to calculate when there's a large set of classes.
Since we use macro-averaging, ‘Balanced Accuracy’ is the macro-averaged version of the binomial ‘Balanced Accuracy’ metric, not the macro-averaged sensitivity that the term sometimes refers to.
Note: we also refer to the one-vs-all evaluations as the class level results.
How : First, the one-vs-all binomial evaluations are aggregated by repetition, then, these aggregations are summarized. Besides the metrics from the binomial evaluations (see Binomial Results above), it also includes ‘Overall Accuracy’ and multiclass ‘MCC’ .
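Conceptually, with made-up numbers (a sketch, not the internal implementation):
# One-vs-all F1 scores for three classes in two repetitions (made-up numbers)
class_level <- data.frame(
  Repetition = rep(1:2, each = 3),
  Class = rep(c("a", "b", "c"), times = 2),
  F1 = c(0.30, 0.25, 0.40, 0.35, 0.20, 0.45)
)
# 1) Aggregate by repetition (average across the one-vs-all evaluations)
per_repetition <- aggregate(F1 ~ Repetition, data = class_level, FUN = mean)
# 2) Summarize the per-repetition results with descriptors (mean, median, etc.)
summary(per_repetition$F1)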
The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, IQR, Max, Min, NAs, and INFs measures describe the Random Evaluations tibble, while CL_Max, CL_Min, CL_NAs, and CL_INFs describe the Class Level results.
The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.
The Summarized Class Level Results tibble contains:
The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest on the tibble to inspect the results.
How : The one-vs-all evaluations are summarized by class.
The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.
The repetition results with the same metrics as the Summarized Results tibble.
How : The one-vs-all evaluations are aggregated by repetition. If a metric contains one or more NAs in the one-vs-all evaluations, it will lead to an NA result for that repetition.
Also includes:
A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested Confusion Matrices and the Support column, which is a count of how many observations from that class are in the test set.
A nested tibble with the predictions and targets.
A list of ROC curve objects.
A nested tibble with the multiclass confusion matrix.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Details
Packages used:
Models
Gaussian: stats::lm, lme4::lmer
Results
Gaussian :
r2m : MuMIn::r.squaredGLMM
r2c : MuMIn::r.squaredGLMM
AIC : stats::AIC
AICc : MuMIn::AICc
BIC : stats::BIC
Binomial and Multinomial :
ROC and related metrics:
Binomial: pROC::roc
Multinomial: pROC::multiclass.roc
Examples
# Attach packages
library(cvms)
library(groupdata2)  # partition()
library(dplyr)       # %>% arrange()
library(tibble)

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(1)

# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]

# Note: usually n=100 is a good setting

# Gaussian
baseline(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  random_effects = "(1|session)",
  n = 2,
  family = "gaussian"
)

# Binomial
baseline(
  test_data = test_set,
  dependent_col = "diagnosis",
  n = 2,
  family = "binomial"
)

# Multinomial

# Create some data with multiple classes
multiclass_data <- tibble(
  "target" = rep(paste0("class_", 1:5), each = 10)
) %>%
  dplyr::sample_n(35)

baseline(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 4,
  family = "multinomial"
)

# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Binomial
baseline(
  test_data = test_set,
  dependent_col = "diagnosis",
  n = 4,
  family = "binomial"
  #, parallel = TRUE # Uncomment
)

# Gaussian
baseline(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  random_effects = "(1|session)",
  n = 4,
  family = "gaussian"
  #, parallel = TRUE # Uncomment
)

# Multinomial
(mb <- baseline(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6,
  family = "multinomial"
  #, parallel = TRUE # Uncomment
))

# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
  dplyr::filter(Class == "class_2") %>%
  tidyr::unnest(Results)

# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)
rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}

baseline(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6,
  family = "multinomial",
  random_generator_fn = rcertain
  #, parallel = TRUE # Uncomment
)
See Also
Other baseline functions: baseline_binomial(), baseline_gaussian(), baseline_multinomial()