In modelling, a baseline is a result that is meaningful to compare our models' results to. For instance, in classification, we usually want our models to perform better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have a 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.
While this expected value is often fairly straightforward to find analytically, it only represents what we can expect on average. In reality, it is possible to get far better results than that purely by guessing. baseline_multinomial()
finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors.
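For intuition, here is a minimal base R sketch (toy data, not package code) of why single runs of random guessing can land well above the expected accuracy:

# Three classes: expected accuracy under random guessing is ~1/3,
# but individual sets of guesses can do noticeably better
set.seed(1)
targets <- sample(c("a", "b", "c"), size = 30, replace = TRUE)
accuracies <- replicate(100, {
  guesses <- sample(c("a", "b", "c"), size = 30, replace = TRUE)
  mean(guesses == targets)
})
mean(accuracies) # close to 1/3
max(accuracies)  # single runs can land well above 1/3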
Technically, it creates one-vs-all (binomial) baseline evaluations for the n sets of random predictions and summarizes them. Additionally, sets of "all class x,y,z,..." predictions are evaluated.
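As a small illustration (toy data, not package code) of an "all class x" prediction set: when every observation is predicted to be one class, the accuracy equals that class's proportion of the test set.

targets <- rep(paste0("class_", 1:3), times = c(5, 10, 15))
all_class_1 <- rep("class_1", length(targets))
mean(all_class_1 == targets) # 5/30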
dependent_col: Name of the dependent variable in the supplied test set.
n: The number of sets of random predictions to evaluate. (Default is 100)
metrics: list for enabling/disabling metrics.
E.g. list("F1" = FALSE) would remove F1 from the results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the results. Default values (TRUE/FALSE) will be used for the remaining available metrics.
You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling the individual metrics, which is why, for instance, list("all" = FALSE, "Accuracy" = TRUE)
would return only the Accuracy metric.
The list can be created with multinomial_metrics().
Also accepts the string "all".
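A minimal sketch of combining the metrics argument with the function, using only the list form described above (the toy data tibble is an assumption for illustration):

library(cvms)
library(tibble)
dat <- tibble("target" = rep(paste0("class_", 1:3), each = 10))

# Disable all metrics, then enable only Accuracy
baseline_multinomial(
  test_data = dat,
  dependent_col = "target",
  metrics = list("all" = FALSE, "Accuracy" = TRUE),
  n = 10
)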
random_generator_fn: Function for generating random numbers. The softmax function is applied to the generated numbers to transform them to probabilities.
The first argument must be the number of random numbers to generate, as no other arguments are supplied.
To test the effect of using different functions, see multiclass_probability_tibble().
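To build intuition, here is a sketch of the softmax step, written out by hand (the package applies it internally; the per-observation grouping of three numbers, one per class, is an assumption for illustration):

softmax <- function(x) exp(x) / sum(exp(x))

# Three classes: draw three random numbers for one observation and
# convert them to class probabilities
set.seed(1)
raw <- runif(3)
softmax(raw)          # fairly "flat" probabilities

raw_certain <- (runif(3, min = 1, max = 100)^1.4) / 100
softmax(raw_certain)  # more extreme, i.e. more "certain" predictions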
parallel: Whether to run the n evaluations in parallel. (Logical)
Remember to register a parallel backend first. E.g. with doParallel::registerDoParallel.
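A brief sketch of the parallel setup (the toy data tibble is an assumption; mirror of the commented-out steps in the Examples below):

# Register a parallel backend before setting parallel = TRUE
library(doParallel)
registerDoParallel(4) # use four cores

library(cvms)
library(tibble)
dat <- tibble("target" = rep(paste0("class_", 1:3), each = 10))

baseline_multinomial(
  test_data = dat,
  dependent_col = "target",
  n = 100,
  parallel = TRUE # the n evaluations now run in parallel
)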
Returns
list containing:
a tibble with summarized results (called summarized_metrics)
a tibble with random evaluations (random_evaluations)
a tibble with the summarized class level results (summarized_class_level_results)
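As a minimal sketch of accessing the returned list (the element names match those listed above; the toy data tibble is an assumption):

library(cvms)
library(tibble)
dat <- tibble("target" = rep(paste0("class_", 1:3), each = 10))

bsl <- baseline_multinomial(
  test_data = dat, dependent_col = "target", n = 10
)

bsl$summarized_metrics             # summarized results
bsl$random_evaluations             # the n random evaluations
bsl$summarized_class_level_results # summarized one-vs-all results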
In general, the metrics mentioned in binomial_metrics()
can be enabled as macro metrics (excluding MCC, AUC, Lower CI, Upper CI, and the AIC/AICc/BIC metrics). These metrics also have a weighted average version.
N.B. we also refer to the one-vs-all evaluations as the class level results.
Multiclass metrics
In addition, the ‘Overall Accuracy’ and multiclass ‘MCC’ metrics are computed. Multiclass ‘AUC’ can be enabled but is slow to calculate with many classes.
The Summarized Results tibble contains:
How: The one-vs-all binomial evaluations are aggregated by repetition and summarized. Besides the metrics from the binomial evaluations, it also includes ‘Overall Accuracy’ and multiclass ‘MCC’.
The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, IQR, Max, Min, NAs, and INFs measures describe the Random Evaluations tibble, while CL_Max, CL_Min, CL_NAs, and CL_INFs describe the Class Level results.
The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.
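Reusing the bsl object from the sketch in the Returns section, these rows can be pulled out like this (assuming the Measure values literally start with "All_", as described above):

library(dplyr)
# Keep only the rows where all observations were predicted
# to be a single class
bsl$summarized_metrics %>%
  dplyr::filter(grepl("^All_", Measure))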
The Summarized Class Level Results tibble contains:
The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest
on the tibble to inspect the results.
How: The one-vs-all evaluations are summarized by class.
The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.
The Random Evaluations tibble contains:
The repetition results with the same metrics as the Summarized Results tibble.
How: The one-vs-all evaluations are aggregated by repetition. If a metric contains one or more NAs in the one-vs-all evaluations, it will lead to an NA result for that repetition.
Also includes:
A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested Confusion Matrices and the Support column, which is a count of how many observations from the class are in the test set.
A nested tibble with the predictions and targets.
A list of ROC curve objects.
A nested tibble with the multiclass confusion matrix.
A nested Process information object with information about the evaluation.
Name of dependent variable.
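A sketch of inspecting a single random evaluation, reusing the bsl object from the Returns sketch (the nested column names are assumptions inferred from the list above):

eval_1 <- bsl$random_evaluations[1, ]
eval_1[["Class Level Results"]][[1]] # one-vs-all binomial evaluations
eval_1[["Confusion Matrix"]][[1]]    # multiclass confusion matrix
eval_1[["Predictions"]][[1]]         # predictions and targets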
Details
Packages used:
Multiclass ROC curve and AUC: pROC::multiclass.roc
Examples
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
library(tibble)

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(1)

# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]

# Note: usually n = 100 is a good setting

# Create some data with multiple classes
multiclass_data <- tibble(
  "target" = rep(paste0("class_", 1:5), each = 10)
) %>%
  dplyr::sample_n(35)

baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 4
)

# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Make sure to uncomment the parallel argument
(mb <- baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6
  #, parallel = TRUE # Uncomment
))

# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
  dplyr::filter(Class == "class_2") %>%
  tidyr::unnest(Results)

# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)
rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}

# Make sure to uncomment the parallel argument
baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6,
  random_generator_fn = rcertain
  #, parallel = TRUE # Uncomment
)
See Also
Other baseline functions: baseline(), baseline_binomial(), baseline_gaussian()