baseline_multinomial function

Create baseline evaluations

Lifecycle: maturing

Create a baseline evaluation of a test set.

In modelling, a baseline is a result that it is meaningful to compare our models' results against. For instance, in classification, we usually want our results to be better than random guessing. E.g. if we have three classes, we can expect an accuracy of 33.33%, as for every observation we have a 1/3 chance of guessing the correct class. So our model should achieve a higher accuracy than 33.33% before it is more useful to us than guessing.

While this expected value is often fairly straightforward to find analytically, it only represents what we can expect on average. In reality, it's possible to get far better results than that by guessing. baseline_multinomial() finds the range of likely values by evaluating multiple sets of random predictions and summarizing them with a set of useful descriptors.
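The gap between the expected value and the range of likely values can be illustrated with a small base R simulation. This is only a sketch: it guesses class labels directly with sample(), whereas baseline_multinomial() generates random probabilities, and the balanced 90-observation test set and the helper name guess_accuracy are made up for illustration.

```r
# Simulate random guessing on a three-class test set (base R only).
# Hypothetical data: 90 observations, 30 per class.
set.seed(1)

classes <- c("a", "b", "c")
targets <- rep(classes, each = 30)

# Accuracy of one set of uniformly random guesses (hypothetical helper)
guess_accuracy <- function(targets, classes) {
  guesses <- sample(classes, size = length(targets), replace = TRUE)
  mean(guesses == targets)
}

# Repeat the guessing 100 times and describe the spread
accuracies <- replicate(100, guess_accuracy(targets, classes))
summary_stats <- c(
  Mean = mean(accuracies),
  SD = sd(accuracies),
  Min = min(accuracies),
  Max = max(accuracies)
)
print(round(summary_stats, 3))
```

The mean lands close to 1/3, but individual sets of guesses can score noticeably higher or lower, which is the spread a baseline should account for.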

Technically, it creates one-vs-all (binomial) baseline evaluations for the n sets of random predictions and summarizes them. Additionally, sets of "all class x,y,z,..." predictions are evaluated.
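As a rough illustration of what a one-vs-all (binomial) evaluation means, the sketch below binarizes one set of predictions per class and computes two metrics by hand. The toy targets and predictions are invented, and the code is not taken from the package.

```r
# One-vs-all binarization for a single set of predictions (base R only).
# Toy data, invented for illustration.
targets     <- c("a", "a", "b", "b", "c", "c")
predictions <- c("a", "b", "b", "b", "c", "a")
classes <- sort(unique(targets))

one_vs_all <- lapply(classes, function(cl) {
  # Current class becomes 1, all other classes become 0
  target_bin <- as.integer(targets == cl)
  pred_bin <- as.integer(predictions == cl)

  tp <- sum(pred_bin == 1 & target_bin == 1)
  tn <- sum(pred_bin == 0 & target_bin == 0)
  fp <- sum(pred_bin == 1 & target_bin == 0)
  fn <- sum(pred_bin == 0 & target_bin == 1)

  data.frame(
    Class = cl,
    Sensitivity = tp / (tp + fn),
    Specificity = tn / (tn + fp)
  )
})

# One row per class
class_level <- do.call(rbind, one_vs_all)
print(class_level)
```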

baseline_multinomial(
  test_data,
  dependent_col,
  n = 100,
  metrics = list(),
  random_generator_fn = runif,
  parallel = FALSE
)

Arguments

  • test_data: data.frame.

  • dependent_col: Name of the dependent variable in the supplied test set.

  • n: The number of sets of random predictions to evaluate. (Default is 100)

  • metrics: list for enabling/disabling metrics.

    E.g. list("F1" = FALSE) would remove F1 from the results, and list("Accuracy" = TRUE) would add the regular Accuracy metric to the results. Default values (TRUE/FALSE) will be used for the remaining available metrics.

    You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling individual metrics, which is why, for instance, list("all" = FALSE, "Accuracy" = TRUE) would return only the Accuracy metric.

    The list can be created with multinomial_metrics().

    Also accepts the string "all".

  • random_generator_fn: Function for generating random numbers. The softmax function is applied to the generated numbers to transform them to probabilities.

    The first argument must be the number of random numbers to generate, as no other arguments are supplied.

    To test the effect of using different functions, see multiclass_probability_tibble().

  • parallel: Whether to run the n evaluations in parallel. (Logical)

    Remember to register a parallel backend first. E.g. with doParallel::registerDoParallel.
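The order in which the metrics list is resolved ("all" first, then individual metrics) can be sketched in base R as follows. The helper resolve_metrics and the default values are hypothetical and only mimic the documented behavior; they are not the package's internals or its actual defaults.

```r
# Hypothetical defaults, for illustration only
default_metrics <- c("Balanced Accuracy" = TRUE, "F1" = TRUE, "Accuracy" = FALSE)

# Hypothetical helper mimicking the documented resolution order
resolve_metrics <- function(user, defaults) {
  resolved <- defaults

  # Step 1: "all" is applied first, if present
  if ("all" %in% names(user)) {
    resolved[] <- isTRUE(user[["all"]])
  }

  # Step 2: individual settings override the result of step 1
  individual <- user[setdiff(names(user), "all")]
  for (nm in names(individual)) {
    resolved[[nm]] <- isTRUE(individual[[nm]])
  }

  # Names of the enabled metrics
  names(resolved)[resolved]
}

only_accuracy <- resolve_metrics(list("all" = FALSE, "Accuracy" = TRUE), default_metrics)
without_f1 <- resolve_metrics(list("F1" = FALSE), default_metrics)
print(only_accuracy)
print(without_f1)
```

Because "all" is handled before the individual entries, the position of "all" within the list does not matter in this sketch.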

Returns

list containing:

  1. a tibble with summarized results (called summarized_metrics)
  2. a tibble with random evaluations (random_evaluations)
  3. a tibble with the summarized class level results (summarized_class_level_results)

....................................................................

Macro metrics

Based on the generated predictions, one-vs-all (binomial) evaluations are performed and aggregated to get the following macro metrics:

‘Balanced Accuracy’, ‘F1’, ‘Sensitivity’, ‘Specificity’, ‘Positive Predictive Value’, ‘Negative Predictive Value’, ‘Kappa’, ‘Detection Rate’, ‘Detection Prevalence’, and ‘Prevalence’.

In general, the metrics mentioned in binomial_metrics() can be enabled as macro metrics (excluding MCC, AUC, Lower CI, Upper CI, and the AIC/AICc/BIC metrics). These metrics also have a weighted average version.

N.B. we also refer to the one-vs-all evaluations as the class level results.
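The difference between a macro metric and its weighted average version can be sketched with made-up class level results. The example assumes that the weighted version weights each class by its support (the number of test set observations in that class); the numbers are invented.

```r
# Made-up class level results (one row per one-vs-all evaluation)
class_level <- data.frame(
  Class = c("class_1", "class_2", "class_3"),
  Sensitivity = c(0.50, 1.00, 0.25),
  Support = c(10, 30, 60)
)

# Macro average: every class counts equally
macro_sensitivity <- mean(class_level$Sensitivity)

# Weighted average: classes count in proportion to their support
# (assumption: support is used as the weight)
weighted_sensitivity <- weighted.mean(
  class_level$Sensitivity,
  w = class_level$Support
)

print(c(Macro = macro_sensitivity, Weighted = weighted_sensitivity))
```

With imbalanced classes the two versions can differ considerably, since the weighted average is dominated by the largest classes.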

Multiclass metrics

In addition, the ‘Overall Accuracy’ and multiclass ‘MCC’ metrics are computed. Multiclass AUC can be enabled but is slow to calculate with many classes.
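Unlike the macro metrics, these two are computed directly from the multiclass confusion matrix. The base R sketch below uses a common multiclass generalization of MCC; that the package computes it in exactly this way is an assumption, and the toy data is invented.

```r
# Toy targets and predictions, invented for illustration
lvls <- c("a", "b", "c")
targets     <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"), levels = lvls)
predictions <- factor(c("a", "a", "b", "b", "c", "c", "c", "a"), levels = lvls)

# Multiclass confusion matrix (rows: predictions, columns: targets)
conf_mat <- table(Prediction = predictions, Target = targets)

s <- sum(conf_mat)                # total number of observations
c_correct <- sum(diag(conf_mat))  # correctly classified observations
p_k <- rowSums(conf_mat)          # times each class was predicted
t_k <- colSums(conf_mat)          # times each class truly occurred

overall_accuracy <- c_correct / s

# Multiclass MCC (assumed formula, see the lead-in)
mcc <- (c_correct * s - sum(p_k * t_k)) /
  sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))

print(c(OverallAccuracy = overall_accuracy, MCC = mcc))
```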

....................................................................

The Summarized Results tibble contains:

Summary of the random evaluations.

How: The one-vs-all binomial evaluations are aggregated by repetition and summarized. Besides the metrics from the binomial evaluations, it also includes ‘Overall Accuracy’ and multiclass ‘MCC’.

The Measure column indicates the statistical descriptor used on the evaluations. The Mean, Median, SD, IQR, Max, Min, NAs, and INFs measures describe the Random Evaluations tibble, while the CL_Max, CL_Min, CL_NAs, and CL_INFs measures describe the Class Level results.

The rows where Measure == All_<<class name>> are the evaluations when all the observations are predicted to be in that class.
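To make the descriptors concrete, the following base R sketch summarizes one metric across a handful of repetitions. The values and the helper describe are invented, and it is an assumption that NA and infinite values are removed before the other descriptors are computed.

```r
# Made-up F1 scores from six repetitions, one of which is NA
random_f1 <- c(0.21, 0.35, NA, 0.28, 0.40, 0.19)

# Hypothetical helper computing the descriptors for one metric
describe <- function(x) {
  # Assumption: non-finite values are counted but excluded from the rest
  finite <- x[is.finite(x)]
  c(
    Mean = mean(finite),
    Median = median(finite),
    SD = sd(finite),
    IQR = IQR(finite),
    Max = max(finite),
    Min = min(finite),
    NAs = sum(is.na(x)),
    INFs = sum(is.infinite(x))
  )
}

f1_summary <- describe(random_f1)
print(round(f1_summary, 3))
```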

....................................................................

The Summarized Class Level Results tibble contains:

The (nested) summarized results for each class, with the same metrics and descriptors as the Summarized Results tibble. Use tidyr::unnest on the tibble to inspect the results.

How: The one-vs-all evaluations are summarized by class.

The rows where Measure == All_0 are the evaluations when none of the observations are predicted to be in that class, while the rows where Measure == All_1 are the evaluations when all of the observations are predicted to be in that class.
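What the All_0 and All_1 rows represent for a single class can be sketched by evaluating constant predictions against the binarized targets. The toy targets and the helper evaluate_constant are invented for illustration.

```r
# Toy targets, binarized for the one-vs-all evaluation of class "b"
targets <- c("a", "a", "b", "b", "b", "c")
target_bin <- as.integer(targets == "b")

# Hypothetical helper: evaluate a constant prediction (0 or 1)
evaluate_constant <- function(target_bin, constant) {
  pred_bin <- rep(constant, length(target_bin))
  c(
    Sensitivity = sum(pred_bin == 1 & target_bin == 1) / sum(target_bin == 1),
    Specificity = sum(pred_bin == 0 & target_bin == 0) / sum(target_bin == 0)
  )
}

all_0 <- evaluate_constant(target_bin, 0)  # no observation predicted as "b"
all_1 <- evaluate_constant(target_bin, 1)  # every observation predicted as "b"
print(rbind(All_0 = all_0, All_1 = all_1))
```

Predicting the class for every observation maximizes sensitivity at the cost of specificity, and never predicting it does the opposite.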

....................................................................

The Random Evaluations tibble contains:

The repetition results with the same metrics as the Summarized Results tibble.

How: The one-vs-all evaluations are aggregated by repetition. If a metric contains one or more NAs in the one-vs-all evaluations, it will lead to an NA result for that repetition.

Also includes:

A nested tibble with the one-vs-all binomial evaluations (Class Level Results), including nested Confusion Matrices and the Support column, which is a count of how many observations from the class are in the test set.

A nested tibble with the predictions and targets.

A list of ROC curve objects.

A nested tibble with the multiclass confusion matrix.

A nested Process information object with information about the evaluation.

Name of dependent variable.

Details

Packages used:

Multiclass ROC curve and AUC: pROC::multiclass.roc

Examples

# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
library(tibble)

# Data is part of cvms
data <- participant.scores

# Set seed for reproducibility
set.seed(1)

# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]

# Note: usually n=100 is a good setting

# Create some data with multiple classes
multiclass_data <- tibble(
  "target" = rep(paste0("class_", 1:5), each = 10)
) %>%
  dplyr::sample_n(35)

baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 4
)

# Parallelize evaluations

# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)

# Make sure to uncomment the parallel argument
(mb <- baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6
  #, parallel = TRUE  # Uncomment
))

# Inspect the summarized class level results
# for class_2
mb$summarized_class_level_results %>%
  dplyr::filter(Class == "class_2") %>%
  tidyr::unnest(Results)

# Multinomial with custom random generator function
# that creates very "certain" predictions
# (once softmax is applied)

rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}

# Make sure to uncomment the parallel argument
baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "target",
  n = 6,
  random_generator_fn = rcertain
  #, parallel = TRUE  # Uncomment
)
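Why rcertain leads to more "certain" predictions than runif can be checked without the package. The sketch below applies a hand-written softmax to the numbers generated for each observation and compares the largest class probability. The softmax and max_prob helpers are illustrative, and applying softmax per observation across the classes is an assumption about how the probabilities are formed.

```r
# Numerically stable softmax (hand-written for illustration)
softmax <- function(x) {
  e <- exp(x - max(x))
  e / sum(e)
}

# Same generator as in the example above
rcertain <- function(n) {
  (runif(n, min = 1, max = 100)^1.4) / 100
}

# Largest class probability for one simulated observation (hypothetical helper)
max_prob <- function(generator, num_classes = 5) {
  max(softmax(generator(num_classes)))
}

set.seed(1)
uniform_max <- replicate(200, max_prob(runif))
certain_max <- replicate(200, max_prob(rcertain))

# Average confidence in the predicted class
print(c(runif = mean(uniform_max), rcertain = mean(certain_max)))
```

Numbers from runif lie between 0 and 1, so the softmax output stays close to uniform, while the wider range produced by rcertain lets one class dominate.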

See Also

Other baseline functions: baseline(), baseline_binomial(), baseline_gaussian()

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

  • Maintainer: Ludvig Renbo Olsen
  • License: MIT + file LICENSE
  • Last published: 2025-03-07