threshold_perf function

Generate performance metrics across probability thresholds

threshold_perf() can take a set of class probability predictions and determine performance characteristics across different values of the probability threshold, within any existing groups.

threshold_perf(.data, ...)

## S3 method for class 'data.frame'
threshold_perf(
  .data,
  truth,
  estimate,
  thresholds = NULL,
  metrics = NULL,
  na_rm = TRUE,
  event_level = "first",
  ...
)

Arguments

  • .data: A tibble, potentially grouped.
  • ...: Currently unused.
  • truth: The column identifier for the true two-class results (that is a factor). This should be an unquoted column name.
  • estimate: The column identifier for the predicted class probabilities (that is a numeric). This should be an unquoted column name.
  • thresholds: A numeric vector of values for the probability threshold. If unspecified, a series of values between 0.5 and 1.0 is used. Note: if this argument is used, it must be named.
  • metrics: Either NULL or a yardstick::metric_set() with a list of performance metrics to calculate. The metrics should all be oriented towards hard class predictions (e.g. yardstick::sensitivity(), yardstick::accuracy(), yardstick::recall(), etc.) and not class probabilities. A set of default metrics is used when NULL (see Details below). For a custom metric set, see the sketch after this list.
  • na_rm: A single logical: should missing data be removed?
  • event_level: A single string. Either "first" or "second" to specify which level of truth to consider as the "event".
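
For example, a custom metric set could compute only accuracy and Cohen's kappa at each threshold. This is a sketch using the segment_logistic data from the Examples below; the object name cls_metrics is illustrative, not part of the package:

library(yardstick)

# Both metrics operate on hard class predictions, as `metrics` requires
cls_metrics <- metric_set(accuracy, kap)

threshold_perf(
  segment_logistic,
  Class,
  .pred_good,
  thresholds = c(0.5, 0.6),
  metrics = cls_metrics
)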

Returns

A tibble with columns .threshold, .estimator, .metric, .estimate, and any existing grouping columns.

Details

Note that the global option yardstick.event_first will be used to determine which level is the event of interest. For more details, see the Relevant level section of yardstick::sens().
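For example, to score the second level of truth as the event, use the event_level argument. This sketch assumes "poor" is the second level of Class in segment_logistic, so the matching probability column is .pred_poor:

# Treat the second factor level of Class as the event
threshold_perf(
  segment_logistic,
  Class,
  .pred_poor,
  thresholds = 0.5,
  event_level = "second"
)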

The default calculated metrics are:

  • yardstick::j_index()
  • yardstick::sens()
  • yardstick::spec()
  • distance = (1 - sens) ^ 2 + (1 - spec) ^ 2

If a custom metric is passed that does not compute sensitivity and specificity, the distance metric is not computed.
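To make the distance metric concrete, here is the calculation for one illustrative (sens, spec) pair; the values are made up, not taken from any data:

# Distance from the ideal point (sens = 1, spec = 1); smaller is better
sens <- 0.80
spec <- 0.75
(1 - sens)^2 + (1 - spec)^2
#> [1] 0.1025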

Examples

library(dplyr)

data("segment_logistic")

# Set the threshold to 0.6
# > 0.6 = good
# < 0.6 = poor
threshold_perf(segment_logistic, Class, .pred_good, thresholds = 0.6)

# Set the threshold to multiple values
thresholds <- seq(0.5, 0.9, by = 0.1)

segment_logistic %>%
  threshold_perf(Class, .pred_good, thresholds)

# ---------------------------------------------------------------------------
# It works with grouped data frames as well

# Let's mock some resampled data
resamples <- 5

mock_resamples <- resamples %>%
  replicate(
    expr = sample_n(segment_logistic, 100, replace = TRUE),
    simplify = FALSE
  ) %>%
  bind_rows(.id = "resample")

resampled_threshold_perf <- mock_resamples %>%
  group_by(resample) %>%
  threshold_perf(Class, .pred_good, thresholds)

resampled_threshold_perf

# Average over the resamples
resampled_threshold_perf %>%
  group_by(.metric, .threshold) %>%
  summarise(.estimate = mean(.estimate))
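A common follow-up is to visualize how the metrics trade off as the threshold moves. This sketch assumes ggplot2 is installed; it is not part of the package's own examples:

# Plot each metric across a fine grid of thresholds
library(ggplot2)

segment_logistic %>%
  threshold_perf(Class, .pred_good, thresholds = seq(0.1, 0.9, by = 0.01)) %>%
  filter(.metric != "distance") %>%
  ggplot(aes(x = .threshold, y = .estimate, color = .metric)) +
  geom_line()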