Measure Class

This is the abstract base class for measures like MeasureClassif and MeasureRegr .

Measures are classes tailored around two functions doing the work:

  1. A function $score() which quantifies the performance by comparing the truth and predictions.
  2. A function $aggregator() which combines multiple performance scores returned by $score() to a single numeric value.

In addition to these two functions, meta-information about the performance measure is stored.

Predefined measures are stored in the dictionary mlr_measures , e.g. classif.auc or time_train. Many of the measures in mlr3 are implemented in mlr3measures as ordinary functions.
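A minimal usage sketch of both steps (task, learner, and resampling are only illustrative), retrieving a predefined measure via msr():

library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
learner$train(task)
prediction = learner$predict(task)

# $score(): compare truth and predictions of a single Prediction object
prediction$score(msr("classif.auc"))

# $aggregate(): combine the per-iteration scores of a resampling
rr = resample(task, learner, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.auc"))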

A guide on how to extend list("mlr3") with custom measures can be found in the mlr3book.

Inheriting

For some measures (such as the confidence intervals from mlr3inferr) it is necessary to return more than one value. In such cases, overwrite the public methods $aggregate() and/or $score() to return a named numeric() where at least one of its names corresponds to the id of the measure itself.
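A minimal, hypothetical sketch of this pattern (the class, id, and interval construction below are illustrative and not part of mlr3 or mlr3inferr):

library(mlr3)
library(R6)

MeasureMseCI = R6Class("MeasureMseCI",
  inherit = MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(id = "mse.ci", range = c(0, Inf), minimize = TRUE)
    },
    # overwritten to return a named numeric; one name equals the measure's id
    aggregate = function(rr) {
      scores = rr$score(msr("regr.mse"))$regr.mse
      est = mean(scores)
      half = stats::qt(0.975, df = length(scores) - 1) *
        stats::sd(scores) / sqrt(length(scores))
      c(mse.ci = est, lower = est - half, upper = est + half)
    }
  )
)

rr = resample(tsk("mtcars"), lrn("regr.rpart"), rsmp("cv", folds = 5))
rr$aggregate(MeasureMseCI$new())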

See Also

Other Measure: MeasureClassif, MeasureRegr, MeasureSimilarity, mlr_measures, mlr_measures_aic, mlr_measures_bic, mlr_measures_classif.costs, mlr_measures_debug_classif, mlr_measures_elapsed_time, mlr_measures_internal_valid_score, mlr_measures_oob_error, mlr_measures_regr.rsq, mlr_measures_selected_features

Public fields

  • id: (character(1))

     Identifier of the object. Used in tables, plot and text output.
    
  • label: (character(1))

     Label for this object. Can be used in tables, plot and text output instead of the ID.
    
  • task_type: (character(1))

     Task type, e.g. `"classif"` or `"regr"`.
     
     For a complete list of possible task types (depending on the loaded packages), see `mlr_reflections$task_types$type`.
    
  • param_set: (paradox::ParamSet )

     Set of hyperparameters.
    
  • obs_loss: (function() | NULL) Function to calculate the observation-wise loss.

  • trafo: (list() | NULL) `NULL` or a list with two elements (see the sketch after this list):

       * `trafo`: the transformation function applied after aggregating the observation-wise losses (e.g. `sqrt` for RMSE).
       * `deriv`: the derivative of `trafo`.
    
  • predict_type: (character(1))

     Required predict type of the Learner .
    
  • check_prerequisites: (character(1))

     How to proceed if one of the following prerequisites is not met:
     
      * wrong predict type (e.g., probabilities required, but only labels available).
      * wrong predict set (e.g., learner predicted on training set, but predictions of test set required).
      * task properties not satisfied (e.g., binary classification measure on multiclass task).
     
     Possible values are `"ignore"` (just return `NaN`) and `"warn"` (default, raise a warning before returning `NaN`).
    
  • task_properties: (character())

     Required properties of the Task .
    
  • range: (numeric(2))

     Lower and upper bound of possible performance scores.
    
  • properties: (character())

     Properties of this measure.
    
  • minimize: (logical(1))

     If `TRUE`, good predictions correspond to small values of performance scores.
    
  • packages: (character())

     Set of required packages. These packages are loaded, but not attached.
    
  • man: (character(1))

     String in the format `[pkg]::[topic]` pointing to a manual page for this object. Defaults to `NA`, but can be set by child classes.
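To illustrate how $obs_loss and $trafo fit together (a plain numeric sketch, not tied to a particular measure object): for an RMSE-style measure, the observation-wise loss is the squared error and the transformation applied after aggregation is sqrt.

truth = c(1.0, 2.0, 3.0)
response = c(1.1, 1.9, 3.4)

obs_loss = (truth - response)^2   # observation-wise squared errors
trafo = sqrt                      # applied after aggregating the losses
trafo(mean(obs_loss))             # RMSE of these predictions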
    

Active bindings

  • predict_sets: (character())

     During `resample()`/`benchmark()`, a Learner can predict on multiple sets. By default, a learner only predicts observations in the test set (`predict_sets == "test"`). To change this behavior, set `predict_sets` to a non-empty subset of `{"train", "test", "internal_valid"}`. The `"train"` predict set contains the train ids from the resampling. This means that if a learner does validation and sets `$validate` to a ratio (creating the validation data from the training data), the train predictions will include the predictions for the validation data. Each set yields a separate Prediction object. These can be combined via getters in ResampleResult /BenchmarkResult , or Measure s can be configured to operate on specific subsets of the calculated prediction sets (see the example after this list).
    
  • hash: (character(1))

     Hash (unique identifier) for this object. The hash is calculated based on the id, the parameter settings, the predict sets, and the `$score`, `$average`, `$aggregator`, `$obs_loss` and `$trafo` members. A Measure can define additional fields to be included in the hash by setting the field `$.extra_hash`.
    
  • average: (character(1))

     Method for aggregation (see the example after this list):
     
      * `"micro"`: All predictions from multiple resampling iterations are first combined into a single Prediction object. Next, the scoring function of the measure is applied on this combined object, yielding a single numeric score.
      * `"macro"`: The scoring function is applied on the Prediction object of each resampling iteration, yielding one numeric score per iteration. Next, the scores are combined with the `aggregator` function into a single numeric score.
      * `"custom"`: The measure comes with a custom aggregation method which directly operates on a ResampleResult .
    
  • aggregator: (function())

     Function to aggregate scores computed on different resampling iterations.
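To make these bindings concrete, here is a hedged sketch (task, learner, and resampling are only illustrative) contrasting macro and micro averaging and switching to the "train" predict set:

library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart")
learner$predict_sets = c("train", "test")
rr = resample(task, learner, rsmp("cv", folds = 3))

# macro averaging (default): score each iteration, then combine via $aggregator
rr$aggregate(msr("classif.ce"))

# micro averaging: pool the test predictions of all iterations, then score once
ce_micro = msr("classif.ce")
ce_micro$average = "micro"
rr$aggregate(ce_micro)

# operate on the training-set predictions instead of the test-set predictions
ce_train = msr("classif.ce")
ce_train$predict_sets = "train"
rr$aggregate(ce_train)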
    

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Note that this object is typically constructed via a derived class, e.g. MeasureClassif or MeasureRegr .

Usage

Measure$new(
  id,
  task_type = NA,
  param_set = ps(),
  range = c(-Inf, Inf),
  minimize = NA,
  average = "macro",
  aggregator = NULL,
  obs_loss = NULL,
  properties = character(),
  predict_type = "response",
  predict_sets = "test",
  task_properties = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_,
  trafo = NULL
)

Arguments

  • id: (character(1))

     Identifier for the new instance.
    
  • task_type: (character(1))

     Type of task, e.g. `"regr"` or `"classif"`. Must be an element of mlr_reflections$task_types$type .
    
  • param_set: (paradox::ParamSet )

     Set of hyperparameters.
    
  • range: (numeric(2))

     Feasible range for this measure as `c(lower_bound, upper_bound)`. Both bounds may be infinite.
    
  • minimize: (logical(1))

     Set to `TRUE` if good predictions correspond to small values, and to `FALSE` if good predictions correspond to large values. If set to `NA` (default), tuning this measure is not possible.
    
  • average: (character(1))

     How to average multiple Prediction s from a ResampleResult .
     
     The default, `"macro"`, calculates the individual performances scores for each Prediction and then uses the function defined in `$aggregator` to average them to a single number.
     
     If set to `"micro"`, the individual Prediction objects are first combined into a single new Prediction object which is then used to assess the performance. The function in `$aggregator` is not used in this case.
    
  • aggregator: (function())

     Function to aggregate over multiple iterations. The role of this function depends on the value of field `"average"`:
     
      * `"macro"`: A numeric vector of scores (one per iteration) is passed. The aggregate function defaults to `mean()` in this case.
      * `"micro"`: The `aggregator` function is not used. Instead, predictions from multiple iterations are first combined and then scored in one go.
      * `"custom"`: A ResampleResult is passed to the aggregate function.
    
  • obs_loss: (function or NULL)

     The observation-wise loss function, e.g. zero-one for classification error.
    
  • properties: (character())

     Properties of the measure. Must be a subset of mlr_reflections$measure_properties . Supported by `mlr3`:
     
      * `"requires_task"` (requires the complete Task ),
      * `"requires_learner"` (requires the trained Learner ),
      * `"requires_model"` (requires the trained Learner , including the fitted model),
      * `"requires_train_set"` (requires the training indices from the Resampling ), and
      * `"na_score"` (the measure is expected to occasionally return `NA` or `NaN`).
      * `"primary_iters"` (the measure explictly handles resamplings that only use a subset of their iterations for the point estimate).
      * `"requires_no_prediction"` (No prediction is required; This usually means that the measure extracts some information from the learner state.).
    
  • predict_type: (character(1))

     Required predict type of the Learner . Possible values are stored in mlr_reflections$learner_predict_types .
    
  • predict_sets: (character())

     Prediction sets to operate on, used in `aggregate()` to extract the matching `predict_sets` from the ResampleResult . Multiple predict sets are calculated by the respective Learner during `resample()`/`benchmark()`. Must be a non-empty subset of `{"train", "test", "internal_valid"}`. If multiple sets are provided, these are first combined to a single prediction object. Default is `"test"`.
    
  • task_properties: (character())

     Required task properties, see Task .
    
  • packages: (character())

     Set of required packages. A warning is signaled by the constructor if at least one of the packages is not installed; the packages are loaded (not attached) later on demand via `requireNamespace()`.
    
  • label: (character(1))

     Label for the new instance.
    
  • man: (character(1))

     String in the format `[pkg]::[topic]` pointing to a manual page for this object. The referenced help page can be opened via method `$help()`.
    
  • trafo: (list() or NULL)

     An optional list with two elements, containing the transformation `"fn"` and its derivative `"deriv"`. The transformation function is the function that is applied after aggregating the pointwise losses, i.e. this requires an `$obs_loss` to be present. An example is `sqrt` for RMSE.
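As a hedged illustration of these arguments, a custom measure is usually defined by inheriting from a derived class and passing the meta-information to super$initialize() (the class and id below are hypothetical; the private $.score() hook follows the pattern described in the mlr3book):

library(mlr3)
library(R6)

MeasureMaxAbsErr = R6Class("MeasureMaxAbsErr",
  inherit = MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.maxabserr",
        range = c(0, Inf),
        minimize = TRUE,
        predict_type = "response",
        label = "Maximum Absolute Error"
      )
    }
  ),
  private = list(
    # scoring function: compare truth and predictions of one Prediction object
    .score = function(prediction, ...) {
      max(abs(prediction$truth - prediction$response))
    }
  )
)

task = tsk("mtcars")
learner = lrn("regr.rpart")
learner$train(task)
learner$predict(task)$score(MeasureMaxAbsErr$new())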
    

Method format()

Helper for print outputs.

Usage

Measure$format(...)

Arguments

  • ...: (ignored).

Method print()

Printer.

Usage

Measure$print(...)

Arguments

  • ...: (ignored).

Method help()

Opens the corresponding help page referenced by field $man.

Usage

Measure$help()

Method score()

Takes a Prediction (or a list of Prediction objects named with valid predict_sets) and calculates a numeric score. If the measure is flagged with the properties "requires_task", "requires_learner", "requires_model" or "requires_train_set", you must additionally pass the respective Task , the (trained) Learner , or the training set indices. This is handled internally during resample()/benchmark().

Usage

Measure$score(prediction, task = NULL, learner = NULL, train_set = NULL)

Arguments

  • prediction: (Prediction | named list of Prediction ).

  • task: (Task ).

  • learner: (Learner ).

  • train_set: (integer()).

Returns

numeric(1).
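A short usage sketch (objects chosen only for illustration):

library(mlr3)

task = tsk("iris")
learner = lrn("classif.rpart")
learner$train(task)
prediction = learner$predict(task)

measure = msr("classif.ce")
measure$score(prediction)

# measures flagged with "requires_task", "requires_learner", etc. additionally
# need those objects, e.g. measure$score(prediction, task = task, learner = learner)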

Method aggregate()

Aggregates multiple performance scores into a single score, e.g. by using the `aggregator` function of the measure.

Usage

Measure$aggregate(rr)

Arguments

  • rr: ResampleResult .

Returns

numeric(1).

Method clone()

The objects of this class are cloneable with this method.

Usage

Measure$clone(deep = FALSE)

Arguments

  • deep: Whether to make a deep clone.