Measure Class
This is the abstract base class for measures like MeasureClassif and MeasureRegr.
Measures are classes tailored around two functions doing the work:

1. A function `$score()` which quantifies the performance by comparing the truth and predictions.
2. A function `$aggregator()` which combines multiple performance scores returned by `$score()` into a single numeric value.

In addition to these two functions, meta-information about the performance measure is stored.
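A minimal sketch of these two steps, using the `msr()` shorthand for the mlr_measures dictionary (introduced below); it assumes the `rpart` package is installed for the `classif.rpart` learner:

```r
library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart")
measure = msr("classif.ce")

# $score(): quantify the performance of a single Prediction
learner$train(task)
prediction = learner$predict(task)
measure$score(prediction)

# $aggregate() combines the per-iteration scores of a resampling,
# using the measure's $aggregator (mean by default for "macro" averaging)
rr = resample(task, learner, rsmp("cv", folds = 3))
measure$aggregate(rr)
```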
Predefined measures are stored in the dictionary mlr_measures, e.g. classif.auc or time_train. Many of the measures in mlr3 are implemented in mlr3measures as ordinary functions.

A guide on how to extend mlr3 with custom measures can be found in the mlr3book.
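For illustration, a custom regression measure (mean absolute error) could be sketched roughly as follows; the class name and id are made up for this example:

```r
library(mlr3)
library(R6)

# hypothetical custom measure: mean absolute error for regression
MeasureRegrMAESketch = R6Class("MeasureRegrMAESketch",
  inherit = MeasureRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id = "regr.mae_sketch",    # made-up id for this sketch
        range = c(0, Inf),         # feasible range of the score
        minimize = TRUE,           # smaller values are better
        predict_type = "response"  # needs plain numeric predictions
      )
    }
  ),
  private = list(
    # scoring function comparing truth and predictions
    .score = function(prediction, ...) {
      mean(abs(prediction$truth - prediction$response))
    }
  )
)

# register the new measure so that msr("regr.mae_sketch") works
mlr_measures$add("regr.mae_sketch", MeasureRegrMAESketch)
msr("regr.mae_sketch")
```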
For some measures (such as the confidence intervals from mlr3inferr) it is necessary that a measure returns more than one value. In such cases, overwrite the public methods `$aggregate()` and/or `$score()` to return a named `numeric()` where at least one of the names corresponds to the `id` of the measure itself.
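For instance, a measure whose `$aggregate()` reports a point estimate together with interval bounds might look roughly like the following sketch (hypothetical class, id and naming scheme; this is not the mlr3inferr implementation):

```r
library(mlr3)
library(R6)

# hypothetical measure returning a point estimate plus interval bounds
MeasureClassifCESketchCI = R6Class("MeasureClassifCESketchCI",
  inherit = MeasureClassif,
  public = list(
    initialize = function() {
      super$initialize(
        id = "classif.ce_sketch_ci",
        range = c(0, 1),
        minimize = TRUE,
        average = "custom"  # $aggregate() operates on the ResampleResult directly
      )
    },
    aggregate = function(rr) {
      scores = rr$score(msr("classif.ce"))$classif.ce
      est = mean(scores)
      half = stats::qnorm(0.975) * stats::sd(scores) / sqrt(length(scores))
      # at least one element is named after the measure's own id
      c(classif.ce_sketch_ci = est,
        classif.ce_sketch_ci.lower = est - half,
        classif.ce_sketch_ci.upper = est + half)
    }
  )
)
```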
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-eval
Package mlr3measures for the scoring functions.

Dictionary of Measures: mlr_measures

`as.data.table(mlr_measures)` for a table of available Measures in the running session (depending on the loaded packages).
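For example, to list the available measures and retrieve one of them from the dictionary:

```r
library(mlr3)

# overview of available measures (depends on the loaded packages)
head(as.data.table(mlr_measures))

# retrieve a measure from the dictionary, or via the msr() shorthand
mlr_measures$get("classif.acc")
msr("classif.acc")
```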
Extension packages for additional task types:

* mlr3proba for probabilistic supervised regression and survival analysis.
* mlr3cluster for unsupervised clustering.
Other Measure: MeasureClassif, MeasureRegr, MeasureSimilarity, mlr_measures, mlr_measures_aic, mlr_measures_bic, mlr_measures_classif.costs, mlr_measures_debug_classif, mlr_measures_elapsed_time, mlr_measures_internal_valid_score, mlr_measures_oob_error, mlr_measures_regr.rsq, mlr_measures_selected_features
id: (`character(1)`)
Identifier of the object. Used in tables, plot and text output.

label: (`character(1)`)
Label for this object. Can be used in tables, plot and text output instead of the ID.

task_type: (`character(1)`)
Task type, e.g. `"classif"` or `"regr"`.
For a complete list of possible task types (depending on the loaded packages), see `mlr_reflections$task_types$type`.

param_set: (paradox::ParamSet)
Set of hyperparameters.
obs_loss: (`function()` | `NULL`)
Function to calculate the observation-wise loss.
trafo: (`list()` | `NULL`)
`NULL` or a list with two elements (an illustrative sketch follows this list):

* `trafo`: the transformation function applied after aggregating the observation-wise losses (e.g. `sqrt` for RMSE).
* `deriv`: the derivative of the `trafo`.
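A purely illustrative sketch of how an observation-wise loss and a transformation combine (standalone helper functions, not taken from any specific mlr3 measure):

```r
# observation-wise loss: squared error
obs_loss = function(truth, response) (truth - response)^2
# transformation applied after aggregation, together with its derivative
trafo_fn = sqrt
trafo_deriv = function(x) 1 / (2 * sqrt(x))

truth = c(1, 2, 3)
response = c(1.5, 1.5, 3.5)
trafo_fn(mean(obs_loss(truth, response)))  # sqrt(mean(se)), i.e. RMSE
```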
predict_type: (`character(1)`)
Required predict type of the Learner.
check_prerequisites: (`character(1)`)
How to proceed if one of the following prerequisites is not met:

* wrong predict type (e.g., probabilities required, but only labels available).
* wrong predict set (e.g., learner predicted on training set, but predictions of test set required).
* task properties not satisfied (e.g., binary classification measure on multiclass task).

Possible values are `"ignore"` (just return `NaN`) and `"warn"` (default, raise a warning before returning `NaN`).
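For example, to silently obtain `NaN` instead of a warning when a prerequisite is violated (a minimal sketch, assuming the field can be set directly on the measure object):

```r
library(mlr3)

measure = msr("classif.auc")            # requires predict_type "prob"
measure$check_prerequisites = "ignore"  # just return NaN instead of warning first
```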
task_properties: (`character()`)
Required properties of the Task.

range: (`numeric(2)`)
Lower and upper bound of possible performance scores.

properties: (`character()`)
Properties of this measure.

minimize: (`logical(1)`)
If `TRUE`, good predictions correspond to small values of performance scores.
packages: (`character()`)
Set of required packages. These packages are loaded, but not attached.
man: (`character(1)`)
String in the format `[pkg]::[topic]` pointing to a manual page for this object. Defaults to `NA`, but can be set by child classes.
predict_sets: (`character()`)
During `resample()`/`benchmark()`, a Learner can predict on multiple sets. Per default, a learner only predicts observations in the test set (`predict_sets == "test"`). To change this behavior, set `predict_sets` to a non-empty subset of `{"train", "test", "internal_valid"}`. The `"train"` predict set contains the train ids from the resampling. This means that if a learner does validation and sets `$validate` to a ratio (creating the validation data from the training data), the train predictions will include the predictions for the validation data. Each set yields a separate Prediction object. Those can be combined via getters in ResampleResult /BenchmarkResult , or Measure s can be configured to operate on specific subsets of the calculated prediction sets.
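A sketch of scoring train and test predictions separately during resampling; it assumes that `predict_sets` and `id` can be set via the `lrn()`/`msr()` shorthands and that the `rpart` package is installed:

```r
library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart", predict_sets = c("train", "test"))
rr = resample(task, learner, rsmp("cv", folds = 3))

# the same scoring function, applied to different predict sets
m_train = msr("classif.ce", predict_sets = "train", id = "ce_train")
m_test  = msr("classif.ce", predict_sets = "test",  id = "ce_test")
rr$aggregate(list(m_train, m_test))
```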
hash: (`character(1)`)
Hash (unique identifier) for this object. The hash is calculated based on the id, the parameter settings, the predict sets and the `$score`, `$average`, `$aggregator`, `$obs_loss` and `$trafo` methods and fields. Measures can define additional fields to be included in the hash by setting the field `$.extra_hash`.
average: (`character(1)`)
Method for aggregation:

* `"micro"`: All predictions from multiple resampling iterations are first combined into a single Prediction object. Next, the scoring function of the measure is applied on this combined object, yielding a single numeric score.
* `"macro"`: The scoring function is applied on the Prediction object of each resampling iteration, yielding one numeric score per iteration. Next, the scores are combined with the `aggregator` function to a single numerical score.
* `"custom"`: The measure comes with a custom aggregation method which directly operates on a ResampleResult.

A short sketch contrasting `"micro"` and `"macro"` follows this list.
aggregator: (`function()`)
Function to aggregate scores computed on different resampling iterations.
new()
Creates a new instance of this R6 class.
Note that this object is typically constructed via a derived class, e.g. MeasureClassif or MeasureRegr.
Measure$new(
id,
task_type = NA,
param_set = ps(),
range = c(-Inf, Inf),
minimize = NA,
average = "macro",
aggregator = NULL,
obs_loss = NULL,
properties = character(),
predict_type = "response",
predict_sets = "test",
task_properties = character(),
packages = character(),
label = NA_character_,
man = NA_character_,
trafo = NULL
)
id: (`character(1)`)
Identifier for the new instance.

task_type: (`character(1)`)
Type of task, e.g. `"regr"` or `"classif"`. Must be an element of `mlr_reflections$task_types$type`.

param_set: (paradox::ParamSet)
Set of hyperparameters.

range: (`numeric(2)`)
Feasible range for this measure as `c(lower_bound, upper_bound)`. Both bounds may be infinite.
minimize: (`logical(1)`)
Set to `TRUE` if good predictions correspond to small values, and to `FALSE` if good predictions correspond to large values. If set to `NA` (default), tuning this measure is not possible.

average: (`character(1)`)
How to average multiple Predictions from a ResampleResult.

The default, `"macro"`, calculates the individual performance scores for each Prediction and then uses the function defined in `$aggregator` to average them to a single number.

If set to `"micro"`, the individual Prediction objects are first combined into a single new Prediction object which is then used to assess the performance. The function in `$aggregator` is not used in this case.
aggregator: (`function()`)
Function to aggregate over multiple iterations. The role of this function depends on the value of field `"average"`:

* `"macro"`: A numeric vector of scores (one per iteration) is passed. The aggregate function defaults to `mean()` in this case.
* `"micro"`: The `aggregator` function is not used. Instead, predictions from multiple iterations are first combined and then scored in one go.
* `"custom"`: A ResampleResult is passed to the aggregate function.
obs_loss: (`function` or `NULL`)
The observation-wise loss function, e.g. zero-one for classification error.
properties: (`character()`)
Properties of the measure. Must be a subset of `mlr_reflections$measure_properties`. Supported by `mlr3`:

* `"requires_task"` (requires the complete Task),
* `"requires_learner"` (requires the trained Learner),
* `"requires_model"` (requires the trained Learner, including the fitted model),
* `"requires_train_set"` (requires the training indices from the Resampling),
* `"na_score"` (the measure is expected to occasionally return `NA` or `NaN`),
* `"primary_iters"` (the measure explicitly handles resamplings that only use a subset of their iterations for the point estimate), and
* `"requires_no_prediction"` (no prediction is required; this usually means that the measure extracts some information from the learner state).
predict_type: (`character(1)`)
Required predict type of the Learner. Possible values are stored in `mlr_reflections$learner_predict_types`.
predict_sets: (`character()`)
Prediction sets to operate on, used in `aggregate()` to extract the matching `predict_sets` from the ResampleResult . Multiple predict sets are calculated by the respective Learner during `resample()`/`benchmark()`. Must be a non-empty subset of `{"train", "test", "internal_valid"}`. If multiple sets are provided, these are first combined to a single prediction object. Default is `"test"`.
task_properties: (`character()`)
Required task properties, see Task.

packages: (`character()`)
Set of required packages. A warning is signaled by the constructor if at least one of the packages is not installed, but loaded (not attached) later on-demand via `requireNamespace()`.
label: (`character(1)`)
Label for the new instance.

man: (`character(1)`)
String in the format `[pkg]::[topic]` pointing to a manual page for this object. The referenced help package can be opened via method `$help()`.
trafo: (`list()` or `NULL`)
An optional list with two elements, containing the transformation `"fn"` and its derivative `"deriv"`. The transformation function is the function that is applied after aggregating the pointwise losses, i.e. this requires an `$obs_loss` to be present. An example is `sqrt` for RMSE.
format()
Helper for print outputs.
Measure$format(...)
...: (ignored).
print()
Printer.
Measure$print(...)
...: (ignored).
help()
Opens the corresponding help page referenced by field `$man`.
Measure$help()
score()
Takes a Prediction (or a list of Prediction objects named with valid `predict_sets`) and calculates a numeric score. If the measure is flagged with the properties `"requires_task"`, `"requires_learner"`, `"requires_model"` or `"requires_train_set"`, you must additionally pass the respective Task, the (trained) Learner or the training set indices. This is handled internally during `resample()`/`benchmark()`. A short sketch follows the argument list below.
Measure$score(prediction, task = NULL, learner = NULL, train_set = NULL)
prediction: (Prediction | named list of Prediction).

task: (Task).

learner: (Learner).

train_set: (`integer()`).

Returns: `numeric(1)`.
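The sketch below passes the additional objects for a measure flagged with such properties; it assumes that the `selected_features` measure requires the trained model and that the `rpart` package is installed:

```r
library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart")$train(task)
prediction = learner$predict(task)

# a plain measure only needs the prediction
msr("classif.ce")$score(prediction)

# "selected_features" also needs the trained learner (and its model)
msr("selected_features")$score(prediction, task = task, learner = learner)
```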
aggregate()
Aggregates multiple performance scores into a single score, e.g. by using the `aggregator` function of the measure.

Measure$aggregate(rr)

rr: (ResampleResult).

Returns: `numeric(1)`.
clone()
The objects of this class are cloneable with this method.
Measure$clone(deep = FALSE)
deep: Whether to make a deep clone.