BenchmarkResult class

Container for Benchmarking Results

This is the result container object returned by benchmark(). A BenchmarkResult consists of the data of multiple ResampleResult objects. The contents of a BenchmarkResult and a ResampleResult are almost identical, and the stored ResampleResult objects can be extracted via the $resample_result(i) method, where i is the index of the performed resample experiment. This allows us to investigate the extracted ResampleResult and individual resampling iterations, as well as the predictions and models from each fold.

BenchmarkResult objects can be visualized via mlr3viz's autoplot() function.

For statistical analysis of benchmark results and more advanced plots, see mlr3benchmark.

Note

All stored objects are accessed by reference. Do not modify any extracted object without cloning it first.
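
For instance, a minimal sketch (assuming bmr is a BenchmarkResult as created in the Examples below):

# clone an extracted task before modifying it;
# the copy stored in the BenchmarkResult stays intact
task = bmr$tasks$task[[1]]$clone(deep = TRUE)
task$filter(1:50)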

S3 Methods

  • as.data.table(rr, ..., reassemble_learners = TRUE, convert_predictions = TRUE, predict_sets = "test", task_characteristics = FALSE)

    BenchmarkResult -> data.table::data.table()

    Returns a tabular view of the internal data.

  • c(...)

    (BenchmarkResult, ...) -> BenchmarkResult

    Combines multiple objects convertible to BenchmarkResult into a new BenchmarkResult.
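
A short sketch of both methods (bmr, bmr1, and bmr2 are placeholders for existing BenchmarkResult objects):

# tabular view of all resampling iterations
tab = as.data.table(bmr)

# concatenate two benchmark results into a new one
bmr_all = c(bmr1, bmr2)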

Examples

library(mlr3)

set.seed(123)
learners = list(
  lrn("classif.featureless", predict_type = "prob"),
  lrn("classif.rpart", predict_type = "prob")
)

design = benchmark_grid(
  tasks = list(tsk("sonar"), tsk("penguins")),
  learners = learners,
  resamplings = rsmp("cv", folds = 3)
)
print(design)

bmr = benchmark(design)
print(bmr)

bmr$tasks
bmr$learners

# first 5 resampling iterations
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)

# aggregate results
bmr$aggregate()

# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")

# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)

# access the confusion matrix of the first resampling iteration
rr$predictions()[[1]]$confusion

# reduce to subset with task id "sonar"
bmr$filter(task_ids = "sonar")
print(bmr)

See Also

Other benchmark: benchmark(), benchmark_grid()

Active bindings

  • task_type: (character(1))

     Task type of objects in the `BenchmarkResult`. All stored objects (Task, Learner, Prediction) in a single `BenchmarkResult` are required to have the same task type, e.g., `"classif"` or `"regr"`. This is `NA` for empty BenchmarkResult objects.
    
  • tasks: (data.table::data.table())

     Table of included Tasks with three columns:
     
      * `"task_hash"` (`character(1)`),
      * `"task_id"` (`character(1)`), and
      * `"task"` (Task).
    
  • learners: (data.table::data.table())

     Table of included Learners with three columns:
     
      * `"learner_hash"` (`character(1)`),
      * `"learner_id"` (`character(1)`), and
      * `"learner"` (Learner).
     
     Note that it is not feasible to access learned models via this field, as the training task would be ambiguous. For this reason, the returned learners are reset before they are returned. Instead, select a row from the table returned by `$score()`.
    
  • resamplings: (data.table::data.table())

     Table of included Resamplings with three columns:
     
      * `"resampling_hash"` (`character(1)`),
      * `"resampling_id"` (`character(1)`), and
      * `"resampling"` (Resampling).
    
  • resample_results: (data.table::data.table())

     Returns a table with two columns:
     
      * `uhash` (`character()`) and
      * `resample_result` (ResampleResult).
    
  • n_resample_results: (integer(1))

     Returns the total number of stored ResampleResult objects.
    
  • uhashes: (character())

     Set of (unique) hashes of all included ResampleResult objects.
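
A short sketch of these bindings (assuming bmr is the BenchmarkResult from the Examples above):

bmr$task_type           # e.g. "classif"
bmr$tasks               # table of stored tasks
bmr$n_resample_results  # number of stored ResampleResult objects
bmr$uhashes             # their unique hashes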
    

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Usage

BenchmarkResult$new(data = NULL)

Arguments

  • data: (ResultData)

     An object of type `ResultData`, either extracted from another ResampleResult, another BenchmarkResult, or manually constructed with `as_result_data()`.
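
Since data defaults to NULL, an empty BenchmarkResult can be constructed and filled later, e.g. via $combine(); a minimal sketch:

bmr_empty = BenchmarkResult$new()
bmr_empty$n_resample_results  # 0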
    

Method help()

Opens the help page for this object.

Usage

BenchmarkResult$help()

Method format()

Helper for print outputs.

Usage

BenchmarkResult$format(...)

Arguments

  • ...: (ignored).

Method print()

Printer.

Usage

BenchmarkResult$print()

Method combine()

Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in place. If the second BenchmarkResult bmr is NULL, this method simply returns self. Note that you can alternatively use c(), which calls this method internally.

Usage

BenchmarkResult$combine(bmr)

Arguments

  • bmr: (BenchmarkResult)

     A second BenchmarkResult object.
    

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
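
A minimal sketch (bmr1 and bmr2 are placeholders): clone first if the original object must be preserved.

bmr_combined = bmr1$clone(deep = TRUE)$combine(bmr2)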

Method marshal()

Marshals all stored models.

Usage

BenchmarkResult$marshal(...)

Arguments

  • ...: (any)

     Additional arguments passed to `marshal_model()`.
    

Method unmarshal()

Unmarshals all stored models.

Usage

BenchmarkResult$unmarshal(...)

Arguments

  • ...: (any)

     Additional arguments passed to `unmarshal_model()`.
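
Marshaling converts the stored models into a form that survives serialization, e.g. when saving to disk; unmarshaling restores them afterwards. A minimal sketch (the file name is a placeholder):

bmr$marshal()
saveRDS(bmr, "bmr.rds")    # models survive serialization
bmr2 = readRDS("bmr.rds")
bmr2$unmarshal()           # restore the models for further use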
    

Method score()

Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns of extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id".

Additionally calculates the provided performance measures and binds the performance scores as extra columns. These columns are named using the id of the respective Measure.

Usage

BenchmarkResult$score(
  measures = NULL,
  ids = TRUE,
  conditions = FALSE,
  predictions = TRUE
)

Arguments

  • measures: (Measure | list of Measure)

     Measure(s) to calculate.
    
  • ids: (logical(1))

     Adds object ids (`"task_id"`, `"learner_id"`, `"resampling_id"`) as extra character columns to the returned table.
    
  • conditions: (logical(1))

     Adds condition messages (`"warnings"`, `"errors"`) as extra list columns of character vectors to the returned table.
    
  • predictions: (logical(1))

     Additionally return prediction objects, one column for each `predict_set` of all learners combined. Columns are named `"prediction_train"`, `"prediction_test"` and `"prediction_internal_valid"`, if present.
    

Returns

data.table::data.table().
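
For instance, a minimal sketch (assuming bmr is the classification BenchmarkResult from the Examples above):

scores = bmr$score(msr("classif.acc"))
scores[, .(task_id, learner_id, iteration, classif.acc)]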

Method obs_loss()

Calculates the observation-wise loss via the loss function set in the Measure's field obs_loss. Returns a data.table() with the columns row_ids, truth, response, and one additional numeric column per measure, named with the respective measure id. If a measure has no observation-wise loss function, its column is filled with NA values. Note that some measures, such as RMSE, do have an $obs_loss, but require an additional transformation after aggregation; for RMSE this is taking the square root.

Usage

BenchmarkResult$obs_loss(measures = NULL, predict_sets = "test")

Arguments

  • measures: (Measure | list of Measure)

     Measure(s) to calculate.
    
  • predict_sets: (character())

     The predict sets.
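
A minimal sketch (assuming bmr holds classification results, as in the Examples above):

# per-observation zero-one loss for the classification error measure
losses = bmr$obs_loss(msr("classif.ce"))
head(losses)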
    

Method aggregate()

Returns a result table where resampling iterations are combined into ResampleResult objects. A column with the aggregated performance score is added for each Measure, named with the id of the respective measure.

The method for aggregation is controlled by the Measure, e.g. micro aggregation, macro aggregation, or custom aggregation. Most measures default to macro aggregation.

Note that the aggregated performances only give a quick impression of which approaches work well and which are probably underperforming. However, the aggregates do not account for variance and cannot replace a statistical test. See mlr3viz to get a better impression via boxplots, or mlr3benchmark for critical difference plots and significance tests.

For convenience, different flags can be set to extract more information from the returned ResampleResult.

Usage

BenchmarkResult$aggregate(
  measures = NULL,
  ids = TRUE,
  uhashes = FALSE,
  params = FALSE,
  conditions = FALSE
)

Arguments

  • measures: (Measure | list of Measure)

     Measure(s) to calculate.
    
  • ids: (logical(1))

     Adds object ids (`"task_id"`, `"learner_id"`, `"resampling_id"`) as extra character columns for convenient subsetting.
    
  • uhashes: (logical(1))

     Adds the uhash values of the ResampleResult objects as extra character column `"uhash"`.
    
  • params: (logical(1))

     Adds the hyperparameter values as extra list column `"params"`. You can unnest them with `mlr3misc::unnest()`.
    
  • conditions: (logical(1))

     Adds the number of resampling iterations with at least one warning as extra integer column `"warnings"`, and the number of resampling iterations with errors as extra integer column `"errors"`.
    

Returns

data.table::data.table().
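
For instance, a minimal sketch (assuming bmr from the Examples above):

aggr = bmr$aggregate(msr("classif.acc"), conditions = TRUE)
aggr[, .(task_id, learner_id, classif.acc, warnings, errors)]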

Method filter()

Subsets the benchmark result. If task_ids is not NULL, keeps all tasks with the provided task ids and discards all other tasks. The same procedure applies to learner_ids and resampling_ids.

Usage

BenchmarkResult$filter(
  task_ids = NULL,
  task_hashes = NULL,
  learner_ids = NULL,
  learner_hashes = NULL,
  resampling_ids = NULL,
  resampling_hashes = NULL
)

Arguments

  • task_ids: (character())

     Ids of Tasks to keep.
    
  • task_hashes: (character())

     Hashes of Tasks to keep.
    
  • learner_ids: (character())

     Ids of Learners to keep.
    
  • learner_hashes: (character())

     Hashes of Learners to keep.
    
  • resampling_ids: (character())

     Ids of Resamplings to keep.
    
  • resampling_hashes: (character())

     Hashes of Resamplings to keep.
    

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
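
Because filtering modifies the object in place, clone first if the full result is still needed; a minimal sketch:

bmr_sonar = bmr$clone(deep = TRUE)$filter(task_ids = "sonar")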

Method resample_result()

Retrieve the i-th ResampleResult, by position or by unique hash uhash. i and uhash are mutually exclusive.

Usage

BenchmarkResult$resample_result(i = NULL, uhash = NULL)

Arguments

  • i: (integer(1))

     The iteration value to filter for.
    
  • uhash: (character(1))

     The `uhash` value to filter for.
    

Returns

ResampleResult.
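
A minimal sketch of both access paths (assuming bmr from the Examples above):

rr1 = bmr$resample_result(1)                       # by position
rr2 = bmr$resample_result(uhash = bmr$uhashes[1])  # by unique hash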

Method discard()

Shrinks the BenchmarkResult by discarding parts of the internally stored data. Note that certain operations might stop working, e.g. extracting importance values from learners or calculating measures that require the task's data.

Usage

BenchmarkResult$discard(backends = FALSE, models = FALSE)

Arguments

  • backends: (logical(1))

     If `TRUE`, the DataBackend is removed from all stored Tasks.
    
  • models: (logical(1))

     If `TRUE`, the stored model is removed from all Learners.
    

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
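
A minimal sketch: drop models and data backends to shrink the object, e.g. before archiving it.

bmr$discard(backends = TRUE, models = TRUE)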

Method clone()

The objects of this class are cloneable with this method.

Usage

BenchmarkResult$clone(deep = FALSE)

Arguments

  • deep: Whether to make a deep clone.