mlr3resampling R package [Documentation]

Learners

Learner classes with special methods

proj_compute

Compute resampling results in a project

proj_grid

Initialize a new project grid table

proj_results

Combine and save results in a project

proj_submit

Compute several resampling jobs

proj_test

Test a project with smaller data and fewer resampling iterations

pvalue

P-values for comparing Same/Other/All training

ResamplingSameOtherCV

Resampling for comparing training on same or other subsets

ResamplingSameOtherSizesCV

Resampling for comparing train subsets and sizes

ResamplingVariableSizeTrainCV

Resampling for comparing different train set sizes

score

Score benchmark results

A supervised learning algorithm takes a train set as input and outputs a prediction function, which can then be applied to a test set. If each data point belongs to a subset (such as a geographic region or a year), then how do we know whether subsets are similar enough that we can get accurate predictions on one subset after training on Other subsets? And how do we know whether training on All subsets would improve prediction accuracy, relative to training on the Same subset? SOAK (Same/Other/All K-fold cross-validation, <doi:10.48550/arXiv.2410.08643>) can be used to answer these questions by fixing a test subset, training models on the Same/Other/All subsets, and then comparing test error rates (Same versus Other and Same versus All). The package also provides code for estimating how many train samples are required to get accurate predictions on a test set.
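
The Same/Other/All comparison described above can be sketched in base R, without the package's own classes. This is a minimal illustration under stated assumptions, not the package's implementation. The simulated data, the cubic `lm` model, and the helper name `soak_train_rows` are all hypothetical and chosen only for illustration. It assumes that fold ids are assigned within each subset, and that every train set excludes the test fold.

```r
# Base-R sketch of Same/Other/All K-fold cross-validation (SOAK).
# Everything here (data, model, helper names) is hypothetical.
set.seed(1)
n_per_subset <- 60
subset_names <- c("A", "B", "C")
dat <- data.frame(
  subset = rep(subset_names, each = n_per_subset),
  x = runif(n_per_subset * length(subset_names), -3, 3))
# Simulated signal: subset C has the opposite pattern from A and B.
dat$y <- ifelse(dat$subset == "C", -1, 1) * sin(dat$x) +
  rnorm(nrow(dat), sd = 0.3)
n_folds <- 3
# Assign fold ids within each subset, so each subset appears in each fold.
dat$fold <- ave(seq_len(nrow(dat)), dat$subset, FUN = function(i) {
  sample(rep(seq_len(n_folds), length.out = length(i)))
})

# For a fixed test subset and test fold, return row indices of the
# test set and of the Same, Other, and All train sets.
soak_train_rows <- function(dat, test_subset, test_fold) {
  in_subset <- dat$subset == test_subset
  not_test_fold <- dat$fold != test_fold
  list(
    test = which(in_subset & dat$fold == test_fold),
    same = which(in_subset & not_test_fold),
    other = which(!in_subset & not_test_fold),
    all = which(not_test_fold))
}

# Train on each of Same/Other/All and measure error on the same test set.
results <- list()
for (test_subset in subset_names) {
  for (test_fold in seq_len(n_folds)) {
    rows <- soak_train_rows(dat, test_subset, test_fold)
    test_dat <- dat[rows$test, ]
    for (train_name in c("same", "other", "all")) {
      fit <- lm(y ~ poly(x, 3), data = dat[rows[[train_name]], ])
      pred <- predict(fit, newdata = test_dat)
      results[[length(results) + 1]] <- data.frame(
        test_subset, test_fold, train_name,
        rmse = sqrt(mean((test_dat$y - pred)^2)))
    }
  }
}
scores <- do.call(rbind, results)
# Mean test RMSE over folds, for each test subset and train set type.
mean_rmse <- aggregate(
  rmse ~ test_subset + train_name, data = scores, FUN = mean)
print(mean_rmse)
```

In this simulation, subset C is constructed to differ from A and B, so for test subset C the Other models should have clearly larger error than the Same models. On real data, the same table of test error by test subset and train set type is what supports the Same versus Other and Same versus All comparisons.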

Maintainer: Toby Hocking
License: LGPL-3
Last published: 2025-11-20

mlr3resampling 2025.11.19 package