This is a helper function that generates a default surrogate based on properties of the objective function and the selected infill criterion.
For numeric-only (including integers) parameter spaces without any dependencies:
A Kriging model regr.km with kernel matern3_2 is created.
If the objective function is deterministic, we add a small nugget effect (10^-8 * Var(y), where y is the vector of observed outcomes in the current design) to increase numerical stability and prevent crashes of DiceKriging.
If the objective function is noisy, the nugget effect will be estimated with nugget.estim = TRUE (but you can override this via the ... argument).
Additionally, jitter is set to TRUE to circumvent a problem in DiceKriging where predictions at already trained input values reproduce the trained output exactly. For further information, check the $note slot of the created learner.
Instead of the default "BFGS" optimization method we use rgenoud ("gen"), a hybrid algorithm that combines global search based on genetic algorithms with gradient-based local search. This may improve the model fit and produces a constant surrogate model less frequently. You can also override this setting via the ... argument.
For mixed numeric-categorical parameter spaces, or spaces with conditional parameters:
A random regression forest regr.randomForest with 500 trees is created.
The standard error of a prediction (if required by the infill criterion) is estimated by computing the jackknife-after-bootstrap. This is the se.method = "jackknife" option of the regr.randomForest Learner.
If the parameter space additionally contains dependencies, inactive conditional parameters are represented by missing (NA) values in the training design data.frame.
We handle these with an imputation method added to the random forest:
If a numeric value is inactive, i.e., missing, it is imputed by 2 times the maximum of the observed values.
If a categorical value is inactive, i.e., missing, it is imputed by the special class label "__miss__".
Both of these techniques make sense for tree-based methods and are usually hard to beat, see Ding and Simonoff (2010). A rough sketch of both default constructions is given below.
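As a minimal sketch in plain mlr, the two default surrogates look roughly as follows. The exact hyperparameter values and the deterministic-case nugget handling are set internally by makeMBOLearner, so treat this as an illustration of the settings described above, not a verbatim copy of the implementation.

library(mlr)

# Numeric-only spaces: Kriging with a Matern-3/2 kernel and the rgenoud
# optimizer; nugget.estim/jitter correspond to the noisy case above.
km.lrn = makeLearner("regr.km",
  covtype = "matern3_2",  # Matern-3/2 kernel
  optim.method = "gen",   # rgenoud instead of "BFGS"
  nugget.estim = TRUE,    # noisy case; the deterministic case uses a fixed small nugget instead
  jitter = TRUE)          # avoid exact reproduction of already trained outputs
km.lrn = setPredictType(km.lrn, "se")  # only if the infill criterion needs standard errors

# Mixed or conditional spaces: random forest with jackknife standard errors,
# wrapped so that inactive (NA) parameters are imputed as described above.
rf.lrn = makeLearner("regr.randomForest", ntree = 500L,
  se.method = "jackknife", keep.inbag = TRUE)  # keep.inbag is required for the jackknife
rf.lrn = makeImputeWrapper(rf.lrn, classes = list(
  numeric = imputeMax(2),                 # push inactive numerics above the observed range
  factor = imputeConstant("__miss__")))   # special class label for inactive categoricals
rf.lrn = setPredictType(rf.lrn, "se")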
makeMBOLearner(control, fun, config = list(), ...)
Arguments
control: [MBOControl]
Control object for mbo.
fun: [smoof_function]
The same objective function which is also passed to mbo.
config: [named list]
Named list of config options to overwrite global settings set via configureMlr for this specific learner.
...: [any]
Further parameters passed to the constructed learner. Will overwrite mlrMBO's defaults.
Returns
[Learner]
References
Ding, Yufeng, and Jeffrey S. Simonoff. "An investigation of missing data methods for classification trees applied to binary response data." Journal of Machine Learning Research 11 (2010): 131-170.
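Examples
A hedged usage sketch: the sphere objective and its parameter set below are made up for illustration; any smoof function plus an MBOControl object works the same way.

library(mlrMBO)  # pulls in smoof, ParamHelpers and mlr

# A made-up deterministic, purely numeric objective.
obj.fun = makeSingleObjectiveFunction(
  name = "sphere",
  fn = function(x) sum(x^2),
  par.set = makeNumericParamSet("x", len = 2L, lower = -5, upper = 5)
)

ctrl = makeMBOControl()
ctrl = setMBOControlInfill(ctrl, crit = makeMBOInfillCritEI())

# Purely numeric space, so the default surrogate is the regr.km learner
# described above.
surrogate = makeMBOLearner(ctrl, obj.fun)

# The defaults can be overwritten through ..., e.g. forcing nugget estimation:
surrogate2 = makeMBOLearner(ctrl, obj.fun, nugget.estim = TRUE)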