Builds the selected calibration models on the supplied training values actual and predicted and returns them to the user. New test instances can be calibrated with the predict_calibratR function. If evaluate_CV_error is set to TRUE, cross-validated calibration and discrimination error values are also returned for each model. Repeated cross-validation can be time-consuming.
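As a rough illustration of the fit-then-predict workflow, the Python sketch below implements plain equal-width histogram binning, the technique the hist_* models are named after. It is not calibratR's code. The class name HistogramBinningCalibrator, the n_bins parameter, and the fallback for empty bins are assumptions made for this example. The scaling and transformation steps that distinguish hist_scaled from hist_transformed are omitted.

```python
class HistogramBinningCalibrator:
    """Hypothetical equal-width histogram binning calibrator (not calibratR's API).

    Each bin on [0, 1] maps to the observed fraction of positives among the
    training scores that fall into it.
    """

    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        self.bin_values = []

    def _bin_index(self, p):
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        return min(int(p * self.n_bins), self.n_bins - 1)

    def fit(self, actual, predicted):
        counts = [0] * self.n_bins
        positives = [0] * self.n_bins
        for y, p in zip(actual, predicted):
            i = self._bin_index(p)
            counts[i] += 1
            positives[i] += y
        # Assumption of this sketch: an empty bin falls back to its midpoint.
        self.bin_values = [
            positives[i] / counts[i] if counts[i] else (i + 0.5) / self.n_bins
            for i in range(self.n_bins)
        ]
        return self

    def predict(self, new):
        # Calibrate new scores by looking up the value of their bin.
        return [self.bin_values[self._bin_index(p)] for p in new]
```

For example, `HistogramBinningCalibrator().fit([0, 0, 1, 1], [0.05, 0.15, 0.12, 0.95]).predict([0.11])` maps 0.11 to the positive rate of its bin, which holds one positive and one negative.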
model_idx: which calibration models to build (1 = hist_scaled, 2 = hist_transformed, 3 = BBQ_scaled, 4 = BBQ_transformed, 5 = GUESS). Default: c(1, 2, 3, 4, 5)
evaluate_no_CV_error: computes internal errors for calibration models trained on all available actual/predicted tuples; testing is performed on the same set. Interpret these error values with care: they are not cross-validated and therefore tend to be optimistic. Default: TRUE
evaluate_CV_error: computes the cross-validation error. A folds-fold cross-validation is repeated n_seeds times, each time with a different seed. The trained models and their calibration and discrimination errors are returned. Depending on the number of repetitions specified in n_seeds, evaluating the CV errors can take some time. Default: TRUE
folds: number of folds in the cross-validation of the calibration models. If folds is set to 1, no CV is performed and summary_CV cannot be calculated. Default: 10
n_seeds: how often the random partitioning of the data set is repeated, each time with a different seed. If folds is 1, n_seeds should also be set to 1. Default: 30
nCores: number of cores used during parallelisation. Default: 4
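The resampling scheme described by evaluate_CV_error, folds and n_seeds can be sketched as repeated k-fold partitioning. The Python generator below is a hypothetical illustration, not the package's implementation. Seeding the repetitions with 0, 1, ..., n_seeds - 1 and assigning every folds-th shuffled index to the same fold are assumptions of this sketch.

```python
import random


def repeated_kfold_splits(n, folds=10, n_seeds=30):
    """Yield (seed, fold, train_idx, test_idx) for a folds-fold CV repeated n_seeds times."""
    if not 2 <= folds <= n:
        # With folds == 1 every point would sit in the single test fold,
        # leaving nothing to train on, so no cross-validation is possible.
        raise ValueError("folds must be between 2 and n")
    for seed in range(n_seeds):
        rng = random.Random(seed)  # a new seed gives a new random partition
        idx = list(range(n))
        rng.shuffle(idx)
        for k in range(folds):
            test_idx = sorted(idx[k::folds])  # every folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            yield seed, k, train_idx, test_idx
```

Within one seed the test folds are disjoint and together cover all n points, so every tuple is held out exactly once per repetition; the repetitions differ only in how the points are shuffled into folds.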
Returns
A list object with the following components:
calibration_models: a list of all trained calibration models, which can be passed to the predict_calibratR function.
summary_CV: a list containing information on the CV errors of the implemented models
summary_no_CV: a list containing information on the internal errors of the implemented models
predictions: calibrated predictions for the original predicted values
n_seeds: number of random partitions of the data set into training and test sets for the folds-fold CV
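This page does not state which calibration and discrimination error measures summary_CV and summary_no_CV contain. As an illustration only, the Python sketch below computes two widely used measures: the expected calibration error (ECE) over equal-width bins, and the area under the ROC curve (AUC) as a discrimination measure. The function names and the choice of these two metrics are assumptions, not a description of calibratR's output.

```python
def expected_calibration_error(actual, predicted, n_bins=10):
    """Bin-size-weighted mean |observed positive rate - mean prediction| over equal-width bins."""
    n = len(actual)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(actual, predicted):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)
            confidence = sum(p for _, p in b) / len(b)
            ece += len(b) / n * abs(observed - confidence)
    return ece


def auc(actual, predicted):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(actual, predicted) if y == 1]
    neg = [p for y, p in zip(actual, predicted) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The two measures answer different questions: ECE is zero when predicted probabilities match observed frequencies, while AUC depends only on the ranking of the scores, so a monotone recalibration can change ECE without changing AUC.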
Details
Parallelised execution of the random data set splits for the cross-validation procedure over n_seeds.
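The parallelisation over n_seeds can be sketched by handing each CV repetition to a worker pool, with nCores mapped to max_workers. In the Python sketch below, cv_repetition, parallel_cv, the executor_cls parameter, and the mean-shift stand-in calibrator are all hypothetical; a real run would fit the selected calibration models in each training fold instead.

```python
import concurrent.futures
import random


def cv_repetition(seed, actual, predicted, folds=10):
    """One folds-fold CV repetition for a single seed; returns the mean test Brier score."""
    rng = random.Random(seed)
    idx = list(range(len(actual)))
    rng.shuffle(idx)
    fold_errors = []
    for k in range(folds):
        test = sorted(idx[k::folds])
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Hypothetical stand-in for a calibration model: shift the scores by the
        # mean residual of the training fold and clip the result to [0, 1].
        shift = sum(actual[i] - predicted[i] for i in train) / len(train)
        calibrated = [min(1.0, max(0.0, predicted[i] + shift)) for i in test]
        brier = sum((actual[i] - c) ** 2 for i, c in zip(test, calibrated)) / len(test)
        fold_errors.append(brier)
    return sum(fold_errors) / folds


def parallel_cv(actual, predicted, folds=10, n_seeds=30, n_cores=4,
                executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Distribute the n_seeds CV repetitions over n_cores workers; results come back in seed order."""
    with executor_cls(max_workers=n_cores) as pool:
        futures = [pool.submit(cv_repetition, seed, actual, predicted, folds)
                   for seed in range(n_seeds)]
        return [f.result() for f in futures]
```

Because each repetition seeds its own `random.Random(seed)`, the result does not depend on which worker runs it or in what order. With the default process pool, call parallel_cv from under an `if __name__ == "__main__":` guard; passing `concurrent.futures.ThreadPoolExecutor` as executor_cls is convenient for quick tests.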