pl: logical indicating whether the function should plot the reliability diagram and the histogram of the calibrations
Returns
A list with samples, responses, calibrations, ECE, MCE and, if save==T, the calibration plots
Details
Many popular machine learning algorithms produce inaccurate predicted probabilities, especially when applied to a dataset other than the training set. Platt (1999) proposed an adjustment in which the original probabilities are used as the predictor in a single-variable logistic regression to produce more accurate adjusted predicted probabilities. The function also helps to evaluate the calibration by plotting reliability diagrams and the distributions of the calibrated and non-calibrated probabilities. A reliability diagram plots the mean predicted value within each range (bin) of posterior probabilities against the observed fraction of positive outcomes in that range. Finally, the function reports accuracy measures for the calibration: the Expected Calibration Error (ECE), the Maximum Calibration Error (MCE) and the Log-Loss of the probabilities before and after calibration.
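The idea described above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the implementation used by plattCalibration: the simulated data, the helper names calibration_error and log_loss, and the choice of equal-width bins are all hypothetical and chosen only for this example.

```r
# Minimal sketch of Platt scaling with base R (illustrative only;
# not the code used inside plattCalibration).
set.seed(1)

# Hypothetical data: true binary labels and raw scores in (0, 1).
n <- 2000
y <- rbinom(n, size = 1, prob = 0.5)
raw <- plogis(3 * (y - 0.5) + rnorm(n, sd = 2))

# Platt scaling: single-variable logistic regression of the labels
# on the original probabilities.
fit <- glm(y ~ raw, family = binomial())
calibrated <- as.numeric(predict(fit, type = "response"))

# Calibration errors over equal-width bins (an assumption of this sketch):
# ECE = bin-size-weighted mean of |observed fraction - mean predicted|,
# MCE = largest such gap over the non-empty bins.
calibration_error <- function(y, p, nbins = 10) {
  bins <- cut(p, breaks = seq(0, 1, length.out = nbins + 1),
              include.lowest = TRUE)
  observed <- as.numeric(tapply(y, bins, mean))
  predicted <- as.numeric(tapply(p, bins, mean))
  weight <- as.numeric(table(bins)) / length(p)
  gap <- abs(observed - predicted)
  keep <- !is.na(gap)  # drop empty bins
  c(ECE = sum(gap[keep] * weight[keep]), MCE = max(gap[keep]))
}

# Log-Loss, with probabilities clipped away from 0 and 1.
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Compare the measures before and after calibration.
before <- c(calibration_error(y, raw), LogLoss = log_loss(y, raw))
after <- c(calibration_error(y, calibrated), LogLoss = log_loss(y, calibrated))
print(rbind(before, after))
```

In this sketch the logistic regression is fitted and evaluated on the same data for brevity; in practice the calibration model is usually fitted on one set of samples and applied to another.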
Examples
library(stats)
library(plotly)

# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset

# Calculate the binarized surrogates
b_phen <- binarize_all_pheno(phen)

# Apply the surrogate models
surr <- calculate_surrogate_scores(met, phen, MiMIR::PARAM_surrogates, bin_names = colnames(b_phen))

# Calibration of the surrogate for sex
real_data <- as.numeric(b_phen$sex)
pred_data <- surr$surrogates[, "s_sex"]
plattCalibration(r.calib = real_data, p.calib = pred_data, nbins = 10, pl = TRUE)
References
This function was originally created for the eRic package under the name prCalibrate and was modified ad hoc for our purposes (GitHub).
J. C. Platt, 'Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods', in Advances in Large Margin Classifiers, 1999, pp. 61-74.