calibrate function

Calibration of probabilities according to the given prior.

Given probability scores predictedProb, as provided for example by a call to predict.CoreModel, and using one of the available methods selected by method, the function calibrates the predicted probabilities so that they match the actual probabilities of the binary class 1 given by correctClass. The computed calibration can then be applied with applyCalibration to further scores returned by the same model.

calibrate(correctClass, predictedProb, class1=1,
          method = c("isoReg","binIsoReg","binning","mdlMerge"),
          weight=NULL, noBins=10, assumeProbabilities=FALSE)
applyCalibration(predictedProb, calibration)

Arguments

  • correctClass: A vector of correct class labels for a binary classification problem.
  • predictedProb: A vector of predicted class 1 (probability) scores. In calibrate it should be of the same length as correctClass.
  • class1: A class value (factor level) or the index of the class value to be calibrated.
  • method: One of isoReg, binIsoReg, binning, or mdlMerge. See details below.
  • weight: If specified, a vector of the same length as correctClass giving the weight of each instance; otherwise each instance is assigned a default weight of 1.
  • noBins: The meaning of this parameter depends on method; it specifies the desired or the initial number of bins. See details below.
  • assumeProbabilities: If assumeProbabilities=TRUE, the values in predictedProb are expected to lie in the [0,1] range, i.e., to be probability estimates. If assumeProbabilities=FALSE, the algorithm can be used as ordinary (isotonic) regression.
  • calibration: The list returned by a call to calibrate, which is subsequently applied to probability scores returned by the same model.

Details

Depending on the specified method one of the following calibration methods is executed.

  • "isoReg": isotonic regression calibration based on the pair-adjacent violators (PAV) algorithm.
  • "binning": calibration into a pre-specified number of bands, given by the noBins parameter, trying to make bins of equal weight.
  • "binIsoReg": the binning method is executed first, followed by isotonic regression calibration.
  • "mdlMerge": intervals are first merged by an MDL gain criterion into a pre-specified number of intervals, followed by isotonic regression calibration.
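The isotonic regression step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not CORElearn's code: pavCalibrate is a hypothetical helper, class membership is passed as a 0/1 indicator, tied scores are pooled before PAV runs, and each resulting interval is represented by the largest score it contains, mirroring the interval/calProb list described under Returns.

```r
# Hypothetical helper (not CORElearn's implementation): weighted isotonic
# regression calibration via the pair-adjacent violators (PAV) algorithm.
# isClass1 is assumed to be a 0/1 indicator of class 1.
pavCalibrate <- function(isClass1, score, weight = rep(1, length(score))) {
  # collapse tied scores so every distinct score lands in exactly one block
  s  <- sort(unique(score))
  wS <- as.numeric(rowsum(weight, score))
  yS <- as.numeric(rowsum(weight * as.numeric(isClass1), score)) / wS
  val <- numeric(0); wt <- numeric(0); upper <- numeric(0)
  for (i in seq_along(s)) {
    # open a new block for the next distinct score
    val <- c(val, yS[i]); wt <- c(wt, wS[i]); upper <- c(upper, s[i])
    k <- length(val)
    # pool with the previous block while monotonicity is violated
    # (equal neighbours are pooled too, which keeps the output compact)
    while (k > 1 && val[k - 1] >= val[k]) {
      w <- wt[k - 1] + wt[k]
      val[k - 1] <- (wt[k - 1] * val[k - 1] + wt[k] * val[k]) / w
      wt[k - 1] <- w
      upper[k - 1] <- upper[k]
      val <- val[-k]; wt <- wt[-k]; upper <- upper[-k]
      k <- k - 1
    }
  }
  # upper score boundary of each block and its calibrated probability
  list(interval = upper, calProb = val)
}

fit <- pavCalibrate(isClass1 = c(0, 1, 0, 0, 1, 1),
                    score    = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
print(fit)
```

On this toy data the sketch pools the scores 0.2 to 0.4 into one block with calibrated probability 1/3, giving the boundaries 0.1, 0.4 and 0.6. For unweighted data, the fitted values of stats::isoreg on the same points offer an independent cross-check.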

If method="binning", the parameter noBins specifies the desired number of bins, i.e., calibration bands. If method="binIsoReg", noBins specifies the number of initial bins formed by binning before isotonic regression is applied. If method="mdlMerge", noBins specifies the number of bins formed after first applying isotonic regression; the most similar bins are merged using the MDL criterion.
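Equal-weight binning, as described for method="binning", might look like the following base R sketch. binCalibrate is a hypothetical helper; the rule for cutting the cumulative weight into noBins equal shares and the choice to keep tied scores in a single bin are assumptions, not necessarily how CORElearn forms its bins.

```r
# Hypothetical helper (not CORElearn's implementation): calibration by
# binning sorted scores into noBins bins of approximately equal weight.
# isClass1 is assumed to be a 0/1 indicator of class 1.
binCalibrate <- function(isClass1, score, weight = rep(1, length(score)),
                         noBins = 10) {
  ord <- order(score)
  s <- score[ord]
  y <- as.numeric(isClass1[ord])
  w <- weight[ord]
  # assign each instance to a bin by its cumulative share of total weight
  share <- cumsum(w) / sum(w)
  bin <- pmin(noBins, pmax(1, ceiling(share * noBins - 1e-9)))
  # keep tied scores together by moving them into the last bin they reach
  bin <- ave(bin, s, FUN = max)
  # weighted fraction of class 1 per bin and the largest score in each bin
  wB <- as.numeric(rowsum(w, bin))
  calProb <- as.numeric(rowsum(w * y, bin)) / wB
  interval <- as.numeric(tapply(s, bin, max))
  list(interval = interval, calProb = calProb)
}

bins <- binCalibrate(isClass1 = c(0, 0, 0, 1, 0, 1, 1, 0, 1, 1),
                     score    = seq(0.1, 1, by = 0.1), noBins = 5)
print(bins)
```

Each bin's calibrated probability is simply the weighted share of class 1 among its instances, so in general the result need not be monotone in the score; the binIsoReg method addresses this by running isotonic regression on the bins afterwards.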

Returns

The function calibrate returns a list with two vector components of the same length:

  • interval: The boundaries of the intervals. The lower boundary 0 is not explicitly included but should be taken into account.
  • calProb: The calibrated probabilities for each corresponding interval.
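Applying such a list to new scores amounts to an interval lookup. The sketch below assumes each interval is open at the left and closed at the right, with the implicit lower boundary 0 before the first one, and that scores above the last boundary fall into the last interval; applyCalibrationSketch is a hypothetical name and not the package's applyCalibration.

```r
# Hypothetical helper (not CORElearn's applyCalibration): map each score to
# the calibrated probability of the interval containing it. Intervals are
# assumed to be (previous boundary, boundary], with an implicit lower
# boundary 0 in front of the first one.
applyCalibrationSketch <- function(predictedProb, calibration) {
  bounds <- calibration$interval
  # findInterval with left.open = TRUE counts the boundaries strictly below x
  idx <- findInterval(predictedProb, bounds, left.open = TRUE) + 1
  # scores above the last boundary fall into the last interval
  idx <- pmin(idx, length(bounds))
  calibration$calProb[idx]
}

calib <- list(interval = c(0.1, 0.4, 1.0), calProb = c(0, 1/3, 1))
applyCalibrationSketch(c(0.05, 0.1, 0.25, 0.4, 0.9), calib)
```

Here 0.05 and 0.1 map to 0, 0.25 and 0.4 map to 1/3, and 0.9 maps to 1.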

Author(s)

Marko Robnik-Sikonja

See Also

reliabilityPlot, CORElearn, predict.CoreModel

References

I. Kononenko, M. Kukar: Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood, 2007

A. Niculescu-Mizil, R. Caruana: Predicting Good Probabilities With Supervised Learning. Proceedings of the 22nd International Conference on Machine Learning (ICML'05), 2005

Examples

# load the package providing the functions used below
library(CORElearn)
# generate data sets separately for training the model,
# calibrating the probabilities, and testing
train <- classDataGen(noInst=200)
cal <- classDataGen(noInst=200)
test <- classDataGen(noInst=200)
# build a random forest model with default parameters
modelRF <- CoreModel(class~., train, model="rf", maxThreads=1)
# prediction
predCal <- predict(modelRF, cal, rfPredictClass=FALSE)
predTest <- predict(modelRF, test, rfPredictClass=FALSE)
destroyModels(modelRF) # clean up, model not needed anymore
# calibrate for a chosen class1 and method
class1 <- 1
calibration <- calibrate(cal$class, predCal$prob[,class1], class1=class1,
                         method="isoReg", assumeProbabilities=TRUE)
# apply the calibration to the testing set
calibratedProbs <- applyCalibration(predTest$prob[,class1], calibration)
# the calibration of probabilities can be visualized with
# the reliabilityPlot function