Uses a linear regression model to calibrate numeric predictions
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'data.frame'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...,
  .by = NULL
)

## S3 method for class 'tune_results'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = dplyr::matches("^.pred$"),
  smooth = TRUE,
  parameters = NULL,
  ...
)

## S3 method for class 'grouped_df'
cal_estimate_linear(
  .data,
  truth = NULL,
  estimate = NULL,
  smooth = TRUE,
  parameters = NULL,
  ...
)
Arguments
.data: An ungrouped data.frame object, or tune_results object, that contains a prediction column.
truth: The column identifier for the observed outcome data (that is numeric). This should be an unquoted column name.
estimate: Column identifier for the predicted values
smooth: Selects the type of linear model. A generalized additive model using spline terms is fit when TRUE, and a simple linear regression when FALSE.
parameters: An optional tibble of tuning parameter values that can be used to filter the predicted values before processing. Applies only to tune_results objects.
...: Additional arguments passed to the models or routines used to calculate the new predictions.
.by: The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Defaults to NULL. When .by = NULL, no grouping takes place.
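As a hedged illustration of how these arguments fit together, the sketch below calls cal_estimate_linear() on simulated data. The data frame sim and its columns grp, outcome, and .pred are made up for this example, and the code assumes the function is provided by the probably package. The arguments themselves are the ones shown in the usage above.

```r
library(probably) # assumption: cal_estimate_linear() is exported by probably

set.seed(2)
n <- 400

# Hypothetical held-out predictions whose bias differs between two groups
sim <- data.frame(
  grp = rep(c("a", "b"), each = n / 2),
  outcome = runif(n, 0, 10)
)
sim$.pred <- ifelse(
  sim$grp == "a",
  1 + 0.8 * sim$outcome,
  -1 + 1.2 * sim$outcome
) + rnorm(n, sd = 0.5)

# truth is an unquoted column name; estimate is left at its default,
# which matches a column named .pred
cal_by_group <- cal_estimate_linear(
  sim,
  truth = outcome,
  smooth = FALSE, # simple linear regression instead of a spline fit
  .by = grp       # a single unquoted, qualitative grouping column
)

# Same data without grouping, using the default smooth = TRUE
cal_pooled <- cal_estimate_linear(sim, truth = outcome)
```

Printing cal_by_group or cal_pooled shows a summary of the calibration object, which can then be passed to cal_apply().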
Details
This function uses existing modeling functions from other packages to create the calibration:
stats::glm() is used when smooth is set to FALSE
mgcv::gam() is used when smooth is set to TRUE
These methods estimate the trend in the unmodified predicted values relative to the observed outcome; that trend is then removed when cal_apply() is invoked.
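The idea can be sketched with the two modeling functions named above, without the calibration wrappers. This is a minimal illustration rather than the package's internal code: the simulated data, the formulas outcome ~ .pred and outcome ~ s(.pred), and the helper rmse() are all assumptions made for this example. It also assumes mgcv is installed.

```r
set.seed(1)
n <- 500

# Simulated predictions with a systematic linear bias (illustrative only)
truth <- runif(n, 0, 10)
train <- data.frame(
  outcome = truth,
  .pred = 2 + 0.5 * truth + rnorm(n, sd = 0.5)
)

# Analogue of smooth = FALSE: straight-line fit of the outcome on the prediction
fit_lin <- stats::glm(outcome ~ .pred, data = train, family = gaussian())

# Analogue of smooth = TRUE: a spline term allows a nonlinear correction
fit_gam <- mgcv::gam(outcome ~ s(.pred), data = train)

# "Applying" the calibration: replace each raw prediction with the fitted value
new <- data.frame(.pred = c(3, 5, 7))
cal_lin <- predict(fit_lin, newdata = new)
cal_gam <- predict(fit_gam, newdata = new)

# Hypothetical helper to compare error before and after the correction
rmse <- function(obs, est) sqrt(mean((obs - est)^2))
rmse_before <- rmse(train$outcome, train$.pred)
rmse_after <- rmse(train$outcome, fitted(fit_lin))
```

On the data used for fitting, the least-squares line cannot have a larger RMSE than the raw predictions, so rmse_after is at most rmse_before. The gain on new data depends on how well the estimated trend generalizes.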
Examples
library(dplyr)
library(ggplot2)

head(boosting_predictions_test)

# ------------------------------------------------------------------------------
# Before calibration

y_rng <- extendrange(boosting_predictions_test$outcome)

boosting_predictions_test %>%
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("Before calibration")

# ------------------------------------------------------------------------------
# Smoothed trend removal

smoothed_cal <-
  boosting_predictions_oob %>%
  # It will automatically identify the predicted value columns when the
  # standard tidymodels naming conventions are used.
  cal_estimate_linear(outcome)
smoothed_cal

boosting_predictions_test %>%
  cal_apply(smoothed_cal) %>%
  ggplot(aes(outcome, .pred)) +
  geom_abline(lty = 2) +
  geom_point(alpha = 1 / 2) +
  geom_smooth(se = FALSE, col = "blue", linewidth = 1.2, alpha = 3 / 4) +
  coord_equal(xlim = y_rng, ylim = y_rng) +
  ggtitle("After calibration")
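The plots give a visual check. A numeric comparison can complement them, as in the hedged sketch below. It assumes that the probably package supplies the example data sets and that cal_apply() returns the test data with the calibrated values in the .pred column. The rmse() helper is hypothetical and defined here only for the comparison.

```r
library(probably) # assumption: provides the functions and example data

# Re-estimate the calibration so this block runs on its own
smoothed_cal <- cal_estimate_linear(boosting_predictions_oob, outcome)

before <- boosting_predictions_test
after <- cal_apply(boosting_predictions_test, smoothed_cal)

# Hypothetical helper: root mean squared error
rmse <- function(obs, est) sqrt(mean((obs - est)^2))

c(
  before = rmse(before$outcome, before$.pred),
  after = rmse(after$outcome, after$.pred)
)
```

A lower value after calibration suggests the estimated trend carried over to the test set.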