Uses a Multinomial calibration model to calculate new probabilities
Uses a Multinomial calibration model to calculate new probabilities
cal_estimate_multinomial( .data, truth =NULL, estimate = dplyr::starts_with(".pred_"), smooth =TRUE, parameters =NULL,...)## S3 method for class 'data.frame'cal_estimate_multinomial( .data, truth =NULL, estimate = dplyr::starts_with(".pred_"), smooth =TRUE, parameters =NULL,..., .by =NULL)## S3 method for class 'tune_results'cal_estimate_multinomial( .data, truth =NULL, estimate = dplyr::starts_with(".pred_"), smooth =TRUE, parameters =NULL,...)## S3 method for class 'grouped_df'cal_estimate_multinomial( .data, truth =NULL, estimate =NULL, smooth =TRUE, parameters =NULL,...)
Arguments
.data: An ungrouped data.frame object, or tune_results object, that contains predictions and probability columns.
truth: The column identifier for the true class results (that is a factor). This should be an unquoted column name.
estimate: A vector of column identifiers, or one of dplyr selector functions to choose which variables contains the class probabilities. It defaults to the prefix used by tidymodels (.pred_). The order of the identifiers will be considered the same as the order of the levels of the truth variable.
smooth: Applies to the logistic models. It switches between logistic spline when TRUE, and simple logistic regression when FALSE.
parameters: (Optional) An optional tibble of tuning parameter values that can be used to filter the predicted values before processing. Applies only to tune_results objects.
...: Additional arguments passed to the models or routines used to calculate the new probabilities.
.by: The column identifier for the grouping variable. This should be a single unquoted column name that selects a qualitative variable for grouping. Default to NULL. When .by = NULL no grouping will take place.
Details
When smooth = FALSE, nnet::multinom() function is used to estimate the model, otherwise mgcv::gam() is used.
Examples
library(modeldata)library(parsnip)library(dplyr)f <- list(~-0.5+0.6* abs(A),~ ifelse(A >0& B >0,1.0+0.2* A / B,-2),~-0.6* A +0.50* B - A * B
)set.seed(1)tr_dat <- sim_multinomial(500, eqn_1 = f[[1]], eqn_2 = f[[2]], eqn_3 = f[[3]])cal_dat <- sim_multinomial(500, eqn_1 = f[[1]], eqn_2 = f[[2]], eqn_3 = f[[3]])te_dat <- sim_multinomial(500, eqn_1 = f[[1]], eqn_2 = f[[2]], eqn_3 = f[[3]])set.seed(2)rf_fit <- rand_forest()%>% set_mode("classification")%>% set_engine("randomForest")%>% fit(class ~ ., data = tr_dat)cal_pred <- predict(rf_fit, cal_dat, type ="prob")%>% bind_cols(cal_dat)te_pred <- predict(rf_fit, te_dat, type ="prob")%>% bind_cols(te_dat)cal_plot_windowed(cal_pred, truth = class, window_size =0.1, step_size =0.03)smoothed_mn <- cal_estimate_multinomial(cal_pred, truth = class)new_test_pred <- cal_apply(te_pred, smoothed_mn)cal_plot_windowed(new_test_pred, truth = class, window_size =0.1, step_size =0.03)