Black-box models may have very different structures. This function creates a unified representation of a model, which can be further processed by functions for explanations.
explain.default(
  model,
  data = NULL,
  y = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  weights = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = !isTRUE(getOption("knitr.in.progress")),
  model_info = NULL,
  type = NULL
)

explain(
  model,
  data = NULL,
  y = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  weights = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = !isTRUE(getOption("knitr.in.progress")),
  model_info = NULL,
  type = NULL
)
Arguments
model: object - a model to be explained
data: data.frame or matrix - data which will be used to calculate the explanations. If not provided, then it will be extracted from the model. Data should be passed without a target column (this shall be provided as the y argument). NOTE: If the target variable is present in the data, some of the functionalities may not work properly.
y: numeric vector with outputs/scores. If provided, it must have one element per row of data.
predict_function: function that takes two arguments: model and new data and returns a numeric vector with predictions. By default it is yhat.
predict_function_target_column: character or numeric giving either the column name or the column number, in the model prediction object, of the class that should be considered positive (i.e. the class that is associated with probability 1). If NULL, the second column of the output will be taken for binary classification. In a multiclass classification setting, setting this parameter switches to binary classification mode with one-vs-others probabilities.
residual_function: function that takes four arguments: model, data, target vector y and (optionally) predict function. It should return a numeric vector with model residuals for the given data. If not provided, response residuals (y - ŷ) are calculated. By default it is residual_function_default.
weights: numeric vector with sampling weights. By default it's NULL. If provided, it must have one element per row of data.
...: other parameters
label: character - the name of the model. By default it's extracted from the 'class' attribute of the model
verbose: logical. If TRUE (default) then diagnostic messages will be printed
precalculate: logical. If TRUE (default) then predicted values and residuals are calculated when the explainer is created. This also happens if verbose is TRUE. Set both verbose and precalculate to FALSE to omit the calculations.
colorize: logical. If TRUE then WARNINGS, ERRORS and NOTES are colorized. Works only in the R console. By default it is FALSE while knitting and TRUE otherwise.
model_info: a named list (package, version, type) containing information about the model. If NULL, DALEX will look for this information on its own.
type: type of a model, either classification or regression. If not specified then type will be extracted from model_info.
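The signatures described for predict_function and residual_function can be illustrated with a minimal sketch in base R. The model, the data (mtcars) and the names custom_predict and custom_residual are assumptions chosen only for illustration; they are not part of DALEX.

```r
# Toy binary classifier on a built-in dataset (illustration only).
fit <- glm(am ~ wt, data = mtcars, family = binomial())
X <- mtcars["wt"]   # predictors only, without the target column
y <- mtcars$am      # numeric 0/1 target

# predict_function: (model, new data) -> numeric vector of predictions
custom_predict <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata, type = "response"))
}

# residual_function: (model, data, y, predict_function) -> numeric vector
custom_residual <- function(model, data, y,
                            predict_function = custom_predict) {
  y - predict_function(model, data)
}

preds <- custom_predict(fit, X)
res <- custom_residual(fit, X, y)

# Both functions return one value per observation.
print(length(preds) == nrow(X))
print(length(res) == nrow(X))
```

Functions shaped like these could then be passed to explain() as predict_function = custom_predict and residual_function = custom_residual.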
Returns
An object of the class explainer.
It's a list with the following fields:
model the explained model.
data the dataset used for training.
y response for observations from data.
weights sample weights for data. NULL if weights are not specified.
y_hat calculated predictions.
residuals calculated residuals.
predict_function function that may be used for model predictions, shall return a single numerical value for each observation.
residual_function function that returns residuals, shall return a single numerical value for each observation.
class class/classes of a model.
label label of explainer.
model_info named list containing basic information about the model, such as the package, the package version and the model type.
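As a rough sketch of how the fields listed above can be inspected, the snippet below builds an explainer for a linear model on mtcars and reads a few of its elements. The dataset, the label "lm_mtcars" and the requireNamespace() guard are assumptions made for this illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- mtcars[, c("wt", "hp")]   # predictors only

# Guard so the sketch is skipped when DALEX is not installed.
if (requireNamespace("DALEX", quietly = TRUE)) {
  ex <- DALEX::explain(fit, data = X, y = mtcars$mpg,
                       label = "lm_mtcars", verbose = FALSE)

  print(class(ex))            # class of the returned object
  print(names(ex))            # field names of the list
  print(ex$label)             # label passed above
  print(head(ex$y_hat))       # precalculated predictions
  print(head(ex$residuals))   # precalculated residuals

  # The stored predict_function can be applied to other rows.
  print(ex$predict_function(ex$model, X[1:3, ]))
}
```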
Details
Please NOTE that the model is the only required argument. However, some explanations may expect other arguments, such as data or y, to be provided as well.
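The difference between a model-only explainer and a fully specified one can be sketched as follows. The model and data are assumptions for illustration, and the requireNamespace() guard only makes the snippet skip itself when DALEX is not installed.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("DALEX", quietly = TRUE)) {
  # Only the model is supplied: no target vector is stored.
  ex_min <- DALEX::explain(fit, verbose = FALSE)
  print(is.null(ex_min$y))

  # Data and target supplied: explanations that compare predictions
  # with observed values, such as model_performance(), can be computed.
  ex_full <- DALEX::explain(fit,
                            data = mtcars[, c("wt", "hp")],
                            y = mtcars$mpg,
                            verbose = FALSE)
  print(DALEX::model_performance(ex_full))
}
```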
Examples
# simple explainer for regression problem
aps_lm_model4 <- lm(m2.price ~ ., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v")
aps_lm_explainer4

# various parameters for the explain function
# all defaults
aps_lm <- explain(aps_lm_model4)

# silent execution
aps_lm <- explain(aps_lm_model4, verbose = FALSE)

# set target variable
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                  y = apartments$m2.price)
aps_lm <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                  y = apartments$m2.price, predict_function = predict)

# user provided predict_function
aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
custom_predict <- function(X.model, newdata) {
  predict(X.model, newdata)$predictions
}
aps_ranger_exp <- explain(aps_ranger, data = apartments, y = apartments$m2.price,
                          predict_function = custom_predict)

# user provided residual_function
aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
custom_residual <- function(X.model, newdata, y, predict_function) {
  abs(y - predict_function(X.model, newdata))
}
aps_ranger_exp <- explain(aps_ranger, data = apartments, y = apartments$m2.price,
                          residual_function = custom_residual)

# binary classification
titanic_ranger <- ranger::ranger(as.factor(survived) ~ ., data = titanic_imputed,
                                 num.trees = 50, probability = TRUE)
# keep in mind that for binary classification the y parameter has to be
# numeric with 0 and 1 values
titanic_ranger_exp <- explain(titanic_ranger, data = titanic_imputed,
                              y = titanic_imputed$survived)

# multiclass task
hr_ranger <- ranger::ranger(status ~ ., data = HR, num.trees = 50, probability = TRUE)
# keep in mind that for multiclass the y parameter has to be a factor
# with the same levels as in the training data
hr_ranger_exp <- explain(hr_ranger, data = HR, y = HR$status)

# set model_info
model_info <- list(package = "stats", ver = "3.6.2", type = "regression")
aps_lm_model4 <- lm(m2.price ~ ., data = apartments)
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                             model_info = model_info)

# simple function
aps_fun <- function(x) 58 * x$surface
aps_fun_explainer <- explain(aps_fun, data = apartments, y = apartments$m2.price,
                             label = "sfun")
model_performance(aps_fun_explainer)

# set sampling weights
aps_lm_explainer4 <- explain(aps_lm_model4, data = apartments, label = "model_4v",
                             weights = as.numeric(apartments$construction.year > 2000))

# more complex model
library("ranger")
aps_ranger_model4 <- ranger(m2.price ~ ., data = apartments, num.trees = 50)
aps_ranger_explainer4 <- explain(aps_ranger_model4, data = apartments,
                                 label = "model_ranger")
aps_ranger_explainer4
References
Explanatory Model Analysis. Explore, Explain and Examine Predictive Models. https://ema.drwhy.ai/