Data from the direct marketing campaigns (phone calls) run by a Portuguese banking institution to persuade clients to subscribe to a term deposit.
data
data(bank)
Format
The data contains 41188 observations and 19 variables. See the UCI Machine Learning Repository for details.
Note
The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:
the variable duration has been dropped in order to learn a realistic predictive model;
the variable pdays has been dropped because it is not defined for the vast majority of samples;
observations where loan is "unknown" have been dropped because the corresponding regression coefficient estimated by glm() is NA;
the three observations where default is "yes" have been dropped to avoid errors in cross-validation (if all three fall in the test fold, the model fitted on the remaining folds cannot compute predictions for them).
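The last two exceptions can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the pre-processing code used for this data set. It assumes that the NA coefficient arises because the "unknown" level of loan marks exactly the same rows as the "unknown" level of another predictor (here a made-up housing factor), and it uses a hypothetical three-level default factor to show what happens when a test fold holds every "yes" observation. The names toy, draw and lev are invented for the example.

```r
set.seed(1)
n <- 200
lev <- c("no", "unknown", "yes")

# Assumption: the same 20 rows are "unknown" for both predictors.
unknown <- rep(c(FALSE, TRUE), times = c(n - 20, 20))
draw <- function() {
  x <- sample(c("no", "yes"), n, replace = TRUE)
  factor(ifelse(unknown, "unknown", x), levels = lev)
}
toy <- data.frame(y = rbinom(n, 1, 0.3), housing = draw(), loan = draw())

# The two "unknown" dummy columns are identical, so glm() cannot estimate
# both of them and reports NA for the aliased coefficient.
fit <- glm(y ~ housing + loan, data = toy, family = binomial)
na_coefs <- names(which(is.na(coef(fit))))
print(na_coefs)

# Rare level: only the first three rows have default == "yes".
toy$default <- factor(
  c(rep("yes", 3), rep(c("no", "unknown"), length.out = n - 3)),
  levels = lev
)

# Put all three "yes" rows in the test fold; the training fold never sees
# that level once unused levels are dropped.
test_rows <- 1:3
train <- droplevels(toy[-test_rows, ])
fit_cv <- glm(y ~ default, data = train, family = binomial)

# predict() stops because the test fold contains a level unknown to the model.
res <- tryCatch(
  predict(fit_cv, newdata = toy[test_rows, ]),
  error = function(e) e
)
if (inherits(res, "error")) print(conditionMessage(res))
```

Dropping the affected observations, as described above, avoids both situations without changing the model formula.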
In that paper, subscribed is the response variable, age is the sensitive attribute and the remaining variables are used as predictors.
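As a minimal sketch of that setup, the variables can be split into response, sensitive attribute and predictors as follows. The package name fairml is an assumption (this page does not name the package that ships the data), and the plain logistic regression is only an unconstrained baseline chosen for illustration, not the method of Zafar et al. (2019). The short names r, s and p are hypothetical.

```r
# Assumption: the data set is provided by the 'fairml' package.
have_fairml <- requireNamespace("fairml", quietly = TRUE)

if (have_fairml) {
  data(bank, package = "fairml")

  r <- bank[, "subscribed"]                                   # response
  s <- bank[, "age", drop = FALSE]                            # sensitive attribute
  p <- bank[, setdiff(names(bank), c("subscribed", "age"))]   # predictors

  # Baseline: logistic regression on the predictors only, with no
  # fairness constraint of any kind.
  baseline <- glm(subscribed ~ . - age, data = bank, family = binomial)
  print(head(coef(baseline)))
}
```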
The data contains the following variables:
age as a numeric variable;
job, a factor with 12 levels ranging from "blue-collar" to "services";
marital, a factor with levels "divorced", "married", "single" and "unknown";
education, a factor with 8 levels ranging from "basic.4y" to "university.degree";
default, a factor with levels "no" and "unknown";
housing, a factor with levels "yes" and "no";
loan, a factor with levels "yes" and "no";
contact, a factor with levels "cellular" and "telephone";
month, a factor with 12 levels for the months of the year;
day_of_week, a factor with 7 levels for the days of the week;
campaign, the number of contacts performed during this campaign;
previous, the number of contacts performed before this campaign;
poutcome, a factor with levels "failure", "nonexistent" and "success";
emp_var_rate, the (numeric) quarterly employment variation rate;
cons_price_idx, the (numeric) monthly consumer price index;
cons_conf_idx, the (numeric) monthly consumer confidence index;
euribor3m, the (numeric) euribor 3-month rate;
nr_employed, a numeric variable with the number of employees in the company in that quarter;
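The variable types and factor levels listed above can be checked directly against the loaded data. This is a small sketch that again assumes the data set comes from the fairml package; it prints whatever the installed version contains and makes no claim about the result.

```r
# Assumption: the data set is provided by the 'fairml' package.
have_fairml <- requireNamespace("fairml", quietly = TRUE)

if (have_fairml) {
  data(bank, package = "fairml")

  # Which columns are factors, and how many levels does each one have?
  is_fac <- vapply(bank, is.factor, logical(1))
  level_counts <- vapply(bank[is_fac], nlevels, integer(1))
  print(level_counts)

  # Class of every column, to confirm which variables are numeric.
  print(vapply(bank, function(x) class(x)[1], character(1)))
}
```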