bank dataset

Bank Marketing

Direct marketing campaigns (phone calls) run by a Portuguese banking institution to get clients to subscribe to a term deposit.

Usage

data(bank)

Format

The original UCI data contains 41188 observations; after the pre-processing described in the Note, the data contains 40195 observations and 19 variables. See the UCI Machine Learning Repository for details.

Note

The data set has been pre-processed as in Zafar et al. (2019), with the following exceptions:

  • the variable duration has been dropped in order to learn a realistic predictive model;

  • the variable pdays has been dropped because it is not defined for the vast majority of samples;

  • observations where loan is "unknown" have been dropped because the corresponding regression coefficient estimated by glm() is NA;

  • the three observations where default is "yes" have been dropped to avoid errors in cross-validation (if all three of those observations end up in the test fold, it is impossible to compute predictions for them).
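
The exceptions listed above can be sketched in base R as follows. This is only an illustration on a made-up data frame: the names `raw` and `clean` are hypothetical and the values are invented, so it should not be read as the package's actual pre-processing script.

```r
# Toy data frame with a few of the raw columns (all values are made up).
raw <- data.frame(
  age      = c(30, 45, 52, 38, 61),
  duration = c(120, 300, 45, 600, 90),
  pdays    = c(999, 999, 3, 999, 999),
  loan     = factor(c("no", "yes", "unknown", "no", "no")),
  default  = factor(c("no", "unknown", "no", "yes", "no"))
)

# Drop the two variables that are excluded from the data.
clean <- raw[, setdiff(names(raw), c("duration", "pdays"))]

# Drop observations with an unknown loan status or with default equal to "yes".
clean <- clean[clean$loan != "unknown" & clean$default != "yes", ]

# Remove the factor levels that no longer occur in the data.
clean <- droplevels(clean)

str(clean)
```

Calling `droplevels()` at the end matters for modelling: without it the factors would still carry the empty "unknown" and "yes" levels even though no observation uses them.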

In that paper, subscribed is the response variable, age is the sensitive attribute and the remaining variables are used as predictors.

The data contains the following variables:

  • age as a numeric variable;

  • job, a factor with 12 levels ranging from "blue-collar" to "services";

  • marital, a factor with levels "divorced", "married", "single" and "unknown";

  • education, a factor with 8 levels ranging from "basic.4y" to "university.degree";

  • default, a factor with levels "no" and "unknown";

  • housing, a factor with levels "yes" and "no";

  • loan, a factor with levels "yes" and "no";

  • contact, a factor with levels "cellular" and "telephone";

  • month, a factor with 12 levels for the months of the year;

  • day_of_week, a factor with 7 levels for the days of the week;

  • campaign, the number of contacts performed during this campaign;

  • previous, the number of contacts performed before this campaign;

  • poutcome, a factor with levels "failure", "nonexistent" and "success";

  • emp_var_rate, the (numeric) quarterly employment variation rate;

  • cons_price_idx, the (numeric) monthly consumer price index;

  • cons_conf_idx, the (numeric) monthly consumer confidence index;

  • euribor3m, the (numeric) euribor 3-month rate;

  • nr_employed, a numeric variable with the number of employees in the company in that quarter;

  • subscribed, a factor with levels "yes" and "no".
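
The variable list above can be checked against the data itself. The sketch below assumes that the data set ships with the fairml package (an assumption, since this page does not name the package), and it does nothing if that package is not installed.

```r
# Assumption: the bank data is provided by the fairml package.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(bank, package = "fairml")

  # Number of observations and variables.
  print(dim(bank))

  # Class of each column (numeric or factor).
  print(sapply(bank, class))

  # Levels of one factor and the distribution of the response.
  print(levels(bank$default))
  print(table(bank$subscribed))
}
```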

References

UCI Machine Learning Repository.

https://archive.ics.uci.edu/ml/datasets/bank+marketing

Examples

library(fairml)  # assumed: zlrm() and the bank data are provided by the fairml package
data(bank)
# remove loans with unknown status, the corresponding coefficient is NA in glm().
bank = bank[bank$loan != "unknown", ]
# short-hand variable names.
r = bank[, "subscribed"]
s = bank[, c("age")]
p = bank[, setdiff(names(bank), c("subscribed", "age"))]
m = zlrm(response = r, sensitive = s, predictors = p, unfairness = 0.05)
summary(m)

  • Maintainer: Marco Scutari
  • License: MIT + file LICENSE
  • Last published: 2023-05-13

About the dataset

  • Number of rows: 40195
  • Number of columns: 19
  • Class: data.frame

Column names and types (First 10)

  • age:numeric
  • job:factor
  • marital:factor
  • education:factor
  • default:factor
  • housing:factor
  • loan:factor
  • contact:factor
  • month:factor
  • day_of_week:factor