Synthetic data set used as test cases in the fairml package.
data
data(vu.test)
Format
The data are stored as a list with the following elements:
gaussian, binomial, poisson, coxph and multinomial are response variables for the different families;
X, a numeric matrix containing 3 predictors called X1, X2 and X3;
S, a numeric matrix containing 3 sensitive attributes called S1, S2 and S3.
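As a quick orientation, the structure described above can be inspected along these lines. This is a minimal sketch that assumes the fairml package is installed; it only prints what it finds, so no particular dimensions are assumed.

```r
# Assumes the fairml package is installed; the block is skipped otherwise.
if (requireNamespace("fairml", quietly = TRUE)) {
  data(vu.test, package = "fairml")
  # Top-level elements: the response variables plus X and S.
  print(names(vu.test))
  # Predictors and sensitive attributes are stored as numeric matrices.
  print(dim(vu.test$X))
  print(colnames(vu.test$S))
}
```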
Note
This data set is called vu.test because it is generated from very unfair models in which sensitive attributes explain the lion's share of the overall explained variance or deviance.
The code used to generate the predictors and the sensitive attributes is as follows.
library(mvtnorm)
sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
n = 1000
X = rmvnorm(n, mean = rep(0, 6), sigma = sigma)
S = X[, 4:6]
X = X[, 1:3]
colnames(X) = c("X1", "X2", "X3")
colnames(S) = c("S1", "S2", "S3")
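The covariance structure above implies that every predictor is correlated with every sensitive attribute at about 0.3. The following base-R sketch checks this on a fresh draw. It is not the code used to build vu.test: it swaps rmvnorm() for an explicit Cholesky factor so that it runs without mvtnorm, and the seed and the names Z, XS and cross are arbitrary choices made for illustration.

```r
# Stand-in for mvtnorm::rmvnorm(): multiply standard normal draws by the
# Cholesky factor of sigma. The seed is arbitrary and only fixes the draw.
set.seed(42)
sigma = matrix(0.3, nrow = 6, ncol = 6)
diag(sigma) = 1
n = 1000
Z = matrix(rnorm(n * 6), nrow = n, ncol = 6)
# chol() returns the upper triangular R with t(R) %*% R equal to sigma,
# so the rows of Z %*% R have covariance sigma.
XS = Z %*% chol(sigma)
X = XS[, 1:3]
S = XS[, 4:6]
colnames(X) = c("X1", "X2", "X3")
colnames(S) = c("S1", "S2", "S3")
# Sample correlations between predictors (rows) and sensitive attributes
# (columns); each entry should be near 0.3, up to sampling noise.
cross = cor(X, S)
print(round(cross, 2))
```

Because the predictors and the sensitive attributes are drawn jointly, they are not independent, so a model fitted on X alone can still pick up part of the effect of S.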
The continuous response in gaussian is produced as follows.