census dataset

Census Data Example from UC Irvine Machine Learning Repository

Contains 1994 US census income data on 48,842 people, divided into a training set of 32,561 and an independent test set of 16,281. The training outcome variable y (yt for test) is binary and indicates whether or not a person’s income is greater than $50,000 per year. There are 12 predictor variables x (xt for test) consisting of various demographic and financial properties associated with each person. It also includes estimates of $Pr(y=1|x)$ obtained by several machine learning methods: gradient boosting on the logistic scale using maximum likelihood (GBL), random forest (RF), and gradient boosting on the probability scale using least squares (GBP).

Format

`census`

A list of 10 items.

x: training data frame of 32561 observations on 12 predictor variables
y: training binary response whether salary is above $50K or not
xt: test data frame of 16281 observations on 12 predictor variables
yt: test binary response whether salary is above $50K or not
gbl: training GBL response variable
gblt: test GBL response variable
gbp: training GBP response variable
gbpt: test GBP response variable
rf: training RF response probabilities
rft: test RF response probabilities
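
The components listed above can be checked after loading the data. The following is a minimal sketch, not part of the package documentation: `summarize_components` is a hypothetical helper written for this illustration, and the block assumes the conTree package is installed (it is skipped otherwise).

```r
# Hypothetical helper (not part of conTree): one row per list component,
# giving its name, class, and size as "rows x columns".
summarize_components <- function(lst) {
  data.frame(
    name  = names(lst),
    class = unname(vapply(lst, function(v) paste(class(v), collapse = ", "),
                          character(1))),
    size  = unname(vapply(lst, function(v) paste(NROW(v), NCOL(v), sep = " x "),
                          character(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Assumption: conTree is installed; nothing runs if it is not.
if (requireNamespace("conTree", quietly = TRUE)) {
  data(census, package = "conTree")

  # Overview of the 10 list items.
  print(summarize_components(census))

  # Counts of each value of the binary training outcome.
  print(table(census$y))
}
```

Because the helper only relies on `names`, `class`, `NROW`, and `NCOL`, it works on any named list, so it can be tried on a small toy list before touching the full data.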

Source

https://archive.ics.uci.edu/ml/datasets/census+income


census

conTree package

Maintainer: Balasubramanian Narasimhan
License: Apache License 2.0
Last published: 2023-11-22

About the dataset

Number of columns: 10
Class: list

Column names and types

x:data.frame
y:numeric
xt:data.frame
yt:numeric
gbl:numeric
gblt:numeric
gbp:numeric
gbpt:numeric
rf:matrix, votes
rft:matrix, votes