census dataset

Census Data Example from UC Irvine Machine Learning Repository

Includes a data frame of 1994 US census income data on 48,842 people, divided into a training set of 32,561 and an independent test set of 16,281. The training outcome variable y (yt for test) is binary and indicates whether or not a person's income is greater than $50,000 per year. There are 12 predictor variables x (xt for test) consisting of various demographic and financial properties associated with each person. It also includes estimates of Pr(y=1|x) obtained by several machine learning methods: gradient boosting on the logistic scale using maximum likelihood (GBL), random forest (RF), and gradient boosting on the probability scale using least squares (GBP).
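One way to compare estimates of Pr(y=1|x) against a binary outcome is sketched below. The helper names brier_score and misclass_rate and the toy vectors are hypothetical and are not part of the dataset. The sketch assumes the estimate is a numeric vector of probabilities in [0, 1]; it may be worth checking the range of a stored estimate before passing it in.

```r
# Minimal sketch: score a vector of probability estimates p against a
# binary outcome y coded as 0/1. Both helper names are hypothetical.

brier_score <- function(y, p) {
  # Assumes p holds probabilities, not log-odds or class votes.
  stopifnot(length(y) == length(p), all(p >= 0 & p <= 1))
  mean((p - y)^2)
}

misclass_rate <- function(y, p, cutoff = 0.5) {
  stopifnot(length(y) == length(p))
  # Predict class 1 when the estimated probability exceeds the cutoff.
  mean(as.numeric(p > cutoff) != y)
}

# Toy values for illustration only (not taken from the census data).
y_toy <- c(1, 0, 1, 0)
p_toy <- c(0.9, 0.2, 0.4, 0.1)

brier_score(y_toy, p_toy)
misclass_rate(y_toy, p_toy)
```

With the real data, the same helpers could in principle be applied to the test outcome and a test-set estimate, provided that estimate is a plain probability vector.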

Format

census

A list of 10 items.

  • x: training data frame of 32561 observations on 12 predictor variables
  • y: training binary response whether salary is above $50K or not
  • xt: test data frame of 16281 observations on 12 predictor variables
  • yt: test binary response whether salary is above $50K or not
  • gbl: training GBL response variable
  • gblt: test GBL response variable
  • gbp: training GBP response variable
  • gbpt: test GBP response variable
  • rf: training RF response probabilities
  • rft: test RF response probabilities
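
The structure listed above can be checked with a short sketch like the following. The function check_census_like and the toy list are hypothetical, written only to mirror the ten item names; the toy values are not real census records. NROW is used so that vectors and matrices are handled the same way.

```r
# Minimal sketch: verify that a list has the ten documented items and that
# each training/test item has as many rows as its predictor data frame.

expected_names <- c("x", "y", "xt", "yt", "gbl", "gblt",
                    "gbp", "gbpt", "rf", "rft")

check_census_like <- function(obj) {
  stopifnot(is.list(obj), setequal(names(obj), expected_names))
  n_train <- nrow(obj$x)
  n_test <- nrow(obj$xt)
  # NROW() returns length for vectors and row count for matrices.
  train_ok <- vapply(c("y", "gbl", "gbp", "rf"),
                     function(nm) NROW(obj[[nm]]) == n_train, logical(1))
  test_ok <- vapply(c("yt", "gblt", "gbpt", "rft"),
                    function(nm) NROW(obj[[nm]]) == n_test, logical(1))
  c(train_ok, test_ok)
}

# Toy stand-in with 3 training rows and 2 test rows (illustration only).
toy <- list(
  x = data.frame(age = c(25, 40, 58)), y = c(0, 1, 1),
  xt = data.frame(age = c(33, 61)), yt = c(0, 1),
  gbl = c(0.1, 0.7, 0.8), gblt = c(0.2, 0.6),
  gbp = c(0.2, 0.6, 0.9), gbpt = c(0.3, 0.7),
  rf = matrix(c(0.9, 0.3, 0.2, 0.1, 0.7, 0.8), ncol = 2),
  rft = matrix(c(0.8, 0.4, 0.2, 0.6), ncol = 2)
)

check_census_like(toy)
```

Applied to the packaged list, every entry should be TRUE, with 32561 training rows and 16281 test rows according to the description above.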

Source

https://archive.ics.uci.edu/ml/datasets/census+income

census
  • Maintainer: Balasubramanian Narasimhan
  • License: Apache License 2.0
  • Last published: 2023-11-22

About the dataset

  • Number of columns: 10
  • Class: list

Column names and types

  • x:data.frame
  • y:numeric
  • xt:data.frame
  • yt:numeric
  • gbl:numeric
  • gblt:numeric
  • gbp:numeric
  • gbpt:numeric
  • rf:matrix, votes
  • rft:matrix, votes