mlr_tasks_spam function

Spam Classification Task

Spam data set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/dataset/94/spambase). The data were collected at Hewlett-Packard Labs to classify emails as spam or non-spam. 57 variables indicate the frequency of certain words and characters in the email. The positive class is set to "spam".

Format

R6::R6Class inheriting from TaskClassif.

Source

Creators: Mark Hopkins, Erik Reeber, George Forman, Jaap Suermondt. Hewlett-Packard Labs, 1501 Page Mill Rd., Palo Alto, CA 94304

Donor: George Forman (gforman at nospam hpl.hp.com) 650-857-7835

Preprocessing: Columns have been renamed. Preprocessed data taken from the kernlab package.

Dictionary

This Task can be instantiated via the dictionary mlr_tasks or with the associated sugar function tsk():

mlr_tasks$get("spam")
tsk("spam")
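
As a minimal sketch of what the two calls above give you, the following assumes the mlr3 package is installed and attached; the variable name `task` is only illustrative. It builds the task with the sugar function and reads back the target, positive class, and dimensions described on this page.

```r
library(mlr3)

# tsk("spam") is shorthand for mlr_tasks$get("spam")
task = tsk("spam")

# Print a short summary of the task
print(task)

# Target column and positive class, as documented on this page
task$target_names   # "type"
task$positive       # "spam"

# Number of rows and columns (features plus target)
task$nrow           # 4601
task$ncol           # 58
```

Both ways of constructing the task should return an equivalent object; the sugar function is simply the shorter form.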

Meta Information

  • Task type: classif
  • Dimensions: 4601x58
  • Properties: twoclass
  • Has Missings: FALSE
  • Target: type
  • Features: address, addresses, all, business, capitalAve, capitalLong, capitalTotal, charDollar, charExclamation, charHash, charRoundbracket, charSemicolon, charSquarebracket, conference, credit, cs, data, direct, edu, email, font, free, george, hp, hpl, internet, lab, labs, mail, make, meeting, money, num000, num1999, num3d, num415, num650, num85, num857, order, original, our, over, parts, people, pm, project, re, receive, remove, report, table, technology, telnet, will, you, your
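
The meta information above can be checked programmatically, and the task can then be passed to any classification learner. The sketch below is one possible workflow, not part of the original documentation: it assumes mlr3 and the rpart package are installed, and the choice of learner (`classif.rpart`), the 70/30 split, and the seed are arbitrary illustrations.

```r
library(mlr3)

task = tsk("spam")

# Check the documented meta information
length(task$feature_names)       # 57 features
"twoclass" %in% task$properties  # TRUE
any(task$missings() > 0)         # FALSE, no missing values

# Illustrative holdout split (ratio chosen arbitrarily for this sketch)
set.seed(1)
split = partition(task, ratio = 0.7)

# Fit a decision tree on the training rows and predict the held-out rows
learner = lrn("classif.rpart")
learner$train(task, row_ids = split$train)
prediction = learner$predict(task, row_ids = split$test)

# Classification accuracy on the held-out rows
acc = prediction$score(msr("classif.acc"))
print(acc)
```

The resulting accuracy depends on the seed and the split, so no particular value is claimed here.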

References

Dua, Dheeru, Graff, Casey (2017). UCI Machine Learning Repository. http://archive.ics.uci.edu/datasets.

See Also

Other Task: Task, TaskClassif, TaskRegr, TaskSupervised, TaskUnsupervised, california_housing, mlr_tasks, mlr_tasks_breast_cancer, mlr_tasks_german_credit, mlr_tasks_iris, mlr_tasks_mtcars, mlr_tasks_penguins, mlr_tasks_pima, mlr_tasks_sonar, mlr_tasks_wine, mlr_tasks_zoo