imputation function

Missing data imputation (e.g. substitution by value or hotdeck method).

imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)

Arguments

  • imethod: imputation method type:

    • value -- substitutes missing data by Value (with single element or several elements);
    • hotdeck -- finds the most similar example in the dataset (using a k-nearest neighbor method, knn) and replaces the missing data with the value found in that example;
  • D: dataset with missing data (data.frame)

  • Attribute: if NULL then all attributes (data columns) with missing data are replaced. Else, Attribute is the attribute number (numeric) or name (character).

  • Missing: missing data symbol

  • Value: the substitution value (if imethod="value") or the number of neighbors (k of knn, if imethod="hotdeck").
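To make the "value" method concrete, the following base R sketch replaces the missing entries of one attribute with a fixed value. It only illustrates the idea and is not the rminer implementation: the helper name impute_value_sketch and its arguments are hypothetical, and it assumes missing data is coded as NA.

```r
# Hypothetical helper (not part of rminer): replace the NA entries of one
# column with a fixed substitution value.
impute_value_sketch <- function(D, attribute, value) {
  col <- D[[attribute]]
  missing_rows <- is.na(col)
  if (is.factor(col)) {
    # make sure the substitution value is a valid factor level
    levels(col) <- union(levels(col), as.character(value))
  }
  col[missing_rows] <- value
  D[[attribute]] <- col
  D
}

# small dataset with two missing entries in X2
d <- data.frame(X1 = c(5, 4, 1, 4, 5), X2 = c(4, 3, 1, NA, NA))
# substitute by the median of the observed X2 values
d2 <- impute_value_sketch(d, "X2", median(d$X2, na.rm = TRUE))
print(d2)
```

Using the median of the observed values as the substitution value mirrors the median-based call in the Examples section.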

Details

Check the references.
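As a rough illustration of the hotdeck idea with a single neighbor, the base R sketch below copies the value from the closest complete example. It is a simplification under stated assumptions, not the rminer code: the helper name hotdeck_1nn_sketch is hypothetical, similarity is plain Euclidean distance, all columns are numeric, and the columns used for the distance contain no missing data.

```r
# Hypothetical helper (not part of rminer): 1-nearest-neighbor hot-deck
# imputation for one numeric column. Distance is Euclidean, computed on
# the remaining columns, which are assumed to be numeric and complete.
hotdeck_1nn_sketch <- function(D, attribute) {
  predictors <- setdiff(names(D), attribute)
  X <- as.matrix(D[, predictors, drop = FALSE])
  missing_rows <- which(is.na(D[[attribute]]))
  donor_rows <- which(!is.na(D[[attribute]]))
  for (i in missing_rows) {
    # squared Euclidean distance from row i to every donor row
    diffs <- sweep(X[donor_rows, , drop = FALSE], 2, X[i, ])
    dists <- rowSums(diffs^2)
    nearest <- donor_rows[which.min(dists)]
    # copy the value of the most similar example
    D[[attribute]][i] <- D[[attribute]][nearest]
  }
  D
}

d <- data.frame(
  X1 = c(5, 4, 1, 4),
  X2 = c(4, 3, 1, NA),
  X3 = c(3, 4, 1, 4)
)
d2 <- hotdeck_1nn_sketch(d, "X2")
print(d2)
```

Here the fourth row matches the second row exactly on X1 and X3, so it receives that row's X2 value.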

Returns

A data.frame without missing data.

References

  • M. Brown and J. Kros.

    Data mining and the impact of missing data.

    In Industrial Management & Data Systems, 103(8):611-621, 2003.

  • This tutorial shows additional code examples:

    P. Cortez.

    A tutorial on using the rminer R package for data mining tasks.

    Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015.

    http://hdl.handle.net/1822/36210

Author(s)

Paulo Cortez http://www3.dsi.uminho.pt/pcortez/

See Also

fit and delevels.

Note

See also http://hdl.handle.net/1822/36210 and http://www3.dsi.uminho.pt/pcortez/rminer.html

Examples

d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3])
print(d)
print(imputation("value",d,3,Value="3"))
print(imputation("value",d,2,Value=median(na.omit(d[,2]))))
print(imputation("value",d,2,Value=c(1,2)))
print(imputation("hotdeck",d,"X2",Value=1))
print(imputation("hotdeck",d,Value=1))

## Not run:
# hotdeck 1-nearest neighbor substitution on a real dataset:
require(kknn)
d=read.table(
   file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
   sep=",",na.strings="?",stringsAsFactors=TRUE)
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow
## End(Not run)