Missing data imputation (e.g. substitution by value or hotdeck method).
Usage
imputation(imethod = "value", D, Attribute = NULL, Missing = NA, Value = 1)
Arguments
imethod: imputation method type:
value -- replaces missing data with Value (a single element or several elements);
hotdeck -- first searches the dataset for the most similar example (using a k-nearest neighbor method, knn) and then replaces the missing data with the value found in that example.
D: dataset with missing data (data.frame)
Attribute: if NULL, missing data is replaced in all attributes (data columns) that contain it. Otherwise, Attribute is the attribute number (numeric) or name (character).
Missing: missing data symbol
Value: the substitution value (if imethod="value") or the number of neighbors, k of knn (if imethod="hotdeck").
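To make the "value" method concrete, the following base R sketch mimics substitution by value on a single column. The helper impute_value is hypothetical and is not part of rminer. It also assumes that several substitution values are assigned to the missing entries in row order, which the text above does not spell out.

```r
# Hypothetical helper (not part of rminer): replace the missing entries of one
# column by a fixed value, or by several values assigned in row order.
impute_value <- function(D, attribute, value, missing = NA) {
  col <- D[[attribute]]
  # Locate the missing entries; NA needs is.na(), other symbols use ==.
  idx <- if (is.na(missing)) which(is.na(col)) else which(col == missing)
  if (length(idx) > 0) {
    # rep_len() repeats a single value or cycles through several values.
    col[idx] <- rep_len(value, length(idx))
    D[[attribute]] <- col
  }
  D
}

d <- data.frame(X1 = c(5, 4, 1, 4, 5), X2 = c(4, 3, 1, NA, NA))
d1 <- impute_value(d, "X2", median(d$X2, na.rm = TRUE))  # single value
d2 <- impute_value(d, "X2", c(1, 2))                     # one value per gap
print(d1)
print(d2)
```

The sketch only handles numeric or character columns; a factor column would additionally need the substitution value to be one of its levels.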
Details
Check the references.
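As a rough illustration of the hotdeck idea, the base R sketch below fills each incomplete row from its single nearest complete row. It is a simplified stand-in and not the rminer implementation: the helper hotdeck_1nn is hypothetical, it assumes an all-numeric data.frame, it uses plain squared Euclidean distance over the observed columns, and it fixes k at 1.

```r
# Hypothetical helper (not part of rminer): 1-nearest-neighbor hot deck for an
# all-numeric data.frame. Donors are the rows without any missing value.
hotdeck_1nn <- function(D) {
  donors <- D[complete.cases(D), , drop = FALSE]
  stopifnot(nrow(donors) > 0)
  for (i in which(!complete.cases(D))) {
    miss <- unname(which(is.na(unlist(D[i, ]))))  # columns to fill
    obs <- setdiff(seq_along(D), miss)            # columns to compare on
    # Squared Euclidean distance from row i to each donor (observed columns).
    diffs <- sweep(as.matrix(donors[, obs, drop = FALSE]), 2,
                   unlist(D[i, obs]))
    nearest <- which.min(rowSums(diffs^2))
    # Copy the nearest donor's values into the missing cells of row i.
    D[i, miss] <- donors[nearest, miss]
  }
  D
}

d <- data.frame(matrix(c(5, 4, 3, 2, 1,
                         4, 3, 4, 3, 4,
                         1, 1, 1, 1, 1,
                         4, NA, 3, 4, 4,
                         5, NA, NA, 2, 1), nrow = 5, byrow = TRUE))
d_imp <- hotdeck_1nn(d)
print(d_imp)
```

Because donors are restricted to complete rows, the sketch stops with an error when every row has at least one missing value.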
Returns
A data.frame without missing data.
References
M. Brown and J. Kros.
Data mining and the impact of missing data.
In Industrial Management & Data Systems, 103(8):611-621, 2003.
This tutorial shows additional code examples:
P. Cortez.
A tutorial on using the rminer R package for data mining tasks.
Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015.
Examples
d=matrix(ncol=5,nrow=5)
d[1,]=c(5,4,3,2,1)
d[2,]=c(4,3,4,3,4)
d[3,]=c(1,1,1,1,1)
d[4,]=c(4,NA,3,4,4)
d[5,]=c(5,NA,NA,2,1)
d=data.frame(d); d[,3]=factor(d[,3])
print(d)
print(imputation("value",d,3,Value="3"))
print(imputation("value",d,2,Value=median(na.omit(d[,2]))))
print(imputation("value",d,2,Value=c(1,2)))
print(imputation("hotdeck",d,"X2",Value=1))
print(imputation("hotdeck",d,Value=1))

## Not run:
# hotdeck 1-nearest neighbor substitution on a real dataset:
require(kknn)
d=read.table(
  file="http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
  sep=",",na.strings="?",stringsAsFactors=TRUE)
print(summary(d))
d2=imputation("hotdeck",d,Value=1)
print(summary(d2))
par(mfrow=c(2,1))
hist(d$V26)
hist(d2$V26)
par(mfrow=c(1,1)) # reset mfrow
## End(Not run)