Iterative EM PCA imputation
Greedy algorithm for EM-PCA including robust methods
impPCA( x, method = "classical", m = 1, eps = 0.5, k = ncol(x) - 1, maxit = 100, boot = FALSE, verbose = TRUE )
x: data.frame or matrix
method: "classical" or "mcd" (robust estimation)
m: number of multiple imputations (only if parameter boot equals TRUE)
eps: threshold for convergence
k: number of principal components for reconstruction of x
maxit: maximum number of iterations
boot: residual bootstrap (if TRUE)
verbose: TRUE/FALSE if additional information about the imputation process should be printed

Returns the imputed data set. If boot = FALSE this is a data.frame. If boot = TRUE this is a list where each list element contains a data.frame.
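To make the roles of k, eps and maxit more concrete, the following is a minimal base R sketch of a classical EM-style PCA imputation loop. It is an illustration only and not the code used by impPCA(): the helper name em_pca_impute, the mean initialisation, and the convergence criterion (sum of squared changes in the imputed cells) are assumptions, and the robust "mcd" variant and the residual bootstrap are not covered.

```r
# Illustrative EM-style PCA imputation (hypothetical helper, not VIM's code).
em_pca_impute <- function(x, k = ncol(x) - 1, eps = 1e-6, maxit = 100) {
  x <- as.matrix(x)
  miss <- is.na(x)
  # Initialise missing cells with the column means of the observed values
  col_means <- colMeans(x, na.rm = TRUE)
  x[miss] <- col_means[col(x)[miss]]
  idx <- seq_len(k)
  for (it in seq_len(maxit)) {
    # Centre the completed data and build a rank-k PCA reconstruction
    mu <- colMeans(x)
    xc <- sweep(x, 2, mu)
    sv <- svd(xc)
    recon <- sv$u[, idx, drop = FALSE] %*%
      diag(sv$d[idx], nrow = k) %*%
      t(sv$v[, idx, drop = FALSE])
    recon <- sweep(recon, 2, mu, "+")
    # Update only the missing cells; stop once they barely change
    # (assumed criterion, may differ from the package's use of eps)
    delta <- sum((x[miss] - recon[miss])^2)
    x[miss] <- recon[miss]
    if (delta < eps) break
  }
  x
}

# Toy data: two strongly correlated columns with three values removed
set.seed(1)
z <- rnorm(50)
toy <- cbind(a = z, b = 2 * z + rnorm(50, sd = 0.1))
toy_na <- toy
toy_na[c(3, 10, 25), "b"] <- NA
imp <- em_pca_impute(toy_na, k = 1)
```

Observed cells are never modified in this sketch; only the cells that were NA are replaced by their low-rank reconstruction in each iteration.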
# Prepare the data: log-transform and rename the columns
data(Animals, package = "MASS")
Animals$brain[19] <- Animals$brain[19] + 0.01
Animals <- log(Animals)
colnames(Animals) <- c("log(body)", "log(brain)")

# Set 10 values of log(brain) to missing
Animals_na <- Animals
probs <- abs(Animals$`log(body)`^2)  # overwritten on the next line
probs <- rep(0.5, nrow(Animals))
probs[c(6,16,26)] <- 0               # rows 6, 16 and 26 are never set to missing
set.seed(1234)
Animals_na[sample(1:nrow(Animals), 10, prob = probs), "log(brain)"] <- NA
w <- is.na(Animals_na$`log(brain)`)

# Classical and robust imputation, with and without residual bootstrap
impPCA(Animals_na)
impPCA(Animals_na, method = "mcd")
impPCA(Animals_na, boot = TRUE, m = 10)
impPCA(Animals_na, method = "mcd", boot = TRUE)[[1]]

# Plot observed values, values set to missing, and their imputations
plot(`log(brain)` ~ `log(body)`, data = Animals, type = "n", ylab = "", xlab="")
mtext(text = "impPCA robust", side = 3)
points(Animals$`log(body)`[!w], Animals$`log(brain)`[!w])
points(Animals$`log(body)`[w], Animals$`log(brain)`[w], col = "grey", pch = 17)
imputed <- impPCA(Animals_na, method = "mcd", boot = TRUE)[[1]]
colnames(imputed) <- c("log(body)", "log(brain)")
points(imputed$`log(body)`[w], imputed$`log(brain)`[w], col = "red", pch = 20, cex = 1.4)
segments(x0 = Animals$`log(body)`[w], x1 = imputed$`log(body)`[w],
         y0 = Animals$`log(brain)`[w], y1 = imputed$`log(brain)`[w],
         lty = 2, col = "grey")
legend("topleft", legend = c("non-missings", "set to missing", "imputed values"),
       pch = c(1,17,20), col = c("black","grey","red"), cex = 0.7)

# Error measures comparing imputed and true values, added to the plot
mape <- round(100* 1/sum(is.na(Animals_na$`log(brain)`)) *
  sum(abs((Animals$`log(brain)` - imputed$`log(brain)`) / Animals$`log(brain)`)), 2)
s2 <- var(Animals$`log(brain)`)
nrmse <- round(sqrt(1/sum(is.na(Animals_na$`log(brain)`)) *
  sum(abs((Animals$`log(brain)` - imputed$`log(brain)`) / s2))), 2)
text(x = 8, y = 1.5, labels = paste("MAPE =", mape))
text(x = 8, y = 0.5, labels = paste("NRMSE =", nrmse))
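The error measures in the example can also be wrapped in a small reusable function that looks only at the cells that were set to missing. The sketch below is an assumption-laden illustration: imputation_error is a hypothetical helper, and its NRMSE follows one common convention (root mean squared error divided by the standard deviation of the true values), which is not identical to the expression used in the example.

```r
# Hypothetical helper: error measures restricted to the imputed cells.
# truth, imputed: numeric vectors of equal length
# was_missing:    logical vector marking the cells that were imputed
imputation_error <- function(truth, imputed, was_missing) {
  err <- imputed[was_missing] - truth[was_missing]
  list(
    # mean absolute percentage error over the imputed cells
    mape  = 100 * mean(abs(err / truth[was_missing])),
    # RMSE over the imputed cells, scaled by the sd of the true values
    # (assumed normalisation; other conventions exist)
    nrmse = sqrt(mean(err^2)) / sd(truth)
  )
}

# Small made-up vectors, used only to exercise the helper
truth       <- c(2, 4, 6, 8, 10)
imputed     <- c(2, 5, 6, 8, 8)
was_missing <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
res <- imputation_error(truth, imputed, was_missing)
```

With the objects from the example, a call such as imputation_error(Animals$`log(brain)`, imputed$`log(brain)`, w) would apply the same measures to the imputed Animals data.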
Serneels, Sven and Verdonck, Tim (2008). Principal component analysis for data containing outliers and missing elements. Computational Statistics and Data Analysis, Elsevier, vol. 52(3), pages 1712-1727
Other imputation methods: hotdeck(), irmi(), kNN(), matchImpute(), medianSamp(), rangerImpute(), regressionImp(), sampleCat()
Matthias Templ