Find and graph Mahalanobis squared distances to detect outliers
The squared Mahalanobis distance is $D^2 = (x - \mu)' \Sigma^{-1} (x - \mu)$, where $\mu$ is the vector of means and $\Sigma$ is the covariance matrix of x. $D^2$ may be used to detect outliers in a multivariate distribution: values that are large compared to the expected chi-square values indicate an unusual response pattern. The mahalanobis function in stats does not handle missing data.
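To make the formula concrete, here is a minimal sketch that computes the squared distances directly from the definition and checks them against mahalanobis from stats. It is an illustration only, not the internals of outlier; the choice of the built-in mtcars data and the three columns is an assumption made for the example, and the data contain no missing values.

```r
# Illustrative sketch: squared Mahalanobis distances from the definition.
# The data set (mtcars) and the chosen columns are assumptions for this example.
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

mu    <- colMeans(x)   # vector of column means
Sigma <- cov(x)        # sample covariance matrix of x

# Center each column, then evaluate (x - mu)' Sigma^{-1} (x - mu) row by row
centered  <- sweep(x, 2, mu)
d2_manual <- rowSums((centered %*% solve(Sigma)) * centered)

# The same quantity from stats::mahalanobis (which returns squared distances)
d2_stats <- mahalanobis(x, center = mu, cov = Sigma)

all.equal(unname(d2_manual), unname(d2_stats))
```

Because the sample covariance is used, the distances sum to (n - 1) times the number of variables, which gives a quick sanity check on the result.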
outlier(x, plot = TRUE, bad = 5, na.rm = TRUE, xlab, ylab, ...)
Arguments
x: A data matrix or data.frame
plot: Plot the resulting QQ graph
bad: Number of the most extreme cases (those with the largest distances) to label on the plot
na.rm: Should missing data be deleted
xlab: Label for x axis
ylab: Label for y axis
...: More graphic parameters, e.g., cex=.8
Details
Adapted from the mahalanobis function and help page from stats.
Returns
The $D^2$ value for each case
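One way to use squared distances like those returned here is to compare them with chi-square quantiles whose degrees of freedom equal the number of variables. The sketch below does this in base R on complete data; the mtcars columns and the 0.999 cutoff are assumptions chosen for illustration, and the chi-square reference is only approximate, since it presumes roughly multivariate normal data.

```r
# Illustrative sketch: compare squared distances with chi-square quantiles.
# The data set, columns, and 0.999 cutoff are assumptions for this example.
x  <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Cases whose squared distance exceeds an upper chi-square quantile
cutoff  <- qchisq(0.999, df = ncol(x))
flagged <- names(d2)[d2 > cutoff]

# QQ plot: sorted distances against the expected chi-square quantiles
expected <- qchisq(ppoints(length(d2)), df = ncol(x))
plot(expected, sort(d2),
     xlab = "Expected chi-square quantiles",
     ylab = "Observed squared distances")
abline(0, 1)  # points far above this line are candidate outliers
```

A stricter or looser cutoff changes which cases are flagged, so the quantile should be treated as a tuning choice rather than a fixed rule.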
References
Yuan, Ke-Hai and Zhong, Xiaoling, (2008) Outliers, Leverage Observations, and Influential Cases in Factor Analysis: Using Robust Procedures to Minimize Their Effect, Sociological Methodology, 38, 329-368.
Author(s)
William Revelle
See Also
mahalanobis
Examples
#first, just find and graph the outliers
d2 <- outlier(sat.act)
#combine with the data frame and plot it with the outliers highlighted in blue
sat.d2 <- data.frame(sat.act,d2)
pairs.panels(sat.d2,bg=c("yellow","blue")[(d2 > 25)+1],pch=21)