Winsorization of outliers according to their Mahalanobis distance, followed by an imputation of missing values under the multivariate normal model. Only the observations flagged as outliers are winsorized. The Mahalanobis distance is computed with MDmiss, which allows for missing values.
Winsimp(data, center, scatter, outind, seed = 1000003)
Arguments
data: a data frame with the data.
center: (robust) estimate of the center (location) of the observations.
scatter: (robust) estimate of the scatter (covariance-matrix) of the observations.
outind: logical vector flagging the outliers (TRUE or 1 marks an outlier).
seed: seed for random number generator.
Returns
Winsimp returns a list whose first component output is a sublist with the following components:
cutpoint: Cutpoint for outliers
proc.time: Processing time
n.missing.before: Number of missing values before imputation
n.missing.after: Number of missing values after imputation
The other component returned by Winsimp is:
imputed.data: Imputed data set
Details
It is assumed that center, scatter and outind stem from a multivariate outlier detection algorithm which produces robust estimates and which declares observations with a large Mahalanobis distance to be outliers. The cutpoint is calculated as the smallest (unsquared) Mahalanobis distance among the outliers. The winsorization reduces the weight of the outliers by pulling them toward the robust center:
$$\hat{y}_i = \mu_R + (y_i - \mu_R) \cdot \frac{c}{d_i}$$
where $\mu_R$ is the robust center, $c$ is the cutpoint, and $d_i$ is the (unsquared) Mahalanobis distance of observation $i$.
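The cutpoint and winsorization steps can be illustrated with a minimal base R sketch. It assumes complete data, so it uses stats::mahalanobis instead of MDmiss, and it leaves out the imputation step entirely. The helper name winsorize_outliers and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): winsorize the flagged rows
# of a complete numeric matrix toward a given center.
winsorize_outliers <- function(x, center, scatter, outind) {
  x <- as.matrix(x)
  outind <- as.logical(outind)
  # stats::mahalanobis() returns squared distances, so take the square root
  d <- sqrt(mahalanobis(x, center, scatter))
  # Cutpoint: smallest unsquared distance among the flagged outliers
  cutpoint <- min(d[outind])
  # Shrink factor c / d_i for outliers, 1 (no change) for all other rows
  shrink <- ifelse(outind, cutpoint / d, 1)
  # y_hat_i = mu_R + (y_i - mu_R) * c / d_i, applied row by row
  centered <- sweep(x, 2, center, "-")
  winsorized <- sweep(centered * shrink, 2, center, "+")
  list(cutpoint = cutpoint, winsorized = winsorized)
}

# Toy data (assumed for illustration): four inliers and two flagged outliers
x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(4, 4), c(10, 10))
center <- c(0.5, 0.5)   # stands in for a robust center estimate
scatter <- diag(2)      # stands in for a robust scatter estimate
outind <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

res <- winsorize_outliers(x, center, scatter, outind)
print(res$cutpoint)
print(res$winsorized)
```

In this toy example the outlier closest to the center defines the cutpoint and stays where it is, while the more distant outlier is moved along the line to the center until its distance equals the cutpoint. Rows that are not flagged are left unchanged.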
References
Hulliger, B. (2007). Multivariate Outlier Detection and Treatment in Business Surveys. Proceedings of the III International Conference on Establishment Surveys, Montréal.