OutlierMahdist function

Outlier identification using robust (Mahalanobis) distances based on robust estimates of multivariate location and covariance matrix

This function uses the Mahalanobis distance as a basis for multivariate outlier detection. The standard method for multivariate outlier detection is to estimate the parameters of the Mahalanobis distance (location and covariance) robustly and to compare the resulting distances with a critical value of the chi-squared distribution (Rousseeuw and Van Zomeren, 1990).
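The rule described above can be sketched with base R and MASS alone. This is a minimal illustration, not the rrcov implementation: the MCD estimator from `MASS::cov.rob`, the 0.975 quantile, and the simulated data are all assumptions made for the example.

```r
# Sketch of robust-distance outlier detection (not the rrcov code).
library(MASS)

set.seed(1)
x <- rbind(
  mvrnorm(95, mu = c(0, 0), Sigma = diag(2)),  # bulk of the data
  mvrnorm(5,  mu = c(8, 8), Sigma = diag(2))   # planted outliers (rows 96-100)
)

# Robust location and covariance; MCD is an assumed choice of estimator
est <- cov.rob(x, method = "mcd")

# Squared robust Mahalanobis distances of all observations
d2 <- mahalanobis(x, center = est$center, cov = est$cov)

# Critical value: chi-squared quantile with p = ncol(x) degrees of freedom
# (0.975 is a common choice, assumed here)
cutoff <- qchisq(0.975, df = ncol(x))

# TRUE marks an observation whose squared distance exceeds the cutoff
outlier <- d2 > cutoff
which(outlier)
```

Because `mahalanobis()` returns squared distances, they are compared directly with the chi-squared quantile rather than with its square root.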

OutlierMahdist(x, ...)

## Default S3 method:
OutlierMahdist(x, grouping, control, trace=FALSE, ...)

## S3 method for class 'formula'
OutlierMahdist(formula, data, ..., subset, na.action)

Arguments

  • formula: a formula with no response variable, referring only to numeric variables.
  • data: an optional data frame (or similar: see model.frame) containing the variables in the formula formula.
  • subset: an optional vector used to select rows (observations) of the data matrix x.
  • na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.
  • ...: arguments passed to or from other methods.
  • x: a matrix or data frame.
  • grouping: grouping variable: a factor specifying the class for each observation.
  • control: a control object (S4) for one of the available control classes, e.g. CovControlMcd-class, CovControlOgk-class, CovControlSest-class, etc., containing estimation options. The class of this object defines which estimator will be used. Alternatively, a character string naming the estimator can be specified: one of auto, sde, mcd, ogk, m, mve, sfast, surreal, bisquare, rocke. If auto is specified or the argument is missing, the function will select the estimator (see below for details).
  • trace: whether to print intermediate results. Default is trace = FALSE.

Details

If the data set consists of two or more classes (specified by the grouping variable grouping), the method iterates through the classes present in the data, separates each class from the rest and identifies the outliers relative to this class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
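The per-class iteration can be illustrated with a short base R sketch. It is not the rrcov code: the MCD estimator from MASS, the 0.975 quantile, the simulated data, and the names `x`, `grouping` and `flag` are assumptions for the example, and `flag` here uses TRUE for a flagged observation.

```r
# Sketch: flag outliers within each class separately (not the rrcov code).
library(MASS)

set.seed(2)
x <- rbind(
  mvrnorm(60, mu = c(0, 0), Sigma = diag(2)),  # class "a"
  mvrnorm(60, mu = c(6, 0), Sigma = diag(2))   # class "b"
)
grouping <- factor(rep(c("a", "b"), each = 60))

# Simulate a mislabeled sample: labelled "a" but located at the centre of "b"
x[1, ] <- c(6, 0)

cutoff <- qchisq(0.975, df = ncol(x))  # assumed critical value
flag <- logical(nrow(x))               # TRUE = flagged in its own class

for (cl in levels(grouping)) {
  idx <- which(grouping == cl)
  xc  <- x[idx, , drop = FALSE]
  est <- cov.rob(xc, method = "mcd")   # robust fit for this class only
  d2  <- mahalanobis(xc, est$center, est$cov)
  flag[idx] <- d2 > cutoff
}

table(grouping, flag)
```

The mislabeled observation lies far from the robust centre of the class it is assigned to, so it is flagged by the same rule that catches abnormal samples.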

The estimation method is selected by the control object control. If a character string naming an estimator is specified, a new control object will be created and used (with default estimation options). If this argument is missing or the character string 'auto' is specified, the function will select the robust estimator according to the size of the data set; for details see CovRobust.
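As a hedged illustration of the two ways of choosing the estimator, the sketch below passes first a character string and then a control object. It assumes the rrcov package is installed and that the formula method forwards control to the default method through its ... argument; the value alpha = 0.75 is arbitrary and chosen only for the example.

```r
# Sketch: selecting the estimator by name or by control object.
library(rrcov)
data(hemophilia)

# Character string: a control object with default options is created internally
obj1 <- OutlierMahdist(gr ~ ., data = hemophilia, control = "mcd")

# Control object: estimation options can be set explicitly
ctrl <- CovControlMcd(alpha = 0.75)  # alpha chosen for illustration only
obj2 <- OutlierMahdist(gr ~ ., data = hemophilia, control = ctrl)

getCutoff(obj1)
getCutoff(obj2)
```

Omitting control, or passing "auto", leaves the choice of estimator to the function.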

Returns

An S4 object of class OutlierMahdist which is a subclass of the virtual class Outlier.

References

P. J. Rousseeuw and B. C. Van Zomeren (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association. Vol. 85(411), pp. 633-651.

P. J. Rousseeuw and A. M. Leroy (1987). Robust Regression and Outlier Detection. Wiley.

P. J. Rousseeuw and K. van Driessen (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.

Todorov V & Filzmoser P (2009). An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1-47. doi:10.18637/jss.v032.i03.

Filzmoser P & Todorov V (2013). Robust tools for the imperfect world. Information Sciences 245, 4-20. doi:10.1016/j.ins.2012.10.017.

Author(s)

Valentin Todorov valentin.todorov@chello.at

Examples

library(rrcov)
data(hemophilia)
obj <- OutlierMahdist(gr~., data=hemophilia)
obj
getDistance(obj)            # returns an array of distances
getClassLabels(obj, 1)      # returns an array of indices for a given class
getCutoff(obj)              # returns an array of cutoff values (for each class, usually equal)
getFlag(obj)                # returns a 0/1 array of flags
plot(obj, class=2)          # standard plot function

  • Maintainer: Valentin Todorov
  • License: GPL (>= 3)
  • Last published: 2024-08-17