OutlierSign2 function

Outlier identification in high dimensions using the SIGN2 algorithm

Fast algorithm for identifying multivariate outliers in high-dimensional and/or large datasets using spatial signs; see Filzmoser, Maronna, and Werner (CSDA, 2008). The distances are computed from principal components.

OutlierSign2(x, ...)

## Default S3 method:
OutlierSign2(x, grouping, qcrit = 0.975, explvar = 0.99, trace = FALSE, ...)

## S3 method for class 'formula'
OutlierSign2(formula, data, ..., subset, na.action)

Arguments

  • formula: a formula with no response variable, referring only to numeric variables.
  • data: an optional data frame (or similar: see model.frame) containing the variables in the formula formula.
  • subset: an optional vector used to select rows (observations) of the data matrix x.
  • na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.
  • ...: arguments passed to or from other methods.
  • x: a matrix or data frame.
  • grouping: grouping variable: a factor specifying the class for each observation.
  • explvar: a numeric value between 0 and 1 indicating how much variance should be covered by the robust PCs. Default is 0.99.
  • qcrit: a numeric value between 0 and 1 indicating the quantile to be used as critical value for outlier detection. Default is 0.975.
  • trace: a logical value indicating whether to print intermediate results. Default is FALSE.
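
As an illustration of the default interface, the following sketch passes the data and the grouping factor separately and tightens the cutoff through qcrit. It assumes the rrcov package, which provides OutlierSign2 and the hemophilia data used in the Examples, is installed. The value qcrit = 0.99 is an arbitrary choice for demonstration, not a recommendation.

```r
library(rrcov)

data(hemophilia)

# Default method: numeric variables go in x, class labels in grouping.
x <- hemophilia[, names(hemophilia) != "gr"]
obj <- OutlierSign2(x, grouping = hemophilia$gr, qcrit = 0.99, explvar = 0.99)

getCutoff(obj)          # cutoff value(s) implied by qcrit
flags <- getFlag(obj)   # one 0/1 flag per observation
table(flags)            # how many observations received each flag
```

A larger qcrit raises the cutoff, so fewer observations are expected to be flagged.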

Details

The data are first robustly sphered and normed. Robust principal components are then computed from these data and used to determine a distance for each observation. The distances are transformed so that they approximately follow a chi-square distribution, and a critical value (the quantile given by qcrit) is then used as the outlier cutoff.
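
The steps above can be sketched in base R. This is a simplified illustration and not the rrcov source code: the function name sign2_sketch and its return structure are hypothetical, the choice of median/MAD standardization and of the square-root chi-square scaling are assumptions made for this sketch, and grouping and error handling are left out.

```r
# Simplified sketch of a SIGN2-style procedure (hypothetical helper,
# not the implementation used by OutlierSign2).
sign2_sketch <- function(x, qcrit = 0.975, explvar = 0.99) {
  x <- as.matrix(x)

  # 1. Robust standardization: center by the median, scale by the MAD.
  x.sc <- scale(x, center = apply(x, 2, median), scale = apply(x, 2, mad))

  # 2. Spatial signs: project every observation onto the unit sphere.
  x.sign <- x.sc / sqrt(rowSums(x.sc^2))

  # 3. Principal directions of the spatial signs.
  evec <- svd(x.sign)$v

  # 4. Keep the leading directions whose robust variance (squared MAD
  #    of the scores) exceeds the requested share explvar.
  pc.var <- apply(x.sc %*% evec, 2, mad)^2
  ord <- order(pc.var, decreasing = TRUE)
  k <- which(cumsum(pc.var[ord]) / sum(pc.var) > explvar)[1]
  scores <- x.sc %*% evec[, ord[seq_len(k)], drop = FALSE]

  # 5. Robustly sphere the scores and take the Euclidean norm.
  sc <- scale(scores, center = apply(scores, 2, median),
              scale = apply(scores, 2, mad))
  d <- sqrt(rowSums(sc^2))

  # 6. Rescale so the median distance matches the chi-square median,
  #    then compare with the chi-square quantile given by qcrit.
  d <- d / median(d) * sqrt(qchisq(0.5, df = k))
  cutoff <- sqrt(qchisq(qcrit, df = k))

  list(dist = d, cutoff = cutoff, outlier = d > cutoff, ncomp = k)
}

# Usage: 100 regular observations plus 5 shifted ones in 5 dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(100 * 5), ncol = 5),
           matrix(rnorm(5 * 5, mean = 10), ncol = 5))
res <- sign2_sketch(x)
which(res$outlier)   # indices of the observations beyond the cutoff
```

Because only medians, MADs, and a single SVD are needed, a procedure of this kind stays cheap even when the number of variables is large.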

Returns

An S4 object of class OutlierSign2 which is a subclass of the virtual class Outlier.

References

Filzmoser P, Maronna R & Werner M (2008). Outlier identification in high dimensions, Computational Statistics & Data Analysis 52, 1694--1711.

Filzmoser P & Todorov V (2013). Robust tools for the imperfect world, Information Sciences 245, 4--20. doi:10.1016/j.ins.2012.10.017.

Author(s)

Valentin Todorov valentin.todorov@chello.at

See Also

OutlierSign2, OutlierSign1, Outlier

Examples

library(rrcov)           # provides OutlierSign2 and the hemophilia data
data(hemophilia)
obj <- OutlierSign2(gr ~ ., data = hemophilia)
obj

getDistance(obj)         # returns an array of distances
getClassLabels(obj, 1)   # returns an array of indices for a given class
getCutoff(obj)           # returns an array of cutoff values (for each class, usually equal)
getFlag(obj)             # returns a 0/1 array of flags
plot(obj, class = 2)     # standard plot function

  • Maintainer: Valentin Todorov
  • License: GPL (>= 3)
  • Last published: 2024-08-17