Outlier identification in high dimensions using the SIGN2 algorithm
Fast algorithm for identifying multivariate outliers in high-dimensional and/or large datasets using spatial signs; see Filzmoser, Maronna, and Werner (CSDA, 2008). The distances are computed from principal components.
OutlierSign2(x, ...)

## Default S3 method:
OutlierSign2(x, grouping, qcrit = 0.975, explvar = 0.99, trace = FALSE, ...)

## S3 method for class 'formula'
OutlierSign2(formula, data, ..., subset, na.action)
Arguments
formula: a formula with no response variable, referring only to numeric variables.
data: an optional data frame (or similar: see model.frame) containing the variables in formula.
subset: an optional vector used to select rows (observations) of the data matrix x.
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.
...: arguments passed to or from other methods.
x: a matrix or data frame.
grouping: a factor specifying the class of each observation.
explvar: a numeric value between 0 and 1 indicating how much variance should be covered by the robust PCs. Default is 0.99.
qcrit: a numeric value between 0 and 1 indicating the quantile to be used as critical value for outlier detection. Default is 0.975.
trace: logical; whether to print intermediate results. Default is trace = FALSE.
Details
The data are first robustly sphered and normed (transformed to spatial signs). Robust principal components are then computed from the transformed data and used to determine a distance for each observation. The distances are rescaled so that they approximately follow a chi-square distribution, and the quantile given by qcrit is used as the outlier cutoff.
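The steps described above can be sketched in base R. This is a simplified illustration and not the code used by OutlierSign2(): the function name sign2_sketch, the use of median and MAD for the robust standardisation, the MAD-based ranking of components, and the convention that a flag of 1 marks a regular observation are all assumptions made for the example, and it ignores grouping.

```r
# Illustrative sketch only (hypothetical helper, not part of the package).
sign2_sketch <- function(x, explvar = 0.99, qcrit = 0.975) {
  x <- as.matrix(x)

  # 1. Robust standardisation: centre by the median, scale by the MAD.
  x_sc <- scale(x, center = apply(x, 2, median), scale = apply(x, 2, mad))

  # 2. Spatial signs: project every observation onto the unit sphere.
  x_sign <- x_sc / sqrt(rowSums(x_sc^2))

  # 3. Principal directions of the sign-transformed data.
  evec <- eigen(cov(x_sign), symmetric = TRUE)$vectors

  # 4. Scores of the standardised data; keep the directions needed to
  #    cover the fraction `explvar` of the robust variance (MAD^2).
  scores <- x_sc %*% evec
  rvar <- apply(scores, 2, mad)^2
  ord <- order(rvar, decreasing = TRUE)
  k <- which(cumsum(rvar[ord]) / sum(rvar) >= explvar)[1]
  scores <- scores[, ord[seq_len(k)], drop = FALSE]

  # 5. Robustly rescale the retained scores and take the Euclidean norm.
  sc <- scale(scores, center = apply(scores, 2, median),
              scale = apply(scores, 2, mad))
  d <- sqrt(rowSums(sc^2))

  # 6. Rescale so the median distance matches the chi-square median,
  #    then compare with the chi-square quantile `qcrit`.
  d <- d * sqrt(qchisq(0.5, df = k)) / median(d)
  cutoff <- sqrt(qchisq(qcrit, df = k))

  # Assumed convention: flag 1 = regular observation, 0 = outlier.
  list(distance = d, cutoff = cutoff, flag = as.integer(d <= cutoff), k = k)
}

# Simulated data with five planted outliers in the first rows.
set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
x[1:5, ] <- x[1:5, ] + 10
res <- sign2_sketch(x)
which(res$flag == 0)  # indices of the observations flagged as outliers
```

Lowering qcrit in this sketch makes the cutoff stricter and flags more observations, while lowering explvar reduces the number of retained components and hence the degrees of freedom of the reference distribution.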
Returns
An S4 object of class OutlierSign2 which is a subclass of the virtual class Outlier.
References
Filzmoser P, Maronna R & Werner M (2008). Outlier identification in high dimensions, Computational Statistics & Data Analysis 52, 1694--1711.
Filzmoser P & Todorov V (2013). Robust tools for the imperfect world, Information Sciences 245, 4--20. doi:10.1016/j.ins.2012.10.017.
data(hemophilia)
obj <- OutlierSign2(gr ~ ., data = hemophilia)
obj

getDistance(obj)        # returns an array of distances
getClassLabels(obj, 1)  # returns an array of indices for a given class
getCutoff(obj)          # returns an array of cutoff values (for each class, usually equal)
getFlag(obj)            # returns a 0/1 array of flags
plot(obj, class = 2)    # standard plot function