x: Numeric vector that will be searched for outliers.
k: Nonnegative constant that determines the extension of the 'whiskers'. Commonly used values are 1.5 (default), 2, or 3. Note that when method="adjbox", k is automatically set to 1.5.
method: Character string identifying the method to be used: method="resistant" provides the 'standard' boxplot fences; method="asymmetric" is a modification of the standard method to deal with (moderately) skewed data; method="adjbox" uses the Hubert and Vandervieren (2008) adjusted boxplot for skewed distributions.
weights: Optional numeric vector of unit weights associated with the observations in x. Only nonnegative weights are allowed. Weights are used in estimating the quartiles (see Details).
id: Optional vector with identifiers of units in x. If missing (id=NULL, default) the identifiers will be set equal to the positions in the vector (i.e. id=1:length(x)).
exclude: Values of x that will be excluded from the analysis. By default missing values are excluded (exclude=NA).
logt: Logical; if TRUE, the x variable is log-transformed before searching for outliers (log(x+1) is used). In this case the summary outputs (bounds, etc.) refer to the log-transformed x.
Details
When method="resistant" the outlying observations are those outside the interval:
[Q1 − k×IQR; Q3 + k×IQR]
where Q1 and Q3 are the 1st and the 3rd quartile of x, respectively, and IQR = Q3 − Q1 is the interquartile range. The value k=1.5 (the so-called 'inner fences') is commonly used when drawing a boxplot. Values k=2 and k=3 provide middle and outer fences, respectively.
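The resistant fences can be sketched in a few lines of base R. This is only an illustration of the formula above, not the package's own code: the helper name resistant_fences is hypothetical, and quantile() is used with its default settings, which may differ from the quartile estimator actually used.

```r
# Sketch of the resistant fences [Q1 - k*IQR; Q3 + k*IQR] (hypothetical helper).
resistant_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
f <- resistant_fences(x)
# Observations falling outside the interval are flagged as outliers
out <- x[x < f["lower"] | x > f["upper"]]
```

With these data the fences are −3.5 and 14.5, so only the value 100 is flagged.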
When method="asymmetric" the outlying observations are those outside the interval:
[Q1 − 2k×(Q2−Q1); Q3 + 2k×(Q3−Q2)]
where Q2 is the median; this modification accounts for slight skewness of the distribution.
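A minimal base R sketch of the asymmetric variant follows. It assumes the interval takes the form [Q1 − 2k×(Q2−Q1); Q3 + 2k×(Q3−Q2)], which coincides with the resistant fences when the median lies midway between the quartiles. The helper name asymmetric_fences is hypothetical and quantile() defaults are an assumption.

```r
# Sketch of asymmetric fences: each whisker is scaled by the distance
# between the median and the quartile on its own side (hypothetical helper).
asymmetric_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(lower = q[1] - 2 * k * (q[2] - q[1]),
    upper = q[3] + 2 * k * (q[3] - q[2]))
}

x <- c(1, 2, 2, 3, 3, 4, 5, 7, 10, 40)   # right-skewed toy data
f <- asymmetric_fences(x)
out <- x[x < f["lower"] | x > f["upper"]]
```

Here the upper fence (15.5) lies further from Q3 than the lower fence (−1.5) lies from Q1, reflecting the longer right tail.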
Finally, when method="adjbox" the outlying observations are identified using the method proposed by Hubert and Vandervieren (2008), which is based on the medcouple measure of skewness; in practice the bounds are:
[Q1 − 1.5×e^(aM)×IQR; Q3 + 1.5×e^(bM)×IQR]
where M is the medcouple; when M>0 (positive skewness) a=−4 and b=3, whereas a=−3 and b=4 for negative skewness (M<0). According to Hubert and Vandervieren (2008), this adjustment of the boxplot works with moderate skewness (−0.6 ≤ M ≤ 0.6). The bounds of the adjusted boxplot are derived by applying the function adjboxStats in the package robustbase.
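To show how the medcouple shifts the fences, the sketch below evaluates the adjusted-boxplot bounds of Hubert and Vandervieren (2008), [Q1 − 1.5×e^(aM)×IQR; Q3 + 1.5×e^(bM)×IQR], for given quartiles and a given M. It is an illustration only: the helper adjbox_fences is hypothetical, M is supplied by hand rather than estimated, and the bounds documented here come from adjboxStats, not from this code.

```r
# Sketch of the adjusted-boxplot bounds for known Q1, Q3 and medcouple M.
adjbox_fences <- function(q1, q3, M) {
  # a and b depend on the sign of the medcouple
  if (M >= 0) { a <- -4; b <- 3 } else { a <- -3; b <- 4 }
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * exp(a * M) * iqr,
    upper = q3 + 1.5 * exp(b * M) * iqr)
}

sym  <- adjbox_fences(q1 = 2, q3 = 6, M = 0)    # no skewness
skew <- adjbox_fences(q1 = 2, q3 = 6, M = 0.3)  # positive skewness
```

With M=0 both exponentials equal 1 and the bounds reduce to the standard fences with k=1.5; with M>0 the upper fence moves outwards and the lower fence moves towards Q1.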
When weights are available (passed via the argument weights) then they are used in the computation of the quartiles. In particular, the quartiles are derived using the function wtd.quantile in the package Hmisc.
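The effect of weights on the quartiles can be illustrated with a simple inverse of the weighted empirical distribution function. This sketch is not the algorithm used by wtd.quantile, which may interpolate and return different values; the helper name weighted_quantile is hypothetical.

```r
# Sketch: weighted quantiles as the smallest x whose cumulative
# weight share reaches the requested probability (hypothetical helper).
weighted_quantile <- function(x, w, probs = c(0.25, 0.5, 0.75)) {
  stopifnot(length(x) == length(w), all(w >= 0))
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cw <- cumsum(w) / sum(w)   # weighted empirical CDF at each sorted value
  sapply(probs, function(p) x[which(cw >= p)[1]])
}

x <- c(10, 20, 30, 40)
q_equal  <- weighted_quantile(x, w = c(1, 1, 1, 1))
q_weight <- weighted_quantile(x, w = c(1, 1, 1, 5))  # last unit dominates
```

Giving the largest observation a weight of 5 pulls the median and the third quartile up to 40, which in turn changes the fences.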
Remember that when a log transformation is requested (argument logt=TRUE) all the estimates (quartiles, etc.) refer to log(x+1).
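The following base R sketch shows what working on the log(x+1) scale means for the fences. The fence computation is the resistant one with k=1.5 and quantile() defaults, both assumptions made for illustration; the back-transformation with expm1() is an optional extra step that is not described in this documentation.

```r
x  <- c(0, 1, 3, 7, 15, 31, 1000)
lx <- log1p(x)   # same as log(x + 1)

# Quartiles and fences computed on the transformed values
q <- quantile(lx, probs = c(0.25, 0.75), names = FALSE)
fences_log <- c(lower = q[1] - 1.5 * (q[2] - q[1]),
                upper = q[2] + 1.5 * (q[2] - q[1]))

# Outliers are judged on the log scale
out <- x[lx < fences_log["lower"] | lx > fences_log["upper"]]

# Optional: express the fences on the original scale of x
fences_orig <- expm1(fences_log)
```

Because the reported bounds are on the log scale, they should be compared with log(x+1), not with x itself.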
Returns
The output is a list containing the following components:
quartiles: The quartiles of x after discarding the values in the exclude argument. When weights are provided they are used in the estimation of the quartiles through the function wtd.quantile in the package Hmisc.
fences: The bounds of the interval; values outside the interval are detected as outliers.
excluded: The identifiers or positions (when id=NULL) of units in x excluded from the computations, according to the argument exclude.
outliers: The identifiers or positions (when id=NULL) of units in x detected as outliers.
lowOutl: The identifiers or positions (when id=NULL) of units in x detected as outliers in the lower tail of the distribution.
upOutl: The identifiers or positions (when id=NULL) of units in x detected as outliers in the upper tail of the distribution.
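How the returned components relate to each other can be sketched as follows. This mimics the structure described above using base R only; it is not the function's implementation, and the fence rule (resistant, k=1.5, quantile() defaults) is an assumption chosen for the example.

```r
x  <- c(5, NA, 6, 7, 8, 9, 60, -40)
id <- c("a", "b", "c", "d", "e", "f", "g", "h")
exclude <- NA

excl <- x %in% exclude   # units dropped before any computation
xk   <- x[!excl]
idk  <- id[!excl]

q <- quantile(xk, probs = c(0.25, 0.75), names = FALSE)
fences <- c(lower = q[1] - 1.5 * diff(q), upper = q[2] + 1.5 * diff(q))

res <- list(
  fences   = fences,
  excluded = id[excl],
  lowOutl  = idk[xk < fences["lower"]],   # lower-tail outliers
  upOutl   = idk[xk > fences["upper"]]    # upper-tail outliers
)
res$outliers <- c(res$lowOutl, res$upOutl)
```

In this toy case unit "b" is excluded, "h" is a lower-tail outlier and "g" an upper-tail outlier, so outliers is the union of lowOutl and upOutl.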
References
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) 'Variations of box plots'. The American Statistician, 32, pp. 12-16.
Hubert, M. and Vandervieren, E. (2008) 'An Adjusted Boxplot for Skewed Distributions'. Computational Statistics and Data Analysis, 52, pp. 5186-5201.