Compute the medcouple , a robust concept and estimator of skewness. The medcouple is defined as a scaled median difference of the left and right half of distribution, and hence not based on the third moment as the classical skewness.
mc(x, na.rm =FALSE, doReflect =(length(x)<=100), doScale =FALSE,# was hardwired=TRUE, then default=TRUE c.huberize =1e11,# was implicitly = Inf originally eps1 =1e-14, eps2 =1e-15,# << new in 0.93-2 (2018-07..) maxit =100, trace.lev =0, full.result =FALSE)
Arguments
x: a numeric vector
na.rm: logical indicating how missing values (NAs) should be dealt with.
doReflect: logical indicating if the internal MC should also be computed on the reflected sample -x, with final result (mc.(x) - mc.(-x))/2. This makes sense since the internal MC, mc.() computes the himedian() which can differ slightly from the median.
doScale: logical indicating if the internal algorithm should also scale the data (using the most distant value from the median which is unrobust and numerically dangerous); scaling has been hardwired in the original algorithm and 's mc() till summer 2018, where it became the default. Since robustbase version 0.95-0, March 2022, the default is FALSE. As this may change the result, a message is printed about the new default, once per
session. You can suppress the message by specifying doScale = *
explicitly, or, by setting options(mc_doScale_quiet=TRUE).
c.huberize: a positive number (default: 1e11) used to stabilize the sample via x <- huberize(x, c = c.huberize)
for the mc() computations in the case of a nearly degenerate sample (many observations practically equal to the median) or very extreme outliers. In previous versions of robustbase no such huberization was applied which is equivalent to c.huberize = Inf.
eps1, eps2: tolerance in the algorithm; eps1 is used as a for convergence tolerance, where eps2 is only used in the internal h_kern() function to prevent underflow to zero, so could be considerably smaller. The original code implicitly hard coded in C eps1 := eps2 := 1e-13; only change with care!
maxit: maximal number of iterations; typically a few should be sufficient.
trace.lev: integer specifying how much diagnostic output the algorithm (in C) should produce. No output by default, most output for trace.lev = 5.
full.result: logical indicating if the full return values (from C) should be returned as a list via attr(*, "mcComp").
Returns
a number between -1 and 1, which is the medcouple, MC(x). For r <- mc(x, full.result = TRUE, ....), then attr(r, "mcComp") is a list with components - medc: the medcouple mc.(x).
For extreme cases there were convergence problems which should not happen anymore as we now use doScale=FALSE and huberization (when c.huberize < Inf).
The original algorithm and mc(*, doScale=TRUE) not only centers the data around the median but also scales them by the extremes which may have a negative effect e.g., when changing an extreme outlier to even more extreme, the result changes wrongly; see the 'mc10x' example.
References
Guy Brys, Mia Hubert and Anja Struyf (2004) A Robust Measure of Skewness; JCGS 13 (4), 996--1017.
Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions, Computational Statistics and Data Analysis 52 , 5186--5201.
Lukas Graz (2021). Improvement of the Algorithms for the Medcoule and the Adjusted Outlyingness; unpublished BSc thesis, supervised by M.Maechler, ETH Zurich.
Author(s)
Guy Brys; modifications by Tobias Verbeke and bug fixes and extensions by Manuel Koller and Martin Maechler.
The new default doScale=FALSE, and the new c.huberize were introduced as consequence of Lukas Graz' BSc thesis.
See Also
Qn for a robust measure of scale (aka dispersion ), ....
Examples
mc(1:5)# 0 for a symmetric samplex1 <- c(1,2,7,9,10)mc(x1)# = -1/3data(cushny)mc(cushny)# 0.125stopifnot(mc(c(-20,-5,-2:2,5,20))==0, mc(x1, doReflect=FALSE)==-mc(-x1, doReflect=FALSE), all.equal(mc(x1, doReflect=FALSE),-1/3, tolerance =1e-12))## Susceptibility of the current algorithm to large outliers :dX10 <-function(X) c(1:5,7,10,15,25, X)# generate skewed size-10 with 'X'x <- c(10,20,30,100^(1:20))## (doScale=TRUE, c.huberize=Inf) were (implicit) defaults in earlier {robustbase}:(mc10x <- vapply(x,function(X) mc(dX10(X), doScale=TRUE, c.huberize=Inf),1))## limit X -> Inf should be 7/12 = 0.58333... but that "breaks down a bit" :plot(x, mc10x, type="b", main ="mc( c(1:5,7,10,15,25, X) )", xlab="X", log="x")## The new behavior is much preferable {shows message about new 'doScale=FALSE'}:(mc10N <- vapply(x,function(X) mc(dX10(X)),1))lines(x, mc10N, col=adjustcolor(2,3/4), lwd=3)mtext("mc(*, c.huberize=1e11)", col=2)stopifnot(all.equal(c(4,6, rep(7, length(x)-2))/12, mc10N))## Here, huberization already solves the issue:mc10NS <- vapply(x,function(X) mc(dX10(X), doScale=TRUE),1)stopifnot(all.equal(mc10N, mc10NS))