covOGK function

Orthogonalized Gnanadesikan-Kettenring (OGK) Covariance Matrix Estimation

Computes the orthogonalized pairwise covariance matrix estimate described in Maronna and Zamar (2002). The pairwise proposal goes back to Gnanadesikan and Kettenring (1972).

covOGK(X, n.iter = 2, sigmamu, rcov = covGK, weight.fn = hard.rejection, keep.data = FALSE, ...)

covGK(x, y, scalefn = scaleTau2, ...)

s_mad(x, mu.too = FALSE, na.rm = FALSE)

s_IQR(x, mu.too = FALSE, na.rm = FALSE)

Arguments

  • X: data in something that can be coerced into a numeric matrix.

  • n.iter: number of orthogonalization iterations. Usually 1 or 2; values greater than 2 are unlikely to have any significant effect on the estimate (other than increasing the computing time).

  • sigmamu, scalefn: a function that computes univariate robust location and scale estimates. By default it should return a single numeric value containing the robust scale (standard deviation) estimate. When mu.too is true, sigmamu() should return a numeric vector of length 2 containing the robust location and scale estimates. See scaleTau2, s_Qn, s_Sn, s_mad or s_IQR for examples that can be used as the sigmamu argument.

  • rcov: function that computes a robust covariance estimate between two vectors. The default, Gnanadesikan-Kettenring's covGK, is simply $(s^2(X+Y) - s^2(X-Y))/4$, where $s()$ is the scale estimate sigmamu().

  • weight.fn: a function of the robust distances and the number of variables $p$ to compute the weights used in the reweighting step.

  • keep.data: logical indicating if the (untransformed) data matrix X should be kept as part of the result.

  • ...: additional arguments; for covOGK to be passed to sigmamu() and weight.fn(); for covGK passed to scalefn.

  • x,y: numeric vectors of the same length, the covariance of which is sought in covGK (or the scale, in s_mad or s_IQR).

  • mu.too: logical indicating if both location and scale should be returned or just the scale (when mu.too=FALSE as by default).

  • na.rm: if TRUE then NA values are stripped from x before computation takes place.
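The calling convention described for sigmamu can be illustrated with a small base R sketch. The function name my_sigmamu is hypothetical and not part of any package; it assumes the median as location and the MAD as scale, and only demonstrates the mu.too contract (scale alone by default, location and scale when mu.too = TRUE).

```r
# Hypothetical sigmamu-style function (illustration only, not a package function):
# median as location, MAD as scale, following the mu.too convention.
my_sigmamu <- function(x, mu.too = FALSE, na.rm = FALSE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  mu  <- median(x)
  sig <- mad(x, center = mu)   # stats::mad, scaled by 1.4826 by default
  if (mu.too) c(mu, sig) else sig
}

x <- c(2.1, 1.9, 2.0, 2.2, 1.8, 50)  # one gross outlier
my_sigmamu(x)                 # scale only: a single number
my_sigmamu(x, mu.too = TRUE)  # c(location, scale)
```

A function with this shape could presumably be passed as sigmamu, since the extra `...` absorbs any additional arguments forwarded by the caller.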

Details

Typical default values for the function arguments sigmamu, rcov, and weight.fn are available as well (see the Examples below), but their names and calling sequences are still subject to discussion and may change in the future.

The current default, weight.fn = hard.rejection, corresponds to the proposal in the literature, but Martin Maechler strongly believes that the hard threshold currently in use is too arbitrary and that soft thresholding should be used instead.
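To make the contrast between hard and soft thresholding concrete, here is a minimal base R sketch of two weight functions with the (distances, p) interface described under weight.fn. Both function names are hypothetical. The cutoff (a chi-squared quantile rescaled by the median distance) and beta = 0.9 are assumptions made for illustration, and the soft variant is just one possible choice, not a proposal from the source.

```r
# Sketch of a hard-rejection weight function: weight 1 below the cutoff, 0 above.
# The cutoff formula and beta = 0.9 are assumptions for illustration.
hard_rejection_sketch <- function(distances, p, beta = 0.9, ...) {
  d0 <- median(distances) * qchisq(beta, p) / qchisq(0.5, p)
  as.numeric(distances <= d0)
}

# Hypothetical soft alternative: weight 1 up to the cutoff, then decaying
# smoothly towards 0 instead of dropping abruptly.
soft_weight_sketch <- function(distances, p, beta = 0.9, ...) {
  d0 <- median(distances) * qchisq(beta, p) / qchisq(0.5, p)
  pmin(1, d0 / distances)
}

d <- c(1.2, 0.8, 2.5, 3.1, 40)  # example distance values, the last one outlying
hard_rejection_sketch(d, p = 3)
soft_weight_sketch(d, p = 3)
```

With the hard rule an observation just above the cutoff is discarded entirely, while the soft rule only downweights it, which is the arbitrariness the paragraph above objects to.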

Returns

covOGK() currently returns a list with components

  • center: robust location: numeric vector of length $p$.

  • cov: robust covariance matrix estimate: $p \times p$ matrix.

  • wcenter, wcov: re-weighted versions of center and cov.

  • weights: the robustness weights used.

  • distances: the Mahalanobis distances computed using center and cov.

  • ...

but note that this might be radically changed to returning an S4 classed object!
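How the components center, cov, distances, weights, wcenter and wcov relate to one another can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the initial estimates are simple stand-ins (coordinatewise median and MAD), the 0/1 weights use an assumed chi-squared based cutoff, and the re-weighted estimates are taken to be a weighted mean and weighted covariance.

```r
set.seed(1)
X <- cbind(rnorm(50), rnorm(50))
X[1:3, ] <- X[1:3, ] + 10            # plant three outliers

# Stand-ins for the initial robust estimates (assumption for this sketch).
center <- apply(X, 2, median)
cov0   <- diag(apply(X, 2, mad)^2)

# Squared Mahalanobis distances of each row from the initial estimates.
distances <- mahalanobis(X, center, cov0)

# Hard 0/1 weights from an assumed cutoff.
p  <- ncol(X)
d0 <- median(distances) * qchisq(0.9, p) / qchisq(0.5, p)
weights <- as.numeric(distances <= d0)

# Re-weighted location and covariance over the retained observations.
sw      <- sum(weights)
wcenter <- colSums(X * weights) / sw
Z       <- sweep(X, 2, wcenter) * sqrt(weights)
wcov    <- crossprod(Z) / sw
```

Observations with weight 0 contribute nothing to wcenter and wcov, so the planted outliers no longer influence the final estimates.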

covGK() is a trivial 1-line function returning the covariance estimate

$$\hat c(x,y) = \left(\hat\sigma(x+y)^2 - \hat\sigma(x-y)^2\right)/4,$$

where $\hat\sigma(u)$ is the scale estimate of $u$ specified by scalefn.
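The formula can be written directly in base R. In the sketch below, covGK_sketch is a hypothetical name and the default scalefn = mad is an assumption chosen so that the example needs no extra packages. Using the standard deviation as the scale recovers the classical covariance exactly, because Var(x+y) - Var(x-y) = 4 Cov(x, y).

```r
# Sketch of the Gnanadesikan-Kettenring pairwise covariance.
# covGK_sketch is a hypothetical name; scalefn is any univariate scale function.
covGK_sketch <- function(x, y, scalefn = mad, ...) {
  (scalefn(x + y, ...)^2 - scalefn(x - y, ...)^2) / 4
}

set.seed(42)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

# With sd() as the scale, the identity reproduces cov(x, y).
all.equal(covGK_sketch(x, y, scalefn = sd), cov(x, y))

# With a robust scale, a few gross outliers have limited influence
# compared with the classical covariance.
y_bad <- y
y_bad[1:5] <- 100
covGK_sketch(x, y_bad, scalefn = mad)
cov(x, y_bad)
```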

s_mad() and s_IQR() return the scale estimates mad or IQR, respectively, where the s_* functions return a length-2 vector (mu, sig) when mu.too = TRUE; see also scaleTau2.

References

Maronna, R.A. and Zamar, R.H. (2002) Robust estimates of location and dispersion of high-dimensional datasets; Technometrics 44(4), 307--317.

Gnanadesikan, R. and Kettenring, J.R. (1972) Robust estimates, residuals, and outlier detection with multiresponse data; Biometrics 28, 81--124.

Author(s)

Kjell Konis konis@stats.ox.ac.uk, with modifications by Martin Maechler.

See Also

scaleTau2, covMcd, cov.rob.

Examples

library(robustbase)  # provides covOGK, the scale functions, and the example data

data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])

## OGK estimates with different univariate scale functions
cO1 <- covOGK(hbk.x, sigmamu = scaleTau2)
cO2 <- covOGK(hbk.x, sigmamu = s_Qn)
cO3 <- covOGK(hbk.x, sigmamu = s_Sn)
cO4 <- covOGK(hbk.x, sigmamu = s_mad)
cO5 <- covOGK(hbk.x, sigmamu = s_IQR)

data(toxicity)
cO1tox <- covOGK(toxicity, sigmamu = scaleTau2)
cO2tox <- covOGK(toxicity, sigmamu = s_Qn)

## nice formatting of correlation matrices:
as.dist(round(cov2cor(cO1tox$cov), 2))
as.dist(round(cov2cor(cO2tox$cov), 2))

## "graphical"
symnum(cov2cor(cO1tox$cov))
symnum(cov2cor(cO2tox$cov), legend=FALSE)