TRC function

Transformed rank correlations for multivariate outlier detection


TRC starts from bivariate Spearman correlations and obtains a positive definite covariance matrix by back-transforming robust univariate medians and mads computed in the eigenspace. TRC copes with missing values through regression imputation, using a robust regression on the best predictor, and it takes sampling weights into account.

TRC( data, weights, overlap = 3, mincor = 0, robust.regression = "rank", gamma = 0.5, prob.quantile = 0.75, alpha = 0.05, md.type = "m", monitor = FALSE )

Arguments

  • data: a data frame or matrix with the data.
  • weights: sampling weights.
  • overlap: minimum number of jointly observed values for calculating the rank correlation.
  • mincor: minimal absolute correlation to impute.
  • robust.regression: type of regression: "irls" is an iteratively reweighted least squares M-estimator; "rank" (default) is based on the rank correlations.
  • gamma: minimal number of jointly observed values to impute.
  • prob.quantile: if mads are 0, try this quantile of absolute deviations.
  • alpha: the (1 - alpha) quantile of the F-distribution is used for the cut-off.
  • md.type: type of Mahalanobis distance when missing values occur: "m" marginal (default), "c" conditional.
  • monitor: if TRUE, verbose output.

Returns

TRC returns a list whose first component output is a sublist with the following components:

  • sample.size: Number of observations
  • number.of.variables: Number of variables
  • number.of.missing.items: Number of missing values
  • significance.level: 1 - alpha
  • computation.time: Elapsed computation time
  • medians: Componentwise medians
  • mads: Componentwise mads
  • center: Location estimate
  • scatter: Covariance estimate
  • robust.regression: Input parameter
  • md.type: Input parameter
  • cutpoint: The default threshold MD-value for the cut-off of outliers

The further components returned by TRC are:

  • outind: Indicator of outliers
  • dist: Mahalanobis distances (with missing values)

Details

TRC is similar to a one-step OGK estimator in which the starting covariances are obtained from rank correlations, with an ad hoc missing value imputation and weighting added.
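
The idea can be illustrated with a minimal base R sketch for complete data without sampling weights. This is not the package implementation: the function name trc_sketch is hypothetical, the mapping 2 * sin(pi * rho / 6) from Spearman to Pearson-scale correlations is an assumption made here, and imputation and weighting are left out entirely.

```r
# Hypothetical sketch of a one-step OGK-type estimate started from
# Spearman correlations (complete data, no weights; not the TRC code).
trc_sketch <- function(x) {
  x <- as.matrix(x)
  p <- ncol(x)

  # Robust univariate location and scale of each variable
  med <- apply(x, 2, median)
  s <- apply(x, 2, mad)

  # Standardize, then compute the starting rank correlations
  y <- sweep(sweep(x, 2, med, "-"), 2, s, "/")
  rho <- cor(y, method = "spearman")

  # Assumption: map Spearman correlations to the Pearson scale
  r <- 2 * sin(pi * rho / 6)

  # Eigenvectors of the starting matrix define the eigenspace
  e <- eigen(r, symmetric = TRUE)$vectors

  # Robust medians and mads of the data projected onto the eigenspace
  z <- y %*% e
  z_med <- apply(z, 2, median)
  z_mad <- apply(z, 2, mad)

  # Back-transform to the original coordinates; the result is positive
  # definite as long as all mads are strictly positive
  a <- diag(s, nrow = p) %*% e
  scatter <- a %*% diag(z_mad^2, nrow = p) %*% t(a)
  center <- med + as.vector(a %*% z_med)

  # mahalanobis() returns squared distances
  list(center = center, scatter = scatter,
       dist = mahalanobis(x, center, scatter))
}

# Usage on simulated data with one planted outlier in row 1
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
x[1, ] <- c(8, -8, 8)
fit <- trc_sketch(x)
which.max(fit$dist)
```

Because the scatter matrix is rebuilt from squared mads along orthogonal directions, it stays positive definite even when the matrix of pairwise rank correlations is not.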

Examples

data(bushfirem, bushfire.weights)
det.res <- TRC(bushfirem, weights = bushfire.weights)
PlotMD(det.res$dist, ncol(bushfirem))
print(det.res)

References

Béguin, C. and Hulliger, B. (2004) Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations. Journal of the Royal Statistical Society, Series A, 167(2), pp. 275-294.

Author(s)

Beat Hulliger

  • Maintainer: Beat Hulliger
  • License: MIT + file LICENSE
  • Last published: 2023-03-14