PCA based on Multivariate MM-estimators with Fast and Robust Bootstrap
PCA based on Multivariate MM-estimators with Fast and Robust Bootstrap
Performs principal components analysis based on the robust MM-estimate of the shape matrix. Additionally uses the Fast and Robust Bootstrap method to compute inference measures such as standard errors and confidence intervals.
## S3 method for class 'formula'FRBpcaMM(formula, data=NULL,...)## Default S3 method:FRBpcaMM(Y, R =999, conf =0.95, control=MMcontrol(...),na.action=na.omit,...)
Arguments
formula: an object of class formula; a symbolic description of the model to be fit.
data: data frame from which variables specified in formula are to be taken.
Y: matrix or data frame.
R: number of bootstrap samples. Default is R=999.
conf: level of the bootstrap confidence intervals. Default is conf=0.95.
control: a list with control parameters for tuning the MM-estimate and its computing algorithm, see MMcontrol().
na.action: a function which indicates what should happen when the data contain NAs. Defaults to na.omit.
...: allows for specifying control parameters directly instead of via control.
Details
Multivariate MM-estimates are defined by first computing an S-estimate of location and covariance, then fixing its scale component and re-estimating the location and the shape by a more efficient M-estimate, see Tatsuoka and Tyler (2000). Tukey's biweight is used for the loss functions. By default, the first loss function (in the S-estimate) is tuned in order to obtain 50% breakdown point. The default tuning of the second loss function (M-estimate) ensures 95% efficiency for the shape matrix estimate at the normal model. The desired efficiency can be changed through argument control. (However, control parameter shapeEff will always be considered as TRUE by this function, whichever value is specified.) The MM-estimates are computed by a call to the implementation in the rrcov package of Todorov and Filzmoser (2009).
The result of this call is also returned as the value est.
PCA is performed by computing the eigenvalues (eigval) and eigenvectors (eigvec) of the MM-estimate of shape, which is a rescaled version of the MM-estimate of covariance (rescaled to have determinant equal to 1). With pvar the function also provides the estimates for the percentage of variance explained by the first k principal components, which are simply the cumulative proportions of the eigenvalues sum. Here, k ranges from 1 to p−1 (with p the number of variables in Y). The eigenvectors are always given in the order of descending eigenvalues.
The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate standard errors, and also so-called basic bootstrap confidence intervals and bias corrected and accelerated (BCa) confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively) corresponding to the estimates eigval, eigvec and pvar. The bootstrap is also used to estimate the average angles between true and estimated eigenvectors, returned as avgangle. See Salibian-Barrera, Van Aelst and Willems (2006). The fast and robust bootstrap computations for the MM-estimates are performed by MMboot_loccov() and its raw result can be found in bootest. The actual bootstrap recalculations for the PCA-related quantities can be found in eigval.boot, eigvec.boot and pvar.boot, where each column represents a bootstrap sample. For eigvec.boot, the eigenvectors are stacked on top of each other and the same goes for eigvec.CI.bca and eigvec.CI.basic which hold the confidence limits.
The two columns in the confidence limits always respectively represent the lower and upper limits. For the percentage of variance the function also provides one-sided confidence intervals ([-infty upper]), which can be used to test the hypothesis that the true percentage at least equals a certain value.
Bootstrap samples are discarded if the fast and robust shape estimate is not positive definite, such that the actual number of recalculations used can be lower than R. This actual number equals R - failedsamples. However, if more than 0.75R of the bootstrap shape estimates is non-positive definite, all bootstrap samples will be used anyway, and the negative eigenvalues are simply set to zero (which may impact the confidence limits and standard errors for the smallest eigenvalues in eigval and pvar).
Returns
An object of class FRBpca, which contains the following components: - shape: (p x p) MM-estimate of the shape matrix of Y
eigval: (p x 1) eigenvalues of MM shape
eigvec: (p x p) eigenvectors of MM-shape
pvar: (p-1 x 1) percentages of variance for MM eigenvalues
eigval.boot: (p x R) eigenvalues of MM shape
eigvec.boot: (p*p x R) eigenvectors of MM-shape (vectorized)
pvar.boot: (p-1 x R) percentages of variance for MM eigenvalues
eigval.SE: (p x 1) bootstrap standard error for MM eigenvalues
eigvec.SE: (p x p) bootstrap standard error for MM eigenvectors
pvar.SE: (p-1 x 1) bootstrap standard error for percentage of variance for MM-eigenvalues
angles: (p x R) angles between bootstrap eigenvectors and original MM eigenvectors (in radians; in [0 pi/2])
avgangle: (p x 1) average angles between bootstrap eigenvectors and original MM eigenvectors (in radians; in [0 pi/2])
eigval.CI.bca: (p x 2) BCa intervals for MM eigenvalues
eigvec.CI.bca: (p*p x 2) BCa intervals for MM eigenvectors (vectorized)
pvar.CI.bca: (p-1 x 2) BCa intervals for percentage of variance for MM-eigenvalues
pvar.CIone.bca: (p-1 x 1) one-sided BCa intervals for percentage of variance for MM-eigenvalues ([-infty upper])
eigval.CI.basic: (p x 2) basic bootstrap intervals for MM eigenvalues
eigvec.CI.basic: (p*p x 2) basic bootstrap intervals for MM eigenvectors (vectorized)
pvar.CI.basic: (p-1 x 2) basic bootstrap intervals for percentage of variance for MM-eigenvalues
pvar.CIone.basic: (p-1 x 1) one-sided basic bootstrap intervals for percentage of variance for MM-eigenvalues ([-infty upper])
est: list containing the MM-estimates of location and scatter
bootest: (list) result of MMboot_loccov()
failedsamples: number of bootstrap samples with non-positive definiteness of shape
conf: a copy of the conf argument
method: a character string giving the robust PCA method that was used
w: implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure)
outFlag: outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of Y; 0 otherwise
Y: copy of the data argument as a matrix
References
A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2006) PCA based on multivariate MM-estimators with fast and robust bootstrap. Journal of the American Statistical Association, 101 , 1198-1211.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17 , 41-71.
M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30 , 556-582.
K.S. Tatsuoka and D.E. Tyler (2000) The uniqueness of S and M-functionals under non-elliptical distributions. The Annals of Statistics, 28 , 1219-1243
V. Todorov and P. Filzmoser (2009), An Object Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32 (3), 1--47. tools:::Rd_expr_doi("10.18637/jss.v032.i03") .
S. Van Aelst and G. Willems (2013), Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53 (3), 1--32. tools:::Rd_expr_doi("10.18637/jss.v053.i03") .
data(ForgedBankNotes)MMpcares <- FRBpcaMM(ForgedBankNotes, R=999, conf=0.95)# or using the formula interfaceMMpcares <- FRBpcaMM(~.,data=ForgedBankNotes, R=999, conf=0.95)# the simple print method shows the standard deviations with confidence limits:MMpcares
# the summary functions shows a lot more (see help(summary.FRBpca)):summary(MMpcares)# ask for the eigenvalues:MMpcares$eigval
# or, in more pretty format, with confidence limits:summary(MMpcares)$eigvals
# note that the standard deviations of the print-output can also be asked for by:sqrt( summary(MMpcares)$eigvals )# the eigenvectors and their standard errors:MMpcares$eigvec # or prettier: summary(MMpcares)$eigvecsMMpcares$eigvec.SE
# take a look at the bootstrap distribution of the first eigenvalue hist(MMpcares$eigval.boot[1,])# that bootstrap distribution is used to compute confidence limits as depicted # by the screeplot function: plotFRBvars(MMpcares, cumul=0)# all plots for the FRB-PCA result: plot(MMpcares)