Principal component analysis for vector time series
Principal component analysis for vector time series
PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries: [REMOVE_ME]\bfyt=Axt,[REMOVEME2] where xt is an unobservable p×1
weakly stationary time series consisting of q(≥1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).
PCA_TS( Y, lag.k =5, opt =1, permutation = c("max","fdr"), thresh =FALSE, delta =2* sqrt(log(ncol(Y))/nrow(Y)), prewhiten =TRUE, m =NULL, beta, control = list())
Arguments
Y: An n×p data matrix Y=(y1,…,yn)′, where n is the number of the observations of the p×1
time series {yt}t=1n. The procedure will first normalize yt as V^−1/2yt, where V^ is an estimator for covariance of yt. See details below for the selection of V^−1.
lag.k: The time lag K used to calculate the nonnegative definte matrix W^y:
W^y=Ip+k=1∑KTδ{Σ^y(k)}Tδ{Σ^y(k)}′,
where Σ^y(k) is the sample autocovariance of V^−1/2yt at lag k and Tδ(⋅)
is a threshold operator with the threshold level $\delta \geq 0$. See 'Details'. The default is 5.
opt: An option used to choose which method will be implemented to get a consistent estimate V^ (or V^−1) for the covariance (precision) matrix of yt. If opt = 1, V^ will be defined as the sample covariance matrix. If opt = 2, the precision matrix V^−1 will be calculated by using the function clime() of clime (Cai, Liu and Luo, 2011) with the arguments passed by control.
permutation: The method of permutation procedure to assign the components of z^t to different groups [See Section 2.2.1 in Chang, Guo and Yao (2018)]. Available options include: "max" (the default) for the maximum cross correlation method and "fdr" for the false discovery rate procedure based on multiple tests. See Sections 2.2.2 and 2.2.3 in Chang, Guo and Yao (2018) for more information.
thresh: Logical. If thresh = FALSE (the default), no thresholding will be applied to estimate W^y. If thresh = TRUE, the argument delta is used to specify the threshold level δ.
delta: The value of the threshold level δ. The default is δ=2n−1logp.
prewhiten: Logical. If TRUE (the default), we prewhiten each transformed component series of z^t [See Section 2.2.1 in Chang, Guo and Yao (2018)] by fitting a univariate AR model with the order between 0 and 5 determined by AIC. If FALSE, then the prewhiten procedure will not be performed.
m: A positive integer used in the permutation procedure [See (2.10) in Chang, Guo and Yao (2018)]. The default is 10.
beta: The error rate used in the permutation procedure[See (2.16) in Chang, Guo and Yao (2018)] when permutation = "fdr".
control: A list of control arguments. See ‘Details’.
Returns
An object of class "tspca", which contains the following components: - B: The p×p transformation matrix B^=Γ^y′V^−1/2, where Γ^y is a p×p orthogonal matrix with the columns being the eigenvectors of W^y.
X: The n×p matrix X^=(x^1,…,x^n)′ with x^t=B^yt.
NoGroups: The number of groups.
No_of_Members: The number of members in each group.
Groups: The indices of the components of x^t that belong to each group.
method: A string indicating which permutation procedure is performed.
Description
PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries:
\bfyt=Axt,
where xt is an unobservable p×1
weakly stationary time series consisting of q(≥1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).
Details
The threshold operator Tδ(⋅) is defined as Tδ(W)={wi,j1(∣wi,j∣≥δ)} for any matrix W=(wi,j), with the threshold level δ≥0 and 1(⋅)
representing the indicator function. We recommend to choose δ=0 when p is fixed and δ>0 when p≫n.
For large p, since the sample covariance matrix may not be consistent, we recommend to use the method proposed in Cai, Liu and Luo (2011) to estimate the precision matrix V^−1 (opt = 2).
control is a list of arguments passed to the function clime(), which contains the following components:
nlambda: Number of values for program generated lambda. The default is 100.
lambda.max: Maximum value of program generated lambda. The default is 0.8.
lambda.min: Minimum value of program generated lambda. The default is 10−4 (np) or 10−2 (n\<p).
standardize: Logical. If standardize = TRUE, the variables will be standardized to have mean zero and unit standard deviation. The default is FALSE.
linsolver: An option used to choose which method should be employed. Available options include "primaldual" (the default) and "simplex". Rule of thumb: "primaldual" for large p, "simplex" for small p.
Examples
# Example 1 (Example 1 in the supplementary material of Chang, Guo and Yao (2018)).# p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components.## Generate x_tp <-6;n <-1500X <- mat.or.vec(p,n)x <- arima.sim(model = list(ar = c(0.5,0.3), ma = c(-0.9,0.3,1.2,1.3)),n = n+2, sd =1)for(i in1:3) X[i,]<- x[i:(n+i-1)]x <- arima.sim(model = list(ar = c(0.8,-0.5),ma = c(1,0.8,1.8)), n = n+1, sd =1)for(i in4:5) X[i,]<- x[(i-3):(n+i-4)]x <- arima.sim(model = list(ar = c(-0.7,-0.5), ma = c(-1,-0.8)), n = n, sd =1)X[6,]<- x
## Generate y_tA <- matrix(runif(p*p,-3,3), ncol = p)Y <- A%*%X
Y <- t(Y)## permutation = "max" or permutation = "fdr"res <- PCA_TS(Y, lag.k =5,permutation ="max")res1 <- PCA_TS(Y, lag.k =5,permutation ="fdr", beta =10^(-10))Z <- res$X
# Example 2 (Example 2 in the supplementary material of Chang, Guo and Yao (2018)).# p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components.## Generate x_tp <-20;n <-3000X <- mat.or.vec(p,n)x <- arima.sim(model = list(ar = c(0.5,0.3), ma = c(-0.9,0.3,1.2,1.3)),n.start =500, n = n+5, sd =1)for(i in1:6) X[i,]<- x[i:(n+i-1)]x <- arima.sim(model = list(ar = c(-0.4,0.5), ma = c(1,0.8,1.5,1.8)),n.start =500, n = n+4, sd =1)for(i in7:11) X[i,]<- x[(i-6):(n+i-7)]x <- arima.sim(model = list(ar = c(0.85,-0.3), ma=c(1,0.5,1.2)),n.start =500, n = n+3,sd =1)for(i in12:15) X[i,]<- x[(i-11):(n+i-12)]x <- arima.sim(model = list(ar = c(0.8,-0.5),ma = c(1,0.8,1.8)),n.start =500, n = n+2,sd =1)for(i in16:18) X[i,]<- x[(i-15):(n+i-16)]x <- arima.sim(model = list(ar = c(-0.7,-0.5), ma = c(-1,-0.8)),n.start =500,n = n+1,sd =1)for(i in19:20) X[i,]<- x[(i-18):(n+i-19)]## Generate y_tA <- matrix(runif(p*p,-3,3), ncol =p)Y <- A%*%X
Y <- t(Y)## permutation = "max" or permutation = "fdr"res <- PCA_TS(Y, lag.k =5,permutation ="max")res1 <- PCA_TS(Y, lag.k =5,permutation ="fdr",beta =10^(-200))Z <- res$X
References
Cai, T., Liu, W., & Luo, X. (2011). A constrained L1 minimization approach for sparse precision matrix estimation. Journal of the American Statistical Association, 106 , 594--607. tools:::Rd_expr_doi("doi:10.1198/jasa.2011.tm10155") .
Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46 , 2094--2124. tools:::Rd_expr_doi("doi:10.1214/17-AOS1613") .