PCA_TS function

Principal component analysis for vector time series

Principal component analysis for vector time series

PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries: [REMOVE_ME]\bfyt=Axt,[REMOVEME2] {\bfy}_t={\bf Ax}_t, [REMOVE_ME_2] where xt{\bf x}_t is an unobservable p×1p \times 1

weakly stationary time series consisting of q (1)q\ (\geq 1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).

PCA_TS( Y, lag.k = 5, opt = 1, permutation = c("max", "fdr"), thresh = FALSE, delta = 2 * sqrt(log(ncol(Y))/nrow(Y)), prewhiten = TRUE, m = NULL, beta, control = list() )

Arguments

  • Y: An n×pn \times p data matrix Y=(y1,,yn){\bf Y} = ({\bf y}_1, \dots , {\bf y}_n )', where nn is the number of the observations of the p×1p \times 1

    time series {yt}t=1n\{{\bf y}_t\}_{t=1}^n. The procedure will first normalize yt{\bf y}_t as V^1/2yt\hat{{\bf V}}^{-1/2}{\bf y}_t, where V^\hat{{\bf V}} is an estimator for covariance of yt{\bf y}_t. See details below for the selection of V^1\hat{{\bf V}}^{-1}.

  • lag.k: The time lag KK used to calculate the nonnegative definte matrix W^y\hat{{\bf W}}_y:

W^y = Ip+k=1KTδ{Σ^y(k)}Tδ{Σ^y(k)}, \hat{\mathbf{W}}_y\ =\ \mathbf{I}_p+\sum_{k=1}^{K} T_\delta \{\hat{\mathbf{\Sigma}}_y(k)\} T_\delta \{\hat{\mathbf{\Sigma}}_y(k)\}',

where Σ^y(k)\hat{\bf \Sigma}_y(k) is the sample autocovariance of V^1/2yt \hat{{\bf V}}^{-1/2}{\bf y}_t at lag kk and Tδ()T_\delta(\cdot)

is a threshold operator with the threshold level $\delta \geq 0$. See 'Details'. The default is 5.
  • opt: An option used to choose which method will be implemented to get a consistent estimate V^\hat{\bf V} (or V^1\hat{\bf V}^{-1}) for the covariance (precision) matrix of yt{\bf y}_t. If opt = 1, V^\hat{\bf V} will be defined as the sample covariance matrix. If opt = 2, the precision matrix V^1\hat{\bf V}^{-1} will be calculated by using the function clime() of clime (Cai, Liu and Luo, 2011) with the arguments passed by control.
  • permutation: The method of permutation procedure to assign the components of z^t\hat{\bf z}_t to different groups [See Section 2.2.1 in Chang, Guo and Yao (2018)]. Available options include: "max" (the default) for the maximum cross correlation method and "fdr" for the false discovery rate procedure based on multiple tests. See Sections 2.2.2 and 2.2.3 in Chang, Guo and Yao (2018) for more information.
  • thresh: Logical. If thresh = FALSE (the default), no thresholding will be applied to estimate W^y\hat{\bf W}_y. If thresh = TRUE, the argument delta is used to specify the threshold level δ\delta.
  • delta: The value of the threshold level δ\delta. The default is δ=2n1logp \delta = 2 \sqrt{n^{-1}\log p}.
  • prewhiten: Logical. If TRUE (the default), we prewhiten each transformed component series of z^t\hat{\bf z}_t [See Section 2.2.1 in Chang, Guo and Yao (2018)] by fitting a univariate AR model with the order between 0 and 5 determined by AIC. If FALSE, then the prewhiten procedure will not be performed.
  • m: A positive integer used in the permutation procedure [See (2.10) in Chang, Guo and Yao (2018)]. The default is 10.
  • beta: The error rate used in the permutation procedure[See (2.16) in Chang, Guo and Yao (2018)] when permutation = "fdr".
  • control: A list of control arguments. See ‘Details’.

Returns

An object of class "tspca", which contains the following components: - B: The p×pp\times p transformation matrix B^=Γ^yV^1/2\hat{\bf B}=\hat{\bf \Gamma}_y'\hat{{\bf V}}^{-1/2}, where Γ^y\hat{\bf \Gamma}_y is a p×pp \times p orthogonal matrix with the columns being the eigenvectors of W^y\hat{\bf W}_y.

  • X: The n×pn \times p matrix X^=(x^1,,x^n)\hat{\bf X}=(\hat{\bf x}_1,\dots,\hat{\bf x}_n)' with x^t=B^yt\hat{\bf x}_t = \hat{\bf B}{\bf y}_t.

  • NoGroups: The number of groups.

  • No_of_Members: The number of members in each group.

  • Groups: The indices of the components of x^t\hat{\bf x}_t that belong to each group.

  • method: A string indicating which permutation procedure is performed.

Description

PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries:

\bfyt=Axt, {\bfy}_t={\bf Ax}_t,

where xt{\bf x}_t is an unobservable p×1p \times 1

weakly stationary time series consisting of q (1)q\ (\geq 1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).

Details

The threshold operator Tδ()T_\delta(\cdot) is defined as Tδ(W)={wi,j1(wi,jδ)}T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\} for any matrix W=(wi,j){\bf W}=(w_{i,j}), with the threshold level δ0\delta \geq 0 and 1()1(\cdot)

representing the indicator function. We recommend to choose δ=0\delta=0 when pp is fixed and δ>0\delta>0 when pnp \gg n.

For large pp, since the sample covariance matrix may not be consistent, we recommend to use the method proposed in Cai, Liu and Luo (2011) to estimate the precision matrix V^1\hat{{\bf V}}^{-1} (opt = 2).

control is a list of arguments passed to the function clime(), which contains the following components:

  • nlambda: Number of values for program generated lambda. The default is 100.
  • lambda.max: Maximum value of program generated lambda. The default is 0.8.
  • lambda.min: Minimum value of program generated lambda. The default is 10410^{-4} (npn\>p) or 10210^{-2} (n\<pn\<p).
  • standardize: Logical. If standardize = TRUE, the variables will be standardized to have mean zero and unit standard deviation. The default is FALSE.
  • linsolver: An option used to choose which method should be employed. Available options include "primaldual" (the default) and "simplex". Rule of thumb: "primaldual" for large pp, "simplex" for small pp.

Examples

# Example 1 (Example 1 in the supplementary material of Chang, Guo and Yao (2018)). # p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components. ## Generate x_t p <- 6;n <- 1500 X <- mat.or.vec(p,n) x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2,1.3)), n = n+2, sd = 1) for(i in 1:3) X[i,] <- x[i:(n+i-1)] x <- arima.sim(model = list(ar = c(0.8,-0.5),ma = c(1,0.8,1.8) ), n = n+1, sd = 1) for(i in 4:5) X[i,] <- x[(i-3):(n+i-4)] x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)), n = n, sd = 1) X[6,] <- x ## Generate y_t A <- matrix(runif(p*p, -3, 3), ncol = p) Y <- A%*%X Y <- t(Y) ## permutation = "max" or permutation = "fdr" res <- PCA_TS(Y, lag.k = 5,permutation = "max") res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr", beta = 10^(-10)) Z <- res$X # Example 2 (Example 2 in the supplementary material of Chang, Guo and Yao (2018)). # p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components. ## Generate x_t p <- 20;n <- 3000 X <- mat.or.vec(p,n) x <- arima.sim(model = list(ar = c(0.5, 0.3), ma = c(-0.9, 0.3, 1.2, 1.3)), n.start = 500, n = n+5, sd = 1) for(i in 1:6) X[i,] <- x[i:(n+i-1)] x <- arima.sim(model = list(ar = c(-0.4, 0.5), ma = c(1, 0.8, 1.5, 1.8)), n.start = 500, n = n+4, sd = 1) for(i in 7:11) X[i,] <- x[(i-6):(n+i-7)] x <- arima.sim(model = list(ar = c(0.85,-0.3), ma=c(1, 0.5, 1.2)), n.start = 500, n = n+3,sd = 1) for(i in 12:15) X[i,] <- x[(i-11):(n+i-12)] x <- arima.sim(model = list(ar = c(0.8, -0.5),ma = c(1, 0.8, 1.8)), n.start = 500, n = n+2,sd = 1) for(i in 16:18) X[i,] <- x[(i-15):(n+i-16)] x <- arima.sim(model = list(ar = c(-0.7, -0.5), ma = c(-1, -0.8)), n.start = 500,n = n+1,sd = 1) for(i in 19:20) X[i,] <- x[(i-18):(n+i-19)] ## Generate y_t A <- matrix(runif(p*p, -3, 3), ncol =p) Y <- A%*%X Y <- t(Y) ## permutation = "max" or permutation = "fdr" res <- PCA_TS(Y, lag.k = 5,permutation = "max") res1 <- PCA_TS(Y, lag.k = 5,permutation = "fdr",beta = 10^(-200)) Z <- res$X

References

Cai, T., Liu, W., & Luo, X. (2011). A constrained L1 minimization approach for sparse precision matrix estimation. Journal of the American Statistical Association, 106 , 594--607. tools:::Rd_expr_doi("doi:10.1198/jasa.2011.tm10155") .

Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. The Annals of Statistics, 46 , 2094--2124. tools:::Rd_expr_doi("doi:10.1214/17-AOS1613") .

  • Maintainer: Chen Lin
  • License: GPL-3
  • Last published: 2025-01-28