HDSReg() R function from [HDTSA]

Factor analysis with observed regressors for vector time series

HDSReg() considers a multivariate time series model that represents a high-dimensional vector process as the sum of three terms: a linear regression on some observed regressors, a linear combination of some latent, serially correlated factors, and a vector white noise:

$${\bf y}_t = {\bf D}{\bf z}_t + {\bf A}{\bf x}_t + {\boldsymbol{\epsilon}}_t,$$

where ${\bf y}_t$ and ${\bf z}_t$ are, respectively, observable $p\times 1$ and $m \times 1$ time series, ${\bf x}_t$ is an $r \times 1$ latent factor process, ${\boldsymbol{\epsilon}}_t$ is a vector white noise process, ${\bf D}$ is an unknown $p \times m$ regression coefficient matrix, and ${\bf A}$ is an unknown $p \times r$ factor loading matrix. The procedure, proposed in Chang, Guo and Yao (2015), estimates the regression coefficient matrix ${\bf D}$, the number of factors $r$ and the factor loading matrix ${\bf A}$.
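Stacking the observations row by row gives the matrix form of the same model, which matches the $n \times p$ and $n \times m$ data matrices ${\bf Y}$ and ${\bf Z}$ that HDSReg() takes as input. This is a restatement for orientation only; the symbols ${\bf X} = ({\bf x}_1, \dots, {\bf x}_n)'$ and ${\bf E} = ({\boldsymbol\epsilon}_1, \dots, {\boldsymbol\epsilon}_n)'$ are introduced here by analogy with ${\bf Y}$ and ${\bf Z}$.

```latex
{\bf y}_t' = {\bf z}_t'{\bf D}' + {\bf x}_t'{\bf A}' + {\boldsymbol\epsilon}_t', \quad t = 1, \dots, n
\quad\Longrightarrow\quad
{\bf Y} = {\bf Z}{\bf D}' + {\bf X}{\bf A}' + {\bf E}.
```

The Examples section builds the transpose of this identity, ${\bf Y}' = {\bf D}{\bf Z}' + {\bf A}{\bf X}' + {\bf E}'$, and then transposes the result.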


HDSReg(
  Y,
  Z,
  D = NULL,
  lag.k = 5,
  thresh = FALSE,
  delta = 2 * sqrt(log(ncol(Y))/nrow(Y)),
  twostep = FALSE
)

Arguments

Y: An $n \times p$ data matrix ${\bf Y} = ({\bf y}_1, \dots , {\bf y}_n )'$, where $n$ is the number of observations of the $p \times 1$ time series $\{{\bf y}_t\}_{t=1}^n$.
Z: An $n \times m$ data matrix ${\bf Z} = ({\bf z}_1, \dots , {\bf z}_n )'$ consisting of the observed regressors.
D: A $p\times m$ regression coefficient matrix $\tilde{\bf D}$. If D = NULL (the default), the procedure first estimates ${\bf D}$ and lets $\tilde{\bf D}$ be that estimate. If D is supplied by the user, then $\tilde{\bf D}={\bf D}$.
lag.k: The time lag $K$ used to calculate the nonnegative definite matrix $\hat{\mathbf{M}}_{\eta}$:

$$\hat{\mathbf{M}}_{\eta} = \sum_{k=1}^{K} T_\delta\{\hat{\mathbf{\Sigma}}_{\eta}(k)\}\, T_\delta\{\hat{\mathbf{\Sigma}}_{\eta}(k)\}',$$

where $\hat{\bf \Sigma}_{\eta}(k)$ is the sample autocovariance of ${\boldsymbol{\eta}}_t = {\bf y}_t - \tilde{\bf D}{\bf z}_t$ at lag $k$, and $T_\delta(\cdot)$ is a threshold operator with threshold level $\delta \geq 0$. See 'Details'. The default is 5.

thresh: Logical. If thresh = FALSE (the default), no thresholding is applied when computing $\hat{\mathbf{M}}_{\eta}$. If thresh = TRUE, the threshold level $\delta$ is taken from the argument delta. See 'Details'.
delta: The value of the threshold level $\delta$ . The default is $\delta = 2 \sqrt{n^{-1}\log p}$ .
twostep: Logical. The same as the argument twostep in Factors.
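The definition of $\hat{\mathbf{M}}_{\eta}$ under lag.k, together with thresh and delta, can be sketched in base R as below. This illustrates the formula only and is not HDTSA's internal code: the helper name m_eta_sketch is hypothetical, and centring ${\boldsymbol{\eta}}_t$ and scaling the lag-$k$ autocovariance by $1/n$ are assumptions. thresh = FALSE is treated as $\delta = 0$, which leaves every entry unchanged.

```r
# Hypothetical helper (not part of HDTSA): base-R sketch of M_eta-hat.
# Assumption: lag-k sample autocovariance is centred and scaled by 1/n.
m_eta_sketch <- function(Y, Z, D, lag.k = 5, thresh = FALSE,
                         delta = 2 * sqrt(log(ncol(Y)) / nrow(Y))) {
  n <- nrow(Y)
  # eta_t = y_t - D z_t, stacked row-wise into an n x p matrix
  Eta <- Y - Z %*% t(D)
  Eta <- sweep(Eta, 2, colMeans(Eta))  # centre each column
  # delta = 0 keeps every entry, matching thresh = FALSE
  level <- if (thresh) delta else 0
  M <- matrix(0, ncol(Y), ncol(Y))
  for (k in seq_len(lag.k)) {
    # Sigma_eta(k) = n^{-1} * sum_{t=1}^{n-k} eta_{t+k} eta_t'
    S <- crossprod(Eta[(k + 1):n, , drop = FALSE],
                   Eta[1:(n - k), , drop = FALSE]) / n
    # Threshold operator T_delta: zero out entries with |s_ij| < delta
    S <- S * (abs(S) >= level)
    M <- M + S %*% t(S)
  }
  M
}

# Small simulated usage example (p = 10, no latent factors)
set.seed(1)
n <- 100; p <- 10; m <- 2
Z <- matrix(rnorm(n * m), n, m)
D <- matrix(runif(p * m, -2, 2), p, m)
Y <- Z %*% t(D) + matrix(rnorm(n * p), n, p)
M <- m_eta_sketch(Y, Z, D, lag.k = 2)
```

Each term $T_\delta\{\hat{\mathbf{\Sigma}}_{\eta}(k)\}T_\delta\{\hat{\mathbf{\Sigma}}_{\eta}(k)\}'$ is positive semidefinite, which is why the sum is nonnegative definite even though the individual autocovariance matrices are not symmetric.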

Returns

An object of class "factors", which contains the following components:

factor_num: The estimated number of factors $\hat{r}$ .
reg.coff.mat: The estimated $p \times m$ regression coefficient matrix $\tilde{\bf D}$ .
loading.mat: The estimated $p \times \hat{r}$ factor loading matrix ${\bf \hat{A}}$ .
X: The $n\times \hat{r}$ matrix $\hat{\bf X}=(\hat{\bf x}_1,\dots,\hat{\bf x}_n)'$ with $\hat{\mathbf{x}}_t=\hat{\mathbf{A}}'(\mathbf{y}_t-\tilde{\mathbf{D}} \mathbf{z}_t)$ .
lag.k: The time lag $K$ used in the function.
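The X component is a linear projection of the residual process ${\bf y}_t - \tilde{\bf D}{\bf z}_t$ onto the estimated loadings. The sketch below assumes that the columns of $\hat{\bf A}$ are orthonormal eigenvectors for the $\hat r$ largest eigenvalues of a symmetric matrix such as $\hat{\mathbf{M}}_{\eta}$, a standard eigen-analysis choice. Here $\hat r$ is supplied by the caller rather than estimated, the demo uses a sample covariance as a stand-in for $\hat{\mathbf{M}}_{\eta}$, and the helper name factor_scores_sketch is hypothetical.

```r
# Hypothetical helper (not part of HDTSA). Assumption: A-hat consists of the
# leading r.hat eigenvectors of a symmetric matrix M (e.g. M_eta-hat).
factor_scores_sketch <- function(Y, Z, D.tilde, M, r.hat) {
  ev <- eigen(M, symmetric = TRUE)                     # eigenvalues in decreasing order
  A.hat <- ev$vectors[, seq_len(r.hat), drop = FALSE]  # p x r.hat, orthonormal columns
  Eta <- Y - Z %*% t(D.tilde)                          # rows are (y_t - D.tilde z_t)'
  X.hat <- Eta %*% A.hat                               # rows are x.hat_t' = (A.hat' eta_t)'
  list(loading.mat = A.hat, X = X.hat)
}

set.seed(2)
n <- 50; p <- 6; m <- 2
Z <- matrix(rnorm(n * m), n, m)
D <- matrix(runif(p * m, -1, 1), p, m)
Y <- Z %*% t(D) + matrix(rnorm(n * p), n, p)
M <- crossprod(Y - Z %*% t(D)) / n  # stand-in symmetric matrix for the demo only
out <- factor_scores_sketch(Y, Z, D, M, r.hat = 2)
```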

Details

The threshold operator $T_\delta(\cdot)$ is defined as $T_\delta({\bf W}) = \{w_{i,j}1(|w_{i,j}|\geq \delta)\}$ for any matrix ${\bf W}=(w_{i,j})$, with the threshold level $\delta \geq 0$ and $1(\cdot)$ representing the indicator function. We recommend choosing $\delta=0$ when $p$ is fixed and $\delta>0$ when $p \gg n$.
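The operator is an elementwise mask and takes one line in R. The sketch below is illustrative, not HDTSA's code, and the function name T_delta is hypothetical. With the Example's n = 400 and p = 200, the default delta = 2 * sqrt(log(p)/n) is about 0.23, so under thresh = TRUE any autocovariance entry smaller than that in absolute value would be set to zero.

```r
# Hypothetical helper: threshold operator T_delta as defined in 'Details'.
# Keeps w_ij when |w_ij| >= delta and sets it to 0 otherwise.
T_delta <- function(W, delta) {
  stopifnot(delta >= 0)
  W * (abs(W) >= delta)
}

W <- matrix(c(0.5, -0.05, 0.1, -0.3), 2, 2)
T_delta(W, 0)    # delta = 0: every entry is kept
T_delta(W, 0.2)  # entries with |w_ij| < 0.2 are set to 0
```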

Examples


# Example 1 (Example 1 in Chang, Guo and Yao (2015)).
library(HDTSA)
set.seed(1)  # for reproducibility
## Generate xt: r = 3 independent AR(1) latent factors
n <- 400
p <- 200
m <- 2
r <- 3
x1 <- arima.sim(model = list(ar = c(0.6)), n = n)
x2 <- arima.sim(model = list(ar = c(-0.5)), n = n)
x3 <- arima.sim(model = list(ar = c(0.3)), n = n)
X <- t(cbind(x1, x2, x3))  # r x n matrix whose columns are x_t

## Generate yt
Z <- mat.or.vec(m, n)  # m x n matrix whose columns are z_t
S1 <- matrix(c(5/8, 1/8, 1/8, 5/8), 2, 2)
Z[, 1] <- rnorm(m)
for (i in 2:n) {
  Z[, i] <- S1 %*% Z[, i - 1] + rnorm(m)  # VAR(1) regressor process
}
D <- matrix(runif(p*m, -2, 2), ncol = m)
A <- matrix(runif(p*r, -2, 2), ncol = r)
eps <- matrix(rnorm(n*p), p, n)  # p x n white noise
Y <- D %*% Z + A %*% X + eps
Y <- t(Y)  # n x p data matrix
Z <- t(Z)  # n x m data matrix

## D is known
res1 <- HDSReg(Y, Z, D, lag.k = 2)
## D is unknown
res2 <- HDSReg(Y, Z, lag.k = 2)
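Factor loadings are identified only up to an invertible transformation (${\bf A}{\bf x}_t = ({\bf A}{\bf H})({\bf H}^{-1}{\bf x}_t)$), so $\hat{\bf A}$ is best compared with ${\bf A}$ through their column spaces rather than entry by entry. The sketch below computes one common distance, $\{1 - \mathrm{tr}({\bf P}_{1}{\bf P}_{2})/\max(r_1, r_2)\}^{1/2}$, where ${\bf P}_i$ is the orthogonal projection onto the column space of ${\bf A}_i$. The helper name space_distance is hypothetical, and this is an evaluation aid, not something HDSReg() returns.

```r
# Hypothetical helper: distance between the column spaces of A1 (p x r1)
# and A2 (p x r2); 0 means identical spaces, 1 means orthogonal spaces.
space_distance <- function(A1, A2) {
  # orthogonal projection onto the column space of A
  proj <- function(A) A %*% solve(crossprod(A), t(A))
  r.max <- max(ncol(A1), ncol(A2))
  val <- 1 - sum(diag(proj(A1) %*% proj(A2))) / r.max
  sqrt(max(val, 0))  # guard against tiny negative rounding error
}

# After running the example above (requires HDTSA):
# space_distance(A, res2$loading.mat)     # loading space error
# norm(res2$reg.coff.mat - D, type = "F") # regression coefficient error
```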

References

Chang, J., Guo, B., & Yao, Q. (2015). High dimensional stochastic regression with latent factors, endogeneity and nonlinearity. Journal of Econometrics, 189, 297-312. doi:10.1016/j.jeconom.2015.03.024.
