llag() R function from [yuima]

Lead Lag Estimator

Estimate the lead-lag parameters of discretely observed processes by maximizing the shifted Hayashi-Yoshida covariation contrast functions, following Hoffmann et al. (2013).


llag(x, from = -Inf, to = Inf, division = FALSE, verbose = (ci || ccor), 
     grid, psd = TRUE, plot = ci, ccor = ci, ci = FALSE, alpha = 0.01, 
     fisher = TRUE, bw, tol = 1e-6)

Arguments

x: an object of yuima-class or yuima.data-class.
verbose: logical. If FALSE, llag returns lead-lag time estimates only. The default is FALSE.
from: a numeric vector each of whose component(s) indicates the lower end of a finite grid on which the contrast function is evaluated, if grid is missing.
to: a numeric vector each of whose component(s) indicates the upper end of a finite grid on which the contrast function is evaluated, if grid is missing.
division: a numeric vector each of whose component(s) indicates the number of the points of a finite grid on which the contrast function is evaluated, if grid is missing.
grid: a numeric vector or a list of numeric vectors. See 'Details'.
psd: logical. If TRUE, the estimated cross-correlation functions are converted to the interval [-1,1]. See 'Details'.
plot: logical. If TRUE, the estimated cross-correlation functions are plotted. If ci is also TRUE, the pointwise confidence intervals (under the null hypothesis that the corresponding correlation is zero) are also plotted. The default is FALSE.
ccor: logical. If TRUE, the estimated cross-correlation functions are returned. This argument is ignored if verbose is FALSE. The default is FALSE.
ci: logical. If TRUE, (pointwise) confidence intervals of the estimated cross-correlation functions and p-values for the significance of the correlations at the estimated lead-lag parameters are calculated. Note that the confidence intervals are only plotted when plot=TRUE.
alpha: a posive number indicating the significance level of the confidence intervals for the cross-correlation functions.
fisher: logical. If TRUE, the p-values and the confidence intervals for the cross-correlation functions is evaluated after applying the Fisher z transformation. This argument is only meaningful if pval = "corr".
bw: bandwidth parameter to compute the asymptotic variances. See 'Details' and hyavar for details.
tol: tolelance parameter to avoid numerical errors in comparison of time stamps. All time stamps are divided by tol and rounded to integers. Note that the values of grid are also divided by tol and rounded to integers. A reasonable choice of tol is the minimum unit of time stamps. The default value 1e-6 supposes that the minimum unit of time stamps is greater or equal to 1 micro-second.

Details

Let $d$ be the number of the components of the zoo.data of the object x.

Let $X^i_{t^i_{0}},X^i_{t^i_{1}},\dots,X^i_{t^i_{n(i)}}$ be the observation data of the $i$ -th component (i.e. the $i$ -th component of the zoo.data of the object x).

The shifted Hayashi-Yoshida covariation contrast function $U_{ij}(\theta)$ of the observations $X^i$ and $X^j$ $(i<j)$ is defined by the same way as in Hoffmann et al. (2013), which corresponds to their cross-covariance function. The lead-lag parameter $\theta_{ij}$ is defined as a maximizer of $|U_{ij}(\theta)|$ . $U_{ij}(\theta)$ is evaluated on a finite grid $G_{ij}$ defined below. Thus $\theta_{ij}$ belongs to this grid. If there exist more than two maximizers, the lowest one is selected.

If psd is TRUE, for any $i,j$ the matrix $C:=(U_{kl}(\theta))_{k,l\in{i,j}}$ is converted to (C%*%C)^(1/2) for ensuring the positive semi-definiteness, and $U_{ij}(\theta)$ is redefined as the $(1,2)$ -component of the converted $C$ . Here, $U_{kk}(\theta)$ is set to the realized volatility of $Xk$ . In this case $\theta_{ij}$ is given as a maximizer of the cross-correlation functions.

The grid $G_{ij}$ is defined as follows. First, if grid is missing, $G_{ij}$ is given by

a, a+(b-a)/(N-1), \dots, a+(N-2)(b-a)/(N-1), b,

where $a,b$ and $N$ are the $(d(i-1)-(i-1)i/2+(j-i))$ -th components of from, to and division respectively. If the corresponding component of from (resp. to) is -Inf (resp. Inf), $a=-(t^j_{n(j)}-t^i_{0})$ (resp. $b=t^i_{n(i)}-t^j_{0}$ ) is used, while if the corresponding component of division is FALSE, $N=round(2max(n(i),n(j)))+1$ is used. Missing components are filled with -Inf (resp. Inf, FALSE). The default value -Inf (resp. Inf, FALSE) means that all components are -Inf (resp. Inf, FALSE). Next, if grid is a numeric vector, $G_{ij}$ is given by grid. If grid is a list of numeric vectors, $G_{ij}$ is given by the $(d(i-1)-(i-1)i/2+(j-i))$ -th component of grid.

The estimated lead-lag parameters are returned as the skew-symmetric matrix $(\theta_{ij})_{i,j=1,\dots,d}$ . If verbose is TRUE, the covariance matrix $(U_{ij}(\theta_{ij}))_{i,j=1,\dots,d}$ corresponding to the estimated lead-lag parameters, the corresponding correlation matrix and the computed contrast functions are also returned. If further ccor is TRUE,the computed cross-correlation functions are returned as a list with the length $d(d-1)/2$ . For $i<j$ , the $(d(i-1)-(i-1)i/2+(j-i))$ -th component of the list consists of an object $U_{ij}(\theta)/sqrt(U_{ii}(\theta)*U_{jj}(\theta))$ of class zoo indexed by $G_{ij}$ .

If plot is TRUE, the computed cross-correlation functions are plotted sequentially.

If ci is TRUE, the asymptotic variances of the cross-correlations are calculated at each point of the grid by using the naive kernel approach descrived in Section 8.2 of Hayashi and Yoshida (2011). The implementation is the same as that of hyavar and more detailed description is found there.

Returns

If verbose is FALSE, a skew-symmetric matrix corresponding to the estimated lead-lag parameters is returned. Otherwise, an object of class "yuima.llag", which is a list with the following components, is returned: - lagcce: a skew-symmetric matrix corresponding to the estimated lead-lag parameters.

covmat: a covariance matrix corresponding to the estimated lead-lag parameters.
cormat: a correlation matrix corresponding to the estimated lead-lag parameters.
LLR: a matrix consisting of lead-lag ratios. See Huth and Abergel (2014) for details.

If ci is TRUE, the following component is added to the returned list: - p.values: a matrix of p-values for the significance of the correlations corresponding to the estimated lead-lag parameters.

If further ccor is TRUE, the following components are added to the returned list: - ccor: a list of computed cross-correlation functions.

avar: a list of computed asymptotic variances of the cross-correlations (if ci = TRUE).

Note

The default grid usually contains too many points, so it is better for users to specify this argument in order to reduce the computational time. See 'Examples' below for an example of the specification.

The evaluated p-values should carefully be interpreted because they are calculated based on pointwise confidence intervals rather than simultaneous confidence intervals (so there would be a multiple testing problem). Evaluation of p-values based on the latter will be implemented in the future extension of this function: Indeed, so far no theory has been developed for this. However, it is conjectured that the error distributions of the estimated cross-correlation functions are asymptotically independent if the grid is not dense too much, so p-values evaluated by this function will still be meaningful as long as sufficiently low significance levels are used.

References

Hayashi, T. and Yoshida, N. (2011) Nonsynchronous covariation process and limit theorems, Stochastic processes and their applications, 121 , 2416--2454.

Hoffmann, M., Rosenbaum, M. and Yoshida, N. (2013) Estimation of the lead-lag parameter from non-synchronous data, Bernoulli, 19 , no. 2, 426--461.

Huth, N. and Abergel, F. (2014) High frequency lead/lag relationships --- Empirical facts, Journal of Empirical Finance, 26 , 41--58.

Author(s)

Yuta Koike with YUIMA Project Team

Examples


## Set a model
diff.coef.matrix <- matrix(c("sqrt(x1)", "3/5*sqrt(x2)",
 "1/3*sqrt(x3)", "", "4/5*sqrt(x2)","2/3*sqrt(x3)",
 "","","2/3*sqrt(x3)"), 3, 3) 
drift <- c("1-x1","2*(10-x2)","3*(4-x3)")
cor.mod <- setModel(drift = drift, 
 diffusion = diff.coef.matrix,
 solve.variable = c("x1", "x2","x3")) 

set.seed(111) 

## We use a function poisson.random.sampling 
## to get observation by Poisson sampling.
yuima.samp <- setSampling(Terminal = 1, n = 1200) 
yuima <- setYuima(model = cor.mod, sampling = yuima.samp) 
yuima <- simulate(yuima,xinit=c(1,7,5)) 

## intentionally displace the second time series

  data2 <- yuima@data@zoo.data[[2]]
  time2 <- time(data2)
  theta2 <- 0.05   # the lag of x2 behind x1
  stime2 <- time2 + theta2  
  time(yuima@data@zoo.data[[2]]) <- stime2

  data3 <- yuima@data@zoo.data[[3]]
  time3 <- time(data3)
  theta3 <- 0.12   # the lag of x3 behind x1
  stime3 <- time3 + theta3 
  time(yuima@data@zoo.data[[3]]) <- stime3

## sampled data by Poisson rules
psample<- poisson.random.sampling(yuima, 
 rate = c(0.2,0.3,0.4), n = 1000) 

## plot
plot(psample)

## cce
cce(psample)

## lead-lag estimation (with cross-correlation plots)
par(mfcol=c(3,1))
result <- llag(psample, plot=TRUE)

## estimated lead-lag parameter
result

## computing pointwise confidence intervals
llag(psample, ci = TRUE)

## In practice, it is better to specify the grid because the default grid contains too many points.
## Here we give an example for how to specify it.

## We search lead-lag parameters on the interval [-0.1, 0.1] with step size 0.01 
G <- seq(-0.1,0.1,by=0.01)

## lead-lag estimation (with computing confidence intervals)
result <- llag(psample, grid = G, ci = TRUE)

## Since the true lead-lag parameter 0.12 between x1 and x3 is not contained
## in the searching grid G, we see that the corresponding cross-correlation 
## does not exceed the cofidence interval

## detailed output
## the p-value for the (1,3)-th component is high
result

## Finally, we can examine confidence intervals of other significant levels
## and/or without the Fisher z-transformation via the plot-method defined 
## for yuima.llag-class objects as follows
plot(result, alpha = 0.001)
plot(result, fisher = FALSE)

par(mfcol=c(1,1))

llag function