Estimate the lead-lag parameters of discretely observed processes by maximizing the shifted Hayashi-Yoshida covariation contrast functions, following Hoffmann et al. (2013).
llag(x, from = -Inf, to = Inf, division = FALSE, verbose = (ci || ccor), grid, psd = TRUE, plot = ci, ccor = ci, ci = FALSE, alpha = 0.01, fisher = TRUE, bw, tol = 1e-6)
Arguments
x: an object of yuima-class or yuima.data-class.
verbose: logical. If FALSE, llag returns lead-lag time estimates only. The default is (ci || ccor), which evaluates to FALSE unless ci or ccor is TRUE.
from: a numeric vector, each component of which gives the lower end of a finite grid on which the contrast function is evaluated, if grid is missing.
to: a numeric vector, each component of which gives the upper end of a finite grid on which the contrast function is evaluated, if grid is missing.
division: a numeric vector, each component of which gives the number of points of a finite grid on which the contrast function is evaluated, if grid is missing.
grid: a numeric vector or a list of numeric vectors. See 'Details'.
psd: logical. If TRUE, the estimated cross-correlation functions are converted to the interval [-1,1]. See 'Details'.
plot: logical. If TRUE, the estimated cross-correlation functions are plotted. If ci is also TRUE, the pointwise confidence intervals (under the null hypothesis that the corresponding correlation is zero) are also plotted. The default is FALSE.
ccor: logical. If TRUE, the estimated cross-correlation functions are returned. This argument is ignored if verbose is FALSE. The default is FALSE.
ci: logical. If TRUE, (pointwise) confidence intervals of the estimated cross-correlation functions and p-values for the significance of the correlations at the estimated lead-lag parameters are calculated. Note that the confidence intervals are only plotted when plot=TRUE.
alpha: a positive number indicating the significance level of the confidence intervals for the cross-correlation functions.
fisher: logical. If TRUE, the p-values and the confidence intervals for the cross-correlation functions are evaluated after applying the Fisher z transformation. This argument is only meaningful if ci is TRUE.
bw: bandwidth parameter to compute the asymptotic variances. See 'Details' and hyavar for details.
tol: tolerance parameter to avoid numerical errors in the comparison of time stamps. All time stamps are divided by tol and rounded to integers. Note that the values of grid are also divided by tol and rounded to integers. A reasonable choice of tol is the minimum unit of the time stamps. The default value 1e-6 assumes that the minimum unit of the time stamps is greater than or equal to 1 microsecond.
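The rounding described for tol can be illustrated with a minimal base-R sketch. It is not the package's internal code; the time stamps t1 and t2 are hypothetical values chosen so that they differ only by floating-point noise.

```r
# Hypothetical time stamps (in seconds) that should coincide
# but differ in double precision.
tol <- 1e-6
t1 <- 0.1 + 0.2   # stored as 0.30000000000000004
t2 <- 0.3

# Direct comparison fails because of the representation error.
exact_equal <- (t1 == t2)

# Dividing by tol and rounding maps both stamps to the same integer key.
key1 <- round(t1 / tol)
key2 <- round(t2 / tol)
keys_equal <- (key1 == key2)
```

With tol equal to the smallest time unit in the data, distinct observation times keep distinct keys while spurious floating-point differences disappear.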
Details
Let d be the number of the components of the zoo.data of the object x.
Let $X^i_{t^i_0}, X^i_{t^i_1}, \ldots, X^i_{t^i_{n(i)}}$ be the observation data of the i-th component (i.e. the i-th component of the zoo.data of the object x).
The shifted Hayashi-Yoshida covariation contrast function $U_{ij}(\theta)$ of the observations $X^i$ and $X^j$ ($i<j$) is defined in the same way as in Hoffmann et al. (2013), and corresponds to their cross-covariance function. The lead-lag parameter $\theta_{ij}$ is defined as a maximizer of $|U_{ij}(\theta)|$. $U_{ij}(\theta)$ is evaluated on a finite grid $G_{ij}$ defined below, so $\theta_{ij}$ belongs to this grid. If there is more than one maximizer, the lowest one is selected.
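The selection rule for the maximizer can be sketched in base R as follows. The grid and the contrast values U are made-up numbers, not output of llag; the sketch only shows how taking the first maximum on an increasing grid yields the lowest maximizer of the absolute contrast.

```r
# Hypothetical increasing grid and contrast values with a tie in |U|.
grid <- seq(-0.1, 0.1, by = 0.05)
U <- c(0.2, -0.8, 0.1, 0.8, 0.3)

# which.max() returns the index of the first maximum, so on an
# increasing grid the lowest maximizer of |U| is selected.
idx <- which.max(abs(U))
theta_hat <- grid[idx]
```

Here the second and fourth grid points tie in absolute value, and the second (lower) one is chosen.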
If psd is TRUE, for any i,j the matrix $C := (U_{kl}(\theta))_{k,l \in \{i,j\}}$ is converted to (C%*%C)^(1/2) to ensure positive semi-definiteness, and $U_{ij}(\theta)$ is redefined as the (1,2)-component of the converted C. Here, $U_{kk}(\theta)$ is set to the realized volatility of $X^k$. In this case $\theta_{ij}$ is given as a maximizer of the cross-correlation functions.
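The conversion (C%*%C)^(1/2) can be read as a matrix square root. The sketch below computes it by an eigendecomposition in base R; the helper psd_sqrt and the matrix C are hypothetical, and the package may compute the same quantity differently. Normalizing by the converted diagonal in the last line is likewise an assumption made for illustration.

```r
# Symmetric square root of C %*% C via eigendecomposition (illustrative helper).
psd_sqrt <- function(C) {
  e <- eigen(C %*% C, symmetric = TRUE)
  # pmax() guards against tiny negative eigenvalues from rounding.
  e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
}

# Hypothetical 2 x 2 matrix that is not positive semi-definite:
# the off-diagonal "covariance" exceeds both "variances".
C <- matrix(c(1.0, 1.5,
              1.5, 1.0), 2, 2)

C_psd <- psd_sqrt(C)
u12 <- C_psd[1, 2]                               # redefined U_ij
rho <- u12 / sqrt(C_psd[1, 1] * C_psd[2, 2])     # implied correlation
min_eig <- min(eigen(C_psd, symmetric = TRUE)$values)
```

In this toy case the raw ratio 1.5 / 1 lies outside [-1, 1], while the converted matrix yields a correlation of 2/3.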
The grid Gij is defined as follows. First, if grid is missing, Gij is given by
$$a,\; a + \frac{b-a}{N-1},\; \ldots,\; a + (N-2)\frac{b-a}{N-1},\; b,$$
where a, b and N are the $(d(i-1) - (i-1)i/2 + (j-i))$-th components of from, to and division respectively. If the corresponding component of from (resp. to) is -Inf (resp. Inf), $a = -(t^j_{n(j)} - t^i_0)$ (resp. $b = t^i_{n(i)} - t^j_0$) is used, while if the corresponding component of division is FALSE, $N = \mathrm{round}(2\max(n(i), n(j))) + 1$ is used. Missing components are filled with -Inf (resp. Inf, FALSE). The default value -Inf (resp. Inf, FALSE) means that all components are -Inf (resp. Inf, FALSE). Next, if grid is a numeric vector, $G_{ij}$ is given by grid. If grid is a list of numeric vectors, $G_{ij}$ is given by the $(d(i-1) - (i-1)i/2 + (j-i))$-th component of grid.
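The equally spaced grid above corresponds to seq() with a fixed number of points. The following base-R sketch uses hypothetical end points and observation counts n_i and n_j; it only mirrors the formulas in the text and is not taken from the package source.

```r
# Equally spaced grid with N points from a to b (illustrative helper).
make_grid <- function(a, b, N) seq(a, b, length.out = N)

# Hypothetical numbers of observations for components i and j.
n_i <- 4
n_j <- 6

# Default number of grid points when division is FALSE.
N_default <- round(2 * max(n_i, n_j)) + 1

# Hypothetical end points a and b.
G <- make_grid(-0.3, 0.3, N_default)
step <- diff(G)
```

Since N grows with the number of observations, the default grid becomes large for long series, which is why supplying grid directly is usually cheaper.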
The estimated lead-lag parameters are returned as the skew-symmetric matrix $(\theta_{ij})_{i,j=1,\ldots,d}$. If verbose is TRUE, the covariance matrix $(U_{ij}(\theta_{ij}))_{i,j=1,\ldots,d}$ corresponding to the estimated lead-lag parameters, the corresponding correlation matrix and the computed contrast functions are also returned. If further ccor is TRUE, the computed cross-correlation functions are returned as a list of length $d(d-1)/2$. For $i<j$, the $(d(i-1) - (i-1)i/2 + (j-i))$-th component of the list consists of an object $U_{ij}(\theta)/\sqrt{U_{ii}(\theta) U_{jj}(\theta)}$ of class zoo indexed by $G_{ij}$.
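The position formula d(i-1) - (i-1)i/2 + (j-i) enumerates the pairs i<j row by row. A small base-R check, using a hypothetical helper pair_index, shows that it matches the column order of combn(d, 2):

```r
# Position of the pair (i, j), i < j, in a list of length d(d-1)/2
# (illustrative helper implementing the formula from the text).
pair_index <- function(i, j, d) d * (i - 1) - (i - 1) * i / 2 + (j - i)

d <- 4
pairs <- combn(d, 2)   # columns: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
idx <- apply(pairs, 2, function(p) pair_index(p[1], p[2], d))
```

For d = 3, for instance, the pairs (1,2), (1,3) and (2,3) sit at positions 1, 2 and 3.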
If plot is TRUE, the computed cross-correlation functions are plotted sequentially.
If ci is TRUE, the asymptotic variances of the cross-correlations are calculated at each point of the grid by using the naive kernel approach described in Section 8.2 of Hayashi and Yoshida (2011). The implementation is the same as that of hyavar, and a more detailed description can be found there.
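As a generic illustration only, and not the package's actual code, the following base-R sketch shows how a pointwise confidence band around zero could be formed from an asymptotic variance, with and without the Fisher z transformation. The variance avar is a made-up number, and the delta-method argument (under a zero correlation the variance of atanh(r) equals that of r to first order) is an assumption of this sketch.

```r
alpha <- 0.01
avar <- 0.04   # hypothetical asymptotic variance of the estimated correlation

# Two-sided normal quantile for the chosen significance level.
q <- qnorm(1 - alpha / 2)

# Band on the correlation scale (can exceed [-1, 1] for large variances).
band_plain <- c(-1, 1) * q * sqrt(avar)

# Band built on the Fisher z scale and mapped back with tanh(),
# which always stays inside (-1, 1).
band_fisher <- tanh(c(-1, 1) * q * sqrt(avar))
```

A correlation estimate falling outside such a band would be judged significant at level alpha at that single grid point.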
Returns
If verbose is FALSE, a skew-symmetric matrix corresponding to the estimated lead-lag parameters is returned. Otherwise, an object of class "yuima.llag", which is a list with the following components, is returned:
lagcce: a skew-symmetric matrix corresponding to the estimated lead-lag parameters.
covmat: a covariance matrix corresponding to the estimated lead-lag parameters.
cormat: a correlation matrix corresponding to the estimated lead-lag parameters.
LLR: a matrix consisting of lead-lag ratios. See Huth and Abergel (2014) for details.
If ci is TRUE, the following component is added to the returned list:
p.values: a matrix of p-values for the significance of the correlations corresponding to the estimated lead-lag parameters.
If further ccor is TRUE, the following components are added to the returned list:
ccor: a list of computed cross-correlation functions.
avar: a list of computed asymptotic variances of the cross-correlations (if ci = TRUE).
Note
The default grid usually contains too many points, so users should specify the grid argument to reduce the computational time. See 'Examples' below for an example of the specification.
The evaluated p-values should be interpreted carefully because they are based on pointwise confidence intervals rather than simultaneous confidence intervals, so there is a multiple testing problem. Evaluation of p-values based on simultaneous confidence intervals is planned for a future extension of this function, but so far no theory has been developed for this. However, it is conjectured that the error distributions of the estimated cross-correlation functions are asymptotically independent if the grid is not too dense, so p-values evaluated by this function will still be meaningful as long as sufficiently low significance levels are used.
References
Hayashi, T. and Yoshida, N. (2011) Nonsynchronous covariation process and limit theorems, Stochastic Processes and their Applications, 121, 2416--2454.
Hoffmann, M., Rosenbaum, M. and Yoshida, N. (2013) Estimation of the lead-lag parameter from non-synchronous data, Bernoulli, 19, no. 2, 426--461.
Huth, N. and Abergel, F. (2014) High frequency lead/lag relationships --- Empirical facts, Journal of Empirical Finance, 26, 41--58.
Author(s)
Yuta Koike with YUIMA Project Team
Examples
library(yuima)

## Set a model
diff.coef.matrix <- matrix(c("sqrt(x1)", "3/5*sqrt(x2)", "1/3*sqrt(x3)",
                             "", "4/5*sqrt(x2)", "2/3*sqrt(x3)",
                             "", "", "2/3*sqrt(x3)"), 3, 3)
drift <- c("1-x1", "2*(10-x2)", "3*(4-x3)")
cor.mod <- setModel(drift = drift, diffusion = diff.coef.matrix,
                    solve.variable = c("x1", "x2", "x3"))

set.seed(111)

## We use a function poisson.random.sampling
## to get observation by Poisson sampling.
yuima.samp <- setSampling(Terminal = 1, n = 1200)
yuima <- setYuima(model = cor.mod, sampling = yuima.samp)
yuima <- simulate(yuima, xinit = c(1, 7, 5))

## intentionally displace the second time series
data2 <- yuima@data@zoo.data[[2]]
time2 <- time(data2)
theta2 <- 0.05 # the lag of x2 behind x1
stime2 <- time2 + theta2
time(yuima@data@zoo.data[[2]]) <- stime2

## intentionally displace the third time series
data3 <- yuima@data@zoo.data[[3]]
time3 <- time(data3)
theta3 <- 0.12 # the lag of x3 behind x1
stime3 <- time3 + theta3
time(yuima@data@zoo.data[[3]]) <- stime3

## sampled data by Poisson rules
psample <- poisson.random.sampling(yuima, rate = c(0.2, 0.3, 0.4), n = 1000)

## plot
plot(psample)

## cce
cce(psample)

## lead-lag estimation (with cross-correlation plots)
par(mfcol = c(3, 1))
result <- llag(psample, plot = TRUE)

## estimated lead-lag parameter
result

## computing pointwise confidence intervals
llag(psample, ci = TRUE)

## In practice, it is better to specify the grid because the default grid contains too many points.
## Here we give an example for how to specify it.

## We search lead-lag parameters on the interval [-0.1, 0.1] with step size 0.01
G <- seq(-0.1, 0.1, by = 0.01)

## lead-lag estimation (with computing confidence intervals)
result <- llag(psample, grid = G, ci = TRUE)

## Since the true lead-lag parameter 0.12 between x1 and x3 is not contained
## in the searching grid G, we see that the corresponding cross-correlation
## does not exceed the confidence interval

## detailed output
## the p-value for the (1,3)-th component is high
result

## Finally, we can examine confidence intervals of other significance levels
## and/or without the Fisher z-transformation via the plot-method defined
## for yuima.llag-class objects as follows
plot(result, alpha = 0.001)
plot(result, fisher = FALSE)

par(mfcol = c(1, 1))