Stage I of the Procedure: Locate Outliers (Baseline Function)
This function computes the t-statistics for the significance of each type of outlier at every time point and selects those that are significant given a critical value.
resid: a time series. Residuals from a time series model fitted to the data.
pars: a list containing the parameters of the model fitted to the data. See details below.
cval: a numeric. The critical value to determine the significance of each type of outlier.
types: a character vector indicating the types of outliers to be considered.
delta: a numeric. Parameter of the temporary change type of outlier.
Details
Five types of outliers can be considered. By default, "AO" additive outliers, "LS" level shifts, and "TC" temporary changes are selected; "IO" innovational outliers and "SLS" seasonal level shifts can also be selected.
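To make the default types concrete, the following base R sketch builds the effect pattern that a unit-sized outlier of each type would add to a series. The series length, the outlier position t0 and the value of delta are illustrative assumptions, not values taken from this page.

```r
# Illustrative values (assumptions): series length, outlier position, decay rate
n <- 8
t0 <- 3
delta <- 0.7

# Indicator variable: 1 at the observation where the outlier arises, 0 elsewhere
I <- as.numeric(seq_len(n) == t0)

ao <- I                      # additive outlier: a single spike
ls <- cumsum(I)              # level shift: a permanent step from t0 onwards
# temporary change: a spike that decays geometrically at rate delta
tc <- as.numeric(stats::filter(I, filter = delta, method = "recursive"))

round(cbind(ao, ls, tc), 3)
```

A value of delta closer to 1 makes the temporary change die out more slowly, so that it approaches a level shift; a value closer to 0 makes it approach an additive outlier.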
The approach described in Chen & Liu (1993) is followed to locate outliers. The original framework is based on ARIMA time series models. The extension to structural time series models is currently experimental.
Let us define an ARIMA model for the series $y_t^*$ subject to $m$ outliers defined as $L_j(B)$ with weights $\omega_j$:
$$y_t^* = \sum_{j=1}^{m} \omega_j L_j(B) I_t(t_j) + \frac{\theta(B)}{\phi(B)\alpha(B)} a_t,$$
where $I_t(t_j)$ is an indicator variable containing the value 1 at observation $t_j$ where the $j$-th outlier arises; $\phi(B)$ is an autoregressive polynomial with all roots outside the unit circle; $\theta(B)$ is a moving average polynomial with all roots outside the unit circle; and $\alpha(B)$ is an autoregressive polynomial with all roots on the unit circle.
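For reference, the filters $L_j(B)$ usually associated with the five types of outliers in the framework of Chen & Liu (1993) can be written as below. The symbol $s$ (the periodicity of the data) is notation introduced here for illustration; $\delta$ is the parameter passed as the argument delta.

```latex
\begin{aligned}
\text{IO:}  \quad L(B) &= \frac{\theta(B)}{\phi(B)\alpha(B)}, &
\text{AO:}  \quad L(B) &= 1, \\
\text{LS:}  \quad L(B) &= \frac{1}{1 - B}, &
\text{TC:}  \quad L(B) &= \frac{1}{1 - \delta B}, \\
\text{SLS:} \quad L(B) &= \frac{1}{1 - B^{s}}.
\end{aligned}
```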
The presence of outliers is tested by means of t-statistics applied on the following regression equation:
$$\pi(B) y_t^* \equiv \hat{e}_t = \sum_{j=1}^{m} \omega_j \pi(B) L_j(B) I_t(t_j) + a_t,$$
where $\pi(B) = \sum_{i=0}^{\infty} \pi_i B^i$.
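For the regression equation to leave the innovations $a_t$ as the error term, $\pi(B)$ must equal $\phi(B)\alpha(B)/\theta(B)$. A minimal base R sketch of how the weights $\pi_i$ could be obtained is shown below. The helper name pi_weights is hypothetical, R's sign conventions for AR and MA coefficients are assumed, and any unit-root factor $\alpha(B)$ is assumed to be already multiplied into the AR coefficients; this is not necessarily how the package computes them internally.

```r
# Hypothetical helper (not part of the package): truncated pi weights.
# Assumed conventions: phi(B) = 1 - ar[1] B - ..., theta(B) = 1 + ma[1] B + ...
pi_weights <- function(ar = numeric(0), ma = numeric(0), n) {
  # ARMAtoMA() expands theta(B) / phi(B); swapping and negating the arguments
  # expands phi(B) / theta(B) instead. The leading 1 is pi_0.
  c(1, stats::ARMAtoMA(ar = -ma, ma = -ar, lag.max = n - 1))
}

pi_ar <- pi_weights(ar = 0.5, n = 4)  # pi(B) = 1 - 0.5 B, a finite expansion
pi_ma <- pi_weights(ma = 0.5, n = 4)  # pi(B) = 1 / (1 + 0.5 B), truncated at lag 3
```

For a pure AR model the expansion is finite, whereas any MA component makes it infinite, so in practice it is truncated at the sample size.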
The regressors of the above equation are created by the functions outliers.regressors.arima and the remaining functions described here.
The function locate.outliers computes the t-statistics for each type of outlier and for every time point (see outliers.tstatistics). Then, the cases where the corresponding t-statistic is (in absolute value) below the threshold cval are removed. Thus, a potential set of outliers is obtained.
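The computation and thresholding step can be sketched in base R for the additive outlier case. This is an illustration under stated assumptions, not the package's code: the helper ao_tstats is hypothetical, the residual scale is estimated with the median absolute deviation scaled by 1.483, and the residuals are made-up numbers with one obvious spike.

```r
# Hypothetical helper: t-statistic of an additive outlier at every time point.
# picoefs holds pi_0 = 1, pi_1, ... and must have at least length(resid) elements.
ao_tstats <- function(resid, picoefs,
                      sigma = 1.483 * median(abs(resid - median(resid)))) {
  n <- length(resid)
  vapply(seq_len(n), function(t0) {
    x <- picoefs[seq_len(n - t0 + 1)]  # regressor pi(B) I_t(t0), nonzero from t0 on
    e <- resid[t0:n]                   # residuals aligned with the regressor
    omega <- sum(e * x) / sum(x^2)     # least-squares estimate of the weight
    omega * sqrt(sum(x^2)) / sigma     # t-statistic of the estimate
  }, numeric(1))
}

# Made-up residuals (assumption) and a white-noise model, so that pi(B) = 1
resid <- c(0.1, -0.2, 0.05, 5, -0.1, 0.15, -0.05, 0.2)
picoefs <- c(1, rep(0, length(resid) - 1))
cval <- 3.5

tstats <- ao_tstats(resid, picoefs)
candidates <- which(abs(tstats) > cval)  # time points kept as potential outliers
```

Only the spike at the fourth observation survives the threshold in this example.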
Some polishing rules are applied by locate.outliers:
If level shifts are found at consecutive time points, only the point with the highest t-statistic in absolute value is kept.
If more than one type of outlier exceeds the threshold cval at a given time point, the type of outlier with the highest t-statistic in absolute value is kept and the others are removed.
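The two polishing rules above can be sketched in base R as follows. The function polish_candidates, the column names (type, ind, coefhat, tstat), the order in which the rules are applied and the candidate values are all assumptions made for illustration; the package may implement the rules differently.

```r
# Hypothetical sketch of the polishing rules on a data frame of candidates
polish_candidates <- function(cand) {
  # Rule 1: within each run of level shifts at consecutive time points,
  # keep only the one with the largest |t-statistic|
  is_ls <- cand$type == "LS"
  ls <- cand[is_ls, ]
  ls <- ls[order(ls$ind), ]
  if (nrow(ls) > 1) {
    run <- cumsum(c(TRUE, diff(ls$ind) != 1))  # id of each consecutive run
    best <- vapply(split(seq_len(nrow(ls)), run),
                   function(i) i[which.max(abs(ls$tstat[i]))], integer(1))
    ls <- ls[unname(best), ]
  }
  cand <- rbind(cand[!is_ls, ], ls)
  # Rule 2: at each time point keep only the type with the largest |t-statistic|
  cand <- cand[order(cand$ind, -abs(cand$tstat)), ]
  cand <- cand[!duplicated(cand$ind), ]
  rownames(cand) <- NULL
  cand
}

# Made-up candidates (assumption): level shifts at 10 and 11, two types at 11
cand <- data.frame(
  type = c("LS", "LS", "AO", "TC", "LS"),
  ind = c(10, 11, 11, 20, 30),
  coefhat = c(1.2, 1.5, 0.9, -0.8, 2.0),
  tstat = c(4.0, 5.0, 4.5, -3.9, 6.0),
  stringsAsFactors = FALSE
)
polished <- polish_candidates(cand)
```

In this example the level shift at observation 10 is dropped in favour of the one at 11, and the additive outlier at 11 is dropped because the level shift there has the larger t-statistic.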
The argument pars is a list containing the parameters of the model. In the framework of ARIMA models, the coefficients of the ARIMA must be defined in pars as the product of the autoregressive non-seasonal and seasonal polynomials (if any) and the differencing filter (if any). The function coefs2poly can be used to define the argument pars.
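To illustrate what "the product of the polynomials" means, the base R sketch below multiplies an AR(1) polynomial by a first-difference filter. The helper poly_product and the coefficient values are assumptions for illustration only; the exact structure and sign convention of the list returned by coefs2poly are not reproduced here.

```r
# Hypothetical helper: product of two polynomials in the lag operator B,
# each given as a coefficient vector in increasing powers of B
poly_product <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i + seq_along(b) - 1
    out[idx] <- out[idx] + a[i] * b  # add the shifted, scaled copy of b
  }
  out
}

ar_poly <- c(1, -0.5)    # 1 - 0.5 B (illustrative AR(1) coefficient)
diff_poly <- c(1, -1)    # 1 - B, the first-difference filter
full_ar <- poly_product(ar_poly, diff_poly)  # 1 - 1.5 B + 0.5 B^2
```

A seasonal AR polynomial or a seasonal difference would be folded in the same way, by multiplying one more coefficient vector into the product.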
Returns
A data frame with one row per potential outlier. The columns give, in order, the type of outlier, the observation, the coefficient and the t-statistic.
Note
The default critical value, cval, is set equal to 3.5 and, hence, it is not based on the sample size as in functions tso or locate.outliers.oloop.
Currently the innovational outlier "SLS" is not available if pars is related to a structural time series model.
Chen, C. and Liu, Lon-Mu (1993). Joint Estimation of Model Parameters and Outlier Effects in Time Series. Journal of the American Statistical Association, 88 (421), pp. 284-297.