foreccomb() R function from [GeomComb]

Format Raw Data for Forecast Combination

Structures cross-sectional input data (individual model forecasts) for forecast combination. Stores the data as an object of S3 class foreccomb, which serves as input to the forecast combination techniques. Optionally imputes missing values and resolves problems due to perfect collinearity.


foreccomb(observed_vector, prediction_matrix, newobs = NULL,
  newpreds = NULL, byrow = FALSE, na.impute = TRUE, criterion = "RMSE")

Arguments

observed_vector: A vector or univariate time series; contains actual values for training set.
prediction_matrix: A matrix or multivariate time series; contains individual model forecasts for training set.
newobs: A vector or univariate time series; contains actual values if a test set is used (optional).
newpreds: A matrix or multivariate time series; contains individual model forecasts if a test set is used (optional). Does not require specification of newobs: a forecaster who only wants to train the forecast combination method on a training set and apply it to future individual model forecasts needs to supply only newpreds.
byrow: logical. The default (FALSE) assumes that each column of the forecast matrices (prediction_matrix and -- if specified -- newpreds) contains forecasts from one forecast model; if each row of the matrices contains forecasts from one forecast model, set to TRUE.
na.impute: logical. The default (TRUE) behavior is to impute missing values via the cross-validated spline approach of the mtsdi package. If set to FALSE, forecasts with missing values will be removed. Missing values in the observed data are never imputed.
criterion: One of "RMSE" (default), "MAE", or "MAPE". Used only if prediction_matrix is not full rank: the algorithm checks which models are causing perfect collinearity and removes the one with the worst individual accuracy (according to the chosen criterion).
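
The two layouts that byrow distinguishes can be illustrated in base R. This is a minimal sketch that assumes byrow = TRUE amounts to transposing the forecast matrix, which the argument description suggests but does not state; the object names are illustrative only.

```r
# Assumed layout for byrow = TRUE: 10 models in rows, 80 time points in columns.
set.seed(1)
preds_by_row <- matrix(rnorm(800, 1), nrow = 10, ncol = 80)

# Transposing gives the default layout (byrow = FALSE):
# one row per observation, one column per forecast model.
preds_by_col <- t(preds_by_row)
dim(preds_by_col)
```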

Returns

Returns an object of class foreccomb.

Details

The function imports the column names of the prediction matrix (if byrow = FALSE, otherwise the row names) as model names; if no column names are specified, generic model names are created.
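
The naming rule above could be sketched as follows. get_model_names() is a hypothetical helper, not a GeomComb function, and the "Model" prefix for generic names is an assumption; the names the package actually generates may differ.

```r
# Hypothetical helper; the "Model" prefix is an assumed naming scheme.
get_model_names <- function(preds, byrow = FALSE) {
  # Models sit in columns by default, in rows if byrow = TRUE.
  nms <- if (byrow) rownames(preds) else colnames(preds)
  n_models <- if (byrow) nrow(preds) else ncol(preds)
  # Fall back to generic names when the matrix carries none.
  if (is.null(nms)) nms <- paste0("Model", seq_len(n_models))
  nms
}

named <- matrix(0, 5, 2, dimnames = list(NULL, c("ARIMA", "ETS")))
unnamed <- matrix(0, 5, 3)
get_model_names(named)
get_model_names(unnamed)
```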

The missing value imputation algorithm is a modified version of the EM algorithm for imputation that is applicable to time series data, accounting for the correlation between the forecasting models and the time structure of the series itself. A smoothing spline is fitted to each of the time series at each iteration. The degrees of freedom of each spline are chosen by cross-validation.
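
The spline step alone can be illustrated with base R. This is a deliberately simplified sketch for a single series: it is not the EM algorithm used by mtsdi, it ignores the correlation between forecasting models, and the simulated data are invented for illustration.

```r
set.seed(42)
t_idx <- 1:60
series <- sin(t_idx / 8) + rnorm(60, sd = 0.1)
series[c(10, 35)] <- NA  # two missing forecasts

observed <- !is.na(series)

# cv = TRUE selects the smoothing parameter (and hence the effective
# degrees of freedom) by leave-one-out cross-validation.
fit <- smooth.spline(t_idx[observed], series[observed], cv = TRUE)

# Fill only the gaps; observed values are left untouched.
imputed <- series
imputed[!observed] <- predict(fit, t_idx[!observed])$y
```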

Forecast combination relies on the lack of perfect collinearity. The test for this condition checks if prediction_matrix is full rank. In the presence of perfect collinearity, the iterative algorithm identifies the subset of forecasting models that are causing linear dependence and removes the one among them that has the lowest accuracy (according to a selected criterion, default is RMSE). This procedure is repeated until the revised prediction matrix is full rank.
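
The iterative procedure can be sketched in base R. drop_collinear() and rmse() are hypothetical helpers written for illustration, and the way the dependent subset is identified here (a column is flagged if removing it leaves the matrix rank unchanged) is an assumption; the package's internal procedure may differ.

```r
# Hypothetical helpers; not part of GeomComb.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

drop_collinear <- function(obs, preds) {
  repeat {
    current_rank <- qr(preds)$rank
    if (current_rank == ncol(preds)) break  # full rank: nothing left to remove
    # A column takes part in a linear dependence if removing it
    # leaves the rank of the matrix unchanged.
    dependent <- which(vapply(seq_len(ncol(preds)), function(j) {
      qr(preds[, -j, drop = FALSE])$rank == current_rank
    }, logical(1)))
    # Among those columns, drop the one with the highest RMSE.
    errors <- vapply(dependent, function(j) rmse(obs, preds[, j]), numeric(1))
    worst <- dependent[which.max(errors)]
    preds <- preds[, -worst, drop = FALSE]
  }
  preds
}

set.seed(1)
obs <- rnorm(80)
preds <- matrix(rnorm(800, 1), 80, 10)
colnames(preds) <- paste0("M", 1:10)
preds[, 2] <- 0.8 * preds[, 1] + 0.4 * preds[, 8]  # induce perfect collinearity
reduced <- drop_collinear(obs, preds)
```

In this setup only M1, M2, and M8 are candidates for removal, because they are the only columns involved in the linear dependence.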

Examples


obs <- rnorm(100)
preds <- matrix(rnorm(1000, 1), 100, 10)
train_o <- obs[1:80]
train_p <- preds[1:80, ]
test_o <- obs[81:100]
test_p <- preds[81:100, ]

## Example with a training set only:
foreccomb(train_o, train_p)

## Example with a training set and future individual forecasts:
foreccomb(train_o, train_p, newpreds=test_p)

## Example with a training set and a full test set:
foreccomb(train_o, train_p, test_o, test_p)

## Example with forecast models being stored in rows:
preds_row <- matrix(rnorm(1000, 1), 10, 100)
train_p_row <- preds_row[,1:80]
foreccomb(train_o, train_p_row, byrow = TRUE)

## Example with NA imputation:
train_p_na <- train_p
train_p_na[2,3] <- NA
foreccomb(train_o, train_p_na, na.impute = TRUE)

## Example with perfect collinearity:
train_p[,2] <- 0.8*train_p[,1] + 0.4*train_p[,8]
foreccomb(train_o, train_p, criterion="RMSE")

Author(s)

Christoph E. Weiss, Gernot R. Roetzer

References

Junger, W. L., Ponce de Leon, A., and Santos, N. (2003). Missing Data Imputation in Multivariate Time Series via EM Algorithm. Cadernos do IME, 15, 8--21.

Dempster, A., Laird, N., and Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1--38.
