Creates a list of in-sample and out-of-sample indices per fold for time series k-fold cross-validation; see Details.
create_timefolds(y, k = 5L, use_names = TRUE, type = c("extending", "moving"))
Arguments
y: Any vector of the same length as the data to be split.
k: Number of folds.
use_names: Should folds be named? Default is TRUE.
type: Should in-sample data be "extending" over the folds (default) or consist of one single block ("moving")?
Returns
A nested list with in-sample and out-of-sample indices per fold.
Details
The data is first partitioned into k+1 sequential blocks B_1 to B_{k+1}. Each fold consists of two index vectors: one with in-sample row numbers, the other with out-of-sample row numbers. The first fold uses B_1 as in-sample and B_2 as out-of-sample data. The second one uses either B_2 (if type = "moving") or {B_1, B_2} (if type = "extending") as in-sample data, and B_3 as out-of-sample data, and so on. Finally, the kth fold uses {B_1, ..., B_k} ("extending") or B_k ("moving") as in-sample data, and B_{k+1} as out-of-sample data. This ensures that out-of-sample data always follows in-sample data.
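The block scheme described above can be sketched in a few lines of base R. This is only an illustration, not the function's actual source: the helper name sketch_timefolds, the rule used to cut the rows into near-equal blocks, and the element names insample and outsample are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; not the package implementation.
sketch_timefolds <- function(n, k = 5L, type = c("extending", "moving")) {
  type <- match.arg(type)
  stopifnot(n >= k + 1L)
  # Assign each row to one of k + 1 sequential blocks of near-equal size
  # (assumed splitting rule; the real function may cut differently).
  block <- ceiling(seq_len(n) * (k + 1L) / n)
  blocks <- split(seq_len(n), block)
  lapply(seq_len(k), function(i) {
    # "extending": blocks 1..i are in-sample; "moving": only block i.
    first <- if (type == "extending") 1L else i
    list(
      insample = unlist(blocks[first:i], use.names = FALSE),
      outsample = blocks[[i + 1L]]
    )
  })
}

# 12 rows and k = 5 give six blocks of two rows each.
ext <- sketch_timefolds(12, k = 5L, type = "extending")
mov <- sketch_timefolds(12, k = 5L, type = "moving")
str(ext[[2]])  # in-sample rows 1 to 4, out-of-sample rows 5 and 6
str(mov[[2]])  # in-sample rows 3 and 4, out-of-sample rows 5 and 6
```

In both variants the out-of-sample rows of fold i come from block i+1, so every out-of-sample index is larger than every in-sample index of the same fold.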
Examples
y <- runif(100)
create_timefolds(y)
create_timefolds(y, use_names = FALSE)
create_timefolds(y, use_names = FALSE, type = "moving")