Time series resampling interpolates new values for time steps not available in the original time series. This operation is useful to:
Transform irregular time series into regular.
Align time series with different temporal resolutions.
Increase (upsampling) or decrease (downsampling) the temporal resolution of a time series.
Time series resampling should not be used to extrapolate new values outside of the original time range of the time series, or to increase the resolution of a time series by a factor of two or more. These operations are known to produce non-sensical results.
Warning : This function resamples time series lists with overlapping times . Please check such overlap by assessing the columns "begin" and "end " of the data frame resulting from df <- tsl_time(tsl = tsl). Resampling will be limited by the shortest time series in your time series list. To resample non-overlapping time series, please subset the individual components of tsl one by one either using tsl_subset() or the syntax tsl = my_tsl[[i]].
Methods
This function offers three methods for time series interpolation:
"linear" (default): interpolation via piecewise linear regression as implemented in zoo::na.approx().
"spline": cubic smoothing spline regression as implemented in stats::smooth.spline().
"loess": local polynomial regression fitting as implemented in stats::loess().
These methods are used to fit models y ~ x where y represents the values of a univariate time series and x represents a numeric version of its time.
The functions utils_optimize_spline() and utils_optimize_loess() are used under the hood to optimize the complexity of the methods "spline" and "loess" by finding the configuration that minimizes the root mean squared error (RMSE) between observed and predicted y. However, when the argument max_complexity = TRUE, the complexity optimization is ignored, and a maximum complexity model is used instead.
New time
The argument new_time offers several alternatives to help define the new time of the resulting time series:
NULL: the target time series (x) is resampled to a regular time within its original time range and number of observations.
zoo object: a zoo object to be used as template for resampling. Useful when the objective is equalizing the frequency of two separate zoo objects.
time series list: a time series list to be used as template. The range of overlapping dates and the average resolution are used to generate the new resampling time. This method cannot be used to align two time series lists, unless the template is resampled beforehand.
time vector: a time vector of a class compatible with the time in x.
keyword: character string defining a resampling keyword, obtained via zoo_time(x, keywords = "resample")$keywords..
numeric: a single number representing the desired interval between consecutive samples in the units of x (relevant units can be obtained via zoo_time(x)$units).
Step by Step
The steps to resample a time series list are:
The time interpolation range is computed from the intersection of all times in tsl. This step ensures that no extrapolation occurs during resampling, but it also makes resampling of non-overlapping time series impossible.
If new_time is provided, any values of new_time outside of the minimum and maximum interpolation times are removed to avoid extrapolation. If new_time is not provided, a regular time within the interpolation time range with the length of the shortest time series in tsl is generated.
For each univariate time time series, a model y ~ x, where y is the time series and x is its own time coerced to numeric is fitted.
If max_complexity == FALSE, the model with the complexity that minimizes the root mean squared error between the observed and predicted y is returned.
If max_complexity == TRUE and method = "spline" or method = "loess", the first valid model closest to a maximum complexity is returned.
The fitted model is predicted over new_time to generate the resampled time series.
Other Details
Please use this operation with care, as there are limits to the amount of resampling that can be done without distorting the data. The safest option is to keep the distance between new time points within the same magnitude of the distance between the old time points.
This function supports a parallelization setup via future::plan(), and progress bars provided by the package progressr.
tsl: (required, list) Time series list. Default: NULL
new_time: (required, zoo object, time series list, character string, time vector, numeric) New time to resample to. If a time vector is provided, it must be of a class compatible with the time of tsl. If a zoo object or time series list is provided, its time is used as a template to resample tsl. Valid resampling keywords (see tsl_time()) are allowed. Numeric values are interpreted as interval widths in the time units of the time series. If NULL, irregular time series are predicted into a regular version of their own time. Default: NULL
method: (optional, character string) Name of the method to resample the time series. One of "linear", "spline" or "loess". Default: "linear".
max_complexity: (required, logical). Only relevant for methods "spline" and "loess". If TRUE, model optimization is ignored, and the a model of maximum complexity (an overfitted model) is used for resampling. Default: FALSE
Returns
time series list
Description
Objective
Time series resampling interpolates new values for time steps not available in the original time series. This operation is useful to:
Transform irregular time series into regular.
Align time series with different temporal resolutions.
Increase (upsampling) or decrease (downsampling) the temporal resolution of a time series.
Time series resampling should not be used to extrapolate new values outside of the original time range of the time series, or to increase the resolution of a time series by a factor of two or more. These operations are known to produce non-sensical results.
Warning : This function resamples time series lists with overlapping times . Please check such overlap by assessing the columns "begin" and "end " of the data frame resulting from df <- tsl_time(tsl = tsl). Resampling will be limited by the shortest time series in your time series list. To resample non-overlapping time series, please subset the individual components of tsl one by one either using tsl_subset() or the syntax tsl = my_tsl[[i]].
Methods
This function offers three methods for time series interpolation:
"linear" (default): interpolation via piecewise linear regression as implemented in zoo::na.approx().
"spline": cubic smoothing spline regression as implemented in stats::smooth.spline().
"loess": local polynomial regression fitting as implemented in stats::loess().
These methods are used to fit models y ~ x where y represents the values of a univariate time series and x represents a numeric version of its time.
The functions utils_optimize_spline() and utils_optimize_loess() are used under the hood to optimize the complexity of the methods "spline" and "loess" by finding the configuration that minimizes the root mean squared error (RMSE) between observed and predicted y. However, when the argument max_complexity = TRUE, the complexity optimization is ignored, and a maximum complexity model is used instead.
New time
The argument new_time offers several alternatives to help define the new time of the resulting time series:
NULL: the target time series (x) is resampled to a regular time within its original time range and number of observations.
zoo object: a zoo object to be used as template for resampling. Useful when the objective is equalizing the frequency of two separate zoo objects.
time series list: a time series list to be used as template. The range of overlapping dates and the average resolution are used to generate the new resampling time. This method cannot be used to align two time series lists, unless the template is resampled beforehand.
time vector: a time vector of a class compatible with the time in x.
keyword: character string defining a resampling keyword, obtained via zoo_time(x, keywords = "resample")$keywords..
numeric: a single number representing the desired interval between consecutive samples in the units of x (relevant units can be obtained via zoo_time(x)$units).
Step by Step
The steps to resample a time series list are:
The time interpolation range is computed from the intersection of all times in tsl. This step ensures that no extrapolation occurs during resampling, but it also makes resampling of non-overlapping time series impossible.
If new_time is provided, any values of new_time outside of the minimum and maximum interpolation times are removed to avoid extrapolation. If new_time is not provided, a regular time within the interpolation time range with the length of the shortest time series in tsl is generated.
For each univariate time time series, a model y ~ x, where y is the time series and x is its own time coerced to numeric is fitted.
If max_complexity == FALSE, the model with the complexity that minimizes the root mean squared error between the observed and predicted y is returned.
If max_complexity == TRUE and method = "spline" or method = "loess", the first valid model closest to a maximum complexity is returned.
The fitted model is predicted over new_time to generate the resampled time series.
Other Details
Please use this operation with care, as there are limits to the amount of resampling that can be done without distorting the data. The safest option is to keep the distance between new time points within the same magnitude of the distance between the old time points.
This function supports a parallelization setup via future::plan(), and progress bars provided by the package progressr.
Examples
#generate irregular time seriestsl <- tsl_simulate( n =2, rows =100, irregular =TRUE)if(interactive()){ tsl_plot(tsl)}#range of times between samplestsl_time_summary(tsl)[ c("units","resolution_min","resolution_max")]#resample to regular using linear interpolationtsl_regular <- tsl_resample( tsl = tsl
)if(interactive()){ tsl_plot(tsl_regular)}#check new resolutiontsl_time_summary(tsl_regular)[ c("units","resolution_min","resolution_max")]#resample using keywords#valid resampling keywordstsl_time_summary( tsl = tsl, keywords ="resample")$keywords
#by monthtsl_months <- tsl_resample( tsl = tsl, new_time ="months")if(interactive()){ tsl_plot(tsl_months)}#by weektsl_weeks <- tsl_resample( tsl = tsl, new_time ="weeks")if(interactive()){ tsl_plot(tsl_weeks)}#resample using time interval#get relevant unitstsl_time(tsl)$units
#resampling to 15 days intervalstsl_15_days <- tsl_resample( tsl = tsl, new_time =15#days)tsl_time_summary(tsl_15_days)[ c("units","resolution_min","resolution_max")]if(interactive()){ tsl_plot(tsl_15_days)}#aligning two time series listsç#two time series lists with different time rangestsl1 <- tsl_simulate( n =2, rows =80, time_range = c("2010-01-01","2020-01-01"), irregular =TRUE)tsl2 <- tsl_simulate( n =2, rows =120, time_range = c("2005-01-01","2024-01-01"), irregular =TRUE)#check time featurestsl_time_summary(tsl1)[ c("begin","end","resolution_min","resolution_max")]tsl_time_summary(tsl2)[ c("begin","end","resolution_min","resolution_max")]#tsl1 to regulartsl1_regular <- tsl_resample( tsl = tsl1
)#tsl2 resampled to time of tsl1_regulartsl2_regular <- tsl_resample( tsl = tsl2, new_time = tsl1_regular
)#check alignmenttsl_time_summary(tsl1_regular)[ c("begin","end","resolution_min","resolution_max")]tsl_time_summary(tsl2_regular)[ c("begin","end","resolution_min","resolution_max")]
See Also
zoo_resample()
Other tsl_processing: tsl_aggregate(), tsl_smooth(), tsl_stats(), tsl_transform()