Simulate a set of time series for modelling in mvgam
This function simulates sets of time series data for fitting a multivariate GAM that includes shared seasonality and dependence on state-space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors.
seasonality: character. Either shared, meaning that all series share the exact same seasonal pattern, or hierarchical, meaning that there is a global seasonality but each series' pattern can deviate slightly
use_lv: logical. If TRUE, use dynamic factors to simulate the series' latent trends in a reduced dimension format. If FALSE, simulate independent latent trends for each series
n_lv: integer. Number of latent dynamic factors for generating the series' trends. Defaults to 0, meaning that dynamics are simulated independently for each series
trend_model: character specifying the time series dynamics for the latent trend. Options are:
None (no latent trend component; i.e. the GAM component is all that contributes to the linear predictor, and the observation process is the only source of error; similarly to what is estimated by gam)
RW (random walk with possible drift)
AR1 (with possible drift)
AR2 (with possible drift)
AR3 (with possible drift)
VAR1 (contemporaneously uncorrelated VAR1)
VAR1cor (contemporaneously correlated VAR1)
GP (Gaussian Process with squared exponential kernel)
See mvgam_trends for more details
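As a rough illustration of switching between the trend options listed above, the sketch below simulates one set of series with a random walk trend and one with an AR1 trend. It assumes that trend models are passed as constructor calls, as in the RW() call used in the Examples section, and that AR(p = 1) is the constructor corresponding to the AR1 option; check mvgam_trends for the form your installed version expects.

```r
library(mvgam)

# Make the simulations reproducible
set.seed(42)

# Random walk trend, passed in the same way as in the Examples section
sim_rw <- sim_mvgam(family = gaussian(), trend_model = RW(), prop_trend = 0.6)

# AR1 trend; AR(p = 1) is assumed here to be the matching constructor
sim_ar <- sim_mvgam(family = gaussian(), trend_model = AR(p = 1), prop_trend = 0.6)

# Compare the simulated training series visually
plot_mvgam_series(data = sim_rw$data_train, series = 'all')
plot_mvgam_series(data = sim_ar$data_train, series = 'all')
```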
drift: logical, simulate a drift term for each trend
prop_trend: numeric. Relative importance of the trend for each series. Should be between 0 and 1
trend_rel: Deprecated. Use prop_trend instead
freq: integer. The seasonal frequency of the series
family: family specifying the exponential observation family for the series. Currently supported families are: nb(), poisson(), bernoulli(), tweedie(), gaussian(), betar(), lognormal(), student() and Gamma()
phi: vector of dispersion parameters for the series (i.e. size for nb() or phi for betar()). If length(phi) < n_series, the first element of phi will be replicated n_series times. Defaults to 5 for nb() and tweedie(); 10 for betar()
shape: vector of shape parameters for the series (i.e. shape for Gamma()). If length(shape) < n_series, the first element of shape will be replicated n_series times. Defaults to 10
sigma: vector of scale parameters for the series (i.e. sd for gaussian() or student(), log(sd) for lognormal()). If length(sigma) < n_series, the first element of sigma will be replicated n_series times. Defaults to 0.5 for gaussian() and student(); 0.2 for lognormal()
nu: vector of degrees of freedom parameters for the series (i.e. nu for student()). If length(nu) < n_series, the first element of nu will be replicated n_series times. Defaults to 3
mu: vector of location parameters for the series. If length(mu) < n_series, the first element of mu will be replicated n_series times. Defaults to small random values between -0.5 and 0.5 on the link scale
prop_missing: numeric stating proportion of observations that are missing. Should be between 0 and 0.8, inclusive
prop_train: numeric stating the proportion of data to use for training. Should be between 0.2 and 1
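The recycling rule described for phi, shape, sigma, nu and mu can be sketched in base R. The helper recycle_first() below is hypothetical and is not part of mvgam; it only mirrors the rule as documented, and the package's internal implementation may differ.

```r
# Hypothetical helper (not an mvgam function) mirroring the documented rule:
# if fewer values than series are supplied, replicate the first value
recycle_first <- function(x, n_series) {
  if (length(x) < n_series) {
    rep(x[1], n_series)
  } else {
    x
  }
}

# Two values supplied for three series: only the first value is used
phi <- recycle_first(c(10, 2), n_series = 3)
phi

# One value per series: the vector is returned unchanged
sigma <- recycle_first(c(0.5, 0.7, 0.9), n_series = 3)
sigma
```

Under this rule, a vector that is too short is not recycled element by element; every value after the first is ignored. To give each series its own parameter, supply exactly n_series values.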
Returns
A list object containing outputs needed for mvgam, including 'data_train' and 'data_test', as well as some additional information about the simulated seasonality and trend dependencies
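A minimal sketch of inspecting the returned list is shown below. The 'data_train' and 'data_test' elements are documented above; the response column name y is an assumption about the layout of the simulated data frames, so confirm it with head() before relying on it.

```r
library(mvgam)

# Make the simulation reproducible
set.seed(1)

# Hold out the final quarter of each series and drop roughly 10% of observations
sim_data <- sim_mvgam(family = poisson(),
                      trend_model = RW(),
                      prop_train = 0.75,
                      prop_missing = 0.1)

# List the elements of the returned object
names(sim_data)

# Compare the sizes of the training and testing sets
nrow(sim_data$data_train)
nrow(sim_data$data_test)

# Inspect the first rows of the training data
head(sim_data$data_train)

# Share of missing responses in the training data
# (assumes the response column is named y)
mean(is.na(sim_data$data_train$y))
```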
Examples
# Simulate series with observations bounded at 0 and 1 (Beta responses)
sim_data <- sim_mvgam(family = betar(), trend_model = RW(), prop_trend = 0.6)
plot_mvgam_series(data = sim_data$data_train, series = 'all')

# Now simulate series with overdispersed discrete observations
sim_data <- sim_mvgam(family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10)
plot_mvgam_series(data = sim_data$data_train, series = 'all')