simulation_model9 function

Convenience function for generating functional data

simulation_model9(n = 100, p = 50, outlier_rate = 0.05, kprob = 0.5,
                  ai = c(3, 8), bi = c(1.5, 2.5), ci = c(9, 10.5),
                  cov_alpha = 1, cov_beta = 1, cov_nu = 1,
                  deterministic = TRUE, seed = NULL, plot = FALSE,
                  plot_title = "Simulation Model 9", title_cex = 1.5,
                  show_legend = TRUE, ylabel = "", xlabel = "gridpoints")

Arguments

  • n: The number of curves to generate. Set to 100 by default.

  • p: The number of evaluation points of each curve. The curves are generated over the interval $[0, 1]$. Set to 50 by default.

  • outlier_rate: A value in $[0, 1]$ giving the proportion of outliers. For example, a value of 0.06 means that about 6% of the curves will be outliers; whether that number is fixed or random depends on the parameter deterministic. Set to 0.05 by default.

  • kprob: The probability $P(u_i = 1)$ in the contamination model. Set to 0.5 by default.

  • ai: A vector of two values giving the interval $[a_1, a_2]$ from which $a_{1i}$ and $a_{2i}$ in the main model are drawn. Set to c(3, 8) by default.

  • bi: A vector of two values giving the interval $[b_1, b_2]$ from which $b_{1i}$ and $b_{2i}$ in the contamination model are drawn. Set to c(1.5, 2.5) by default.

  • ci: A vector of two values giving the interval $[c_1, c_2]$ from which $c_{1i}$ and $c_{2i}$ in the contamination model are drawn. Set to c(9, 10.5) by default.

  • cov_alpha: The multiplicative constant in front of the exponential in the covariance function, i.e., the $\alpha$ in $\gamma(s, t)$. Set to 1 by default.

  • cov_beta: The coefficient of $|t - s|^\nu$ inside the exponential of the covariance function, i.e., the $\beta$ in $\gamma(s, t)$. Set to 1 by default.

  • cov_nu: The power to which $|t - s|$ is raised inside the exponential of the covariance function, i.e., the $\nu$ in $\gamma(s, t)$. Set to 1 by default.

  • deterministic: A logical value. If TRUE, the function always returns round(n*outlier_rate) outliers, so the number of outliers is fixed. If FALSE, the number of outliers is determined by n Bernoulli trials with success probability outlier_rate, so the number of outliers returned is random. Set to TRUE by default.

  • seed: A seed to set for reproducibility. NULL by default, in which case no seed is set.

  • plot: A logical value indicating whether to plot the data. Set to FALSE by default.

  • plot_title: Title of the plot if plot = TRUE. Set to "Simulation Model 9" by default.

  • title_cex: Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.

  • show_legend: A logical value indicating whether to add a legend to the plot if plot = TRUE. Set to TRUE by default.

  • ylabel: The label of the y-axis if plot = TRUE. Set to "" by default.

  • xlabel: The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.
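The two ways of fixing the number of outliers described for the deterministic argument can be sketched in base R as follows. choose_outliers is a hypothetical helper written for illustration only; it is not exported by fdaoutlier, and the package may pick outlier rows differently.

```r
# Hypothetical helper (not part of fdaoutlier): returns the row indices of the
# outliers under the two schemes described for the `deterministic` argument.
choose_outliers <- function(n, outlier_rate, deterministic = TRUE) {
  if (deterministic) {
    # Always exactly round(n * outlier_rate) outliers, at random row positions
    n_out <- round(n * outlier_rate)
    idx <- sample(seq_len(n), n_out)
  } else {
    # One Bernoulli(outlier_rate) trial per curve, so the count itself is random
    idx <- which(rbinom(n, size = 1, prob = outlier_rate) == 1)
  }
  sort(idx)
}

set.seed(1)
length(choose_outliers(100, 0.05, deterministic = TRUE))   # always 5
length(choose_outliers(100, 0.05, deterministic = FALSE))  # varies with the seed
```

Note that R's round() rounds halves to even (round(2.5) is 2), so the fixed count can differ by one from rounding half up when n*outlier_rate ends in .5.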

Returns

A list containing:

  • data: a matrix of size n by p containing the simulated data set.

  • true_outliers: a vector of integers giving the row indices of the outliers in the generated data.

Description

Periodic functions with outliers of different amplitude. The main model is of the form

$$X_i(t) = a_{1i}\sin\pi + a_{2i}\cos\pi + e_i(t),$$

with contamination model of the form

$$X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1 - u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),$$

where $t \in [0, 1]$, $\pi \in [0, 2\pi]$, and $a_{1i}$, $a_{2i}$ follow a uniform distribution on the interval $[a_1, a_2]$; $b_{1i}$, $b_{2i}$ follow a uniform distribution on $[b_1, b_2]$; $c_{1i}$, $c_{2i}$ follow a uniform distribution on $[c_1, c_2]$; $u_i$ follows a Bernoulli distribution with $P(u_i = 1)$ given by kprob; and $e_i(t)$ is a Gaussian process with zero mean and covariance function of the form

$$\gamma(s, t) = \alpha\exp\{-\beta|t - s|^\nu\}.$$
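To make the covariance function concrete, the base R sketch below evaluates $\gamma(s, t)$ on p equally spaced points in $[0, 1]$ and draws zero-mean Gaussian process paths from it through a Cholesky factor. The names cov_matrix and gp_noise and the equally spaced grid are assumptions for illustration; fdaoutlier's internal code may differ.

```r
# Hypothetical sketch: gamma(s, t) = alpha * exp(-beta * |t - s|^nu) on a grid
# of p equally spaced points in [0, 1] (grid choice is an assumption).
cov_matrix <- function(p, alpha = 1, beta = 1, nu = 1) {
  tt <- seq(0, 1, length.out = p)   # evaluation grid
  d <- abs(outer(tt, tt, "-"))      # |t - s| for every pair of grid points
  alpha * exp(-beta * d^nu)
}

# Draw n zero-mean Gaussian process paths e_i(t) as rows of an n x p matrix.
# chol() returns upper-triangular R with t(R) %*% R = Sigma, so each row of
# z %*% R (with z having iid N(0, 1) entries) has covariance Sigma.
gp_noise <- function(n, p, alpha = 1, beta = 1, nu = 1) {
  R <- chol(cov_matrix(p, alpha, beta, nu))
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  z %*% R
}

set.seed(1)
e <- gp_noise(n = 3, p = 50)
dim(e)  # 3 50
```

With cov_nu = 1 this is the exponential covariance, which is positive definite on any grid. As cov_nu approaches 2 the matrix becomes badly conditioned on fine grids and chol() may fail; adding a small constant to the diagonal is a common workaround.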

Please see the simulation models vignette with vignette("simulation_models", package = "fdaoutlier") for more details.
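Putting the pieces together, the main and contamination models can be simulated directly in base R. The sketch below is a hypothetical re-implementation for illustration (sim_model9_sketch is not an fdaoutlier function); it assumes the variable written as $\pi$ runs over p equally spaced points in $[0, 2\pi]$, always uses a fixed outlier count as with deterministic = TRUE, and omits plotting.

```r
# Hypothetical sketch of simulation model 9 (not fdaoutlier's source code).
sim_model9_sketch <- function(n = 100, p = 50, outlier_rate = 0.05, kprob = 0.5,
                              ai = c(3, 8), bi = c(1.5, 2.5), ci = c(9, 10.5),
                              cov_alpha = 1, cov_beta = 1, cov_nu = 1) {
  tt <- seq(0, 1, length.out = p)
  phase <- seq(0, 2 * pi, length.out = p)  # assumed grid for the variable written as pi
  # Gaussian process noise e_i(t) with gamma(s, t) = alpha * exp(-beta * |t - s|^nu)
  sigma <- cov_alpha * exp(-cov_beta * abs(outer(tt, tt, "-"))^cov_nu)
  noise <- matrix(rnorm(n * p), n, p) %*% chol(sigma)

  # Main model: a_1i, a_2i ~ U(ai[1], ai[2])
  a1 <- runif(n, ai[1], ai[2])
  a2 <- runif(n, ai[1], ai[2])
  data <- outer(a1, sin(phase)) + outer(a2, cos(phase))

  # Fixed number of outliers, as with deterministic = TRUE
  n_out <- round(n * outlier_rate)
  out_idx <- sort(sample(seq_len(n), n_out))
  if (n_out > 0) {
    u <- rbinom(n_out, 1, kprob)  # u_i ~ Bernoulli(kprob)
    b1 <- runif(n_out, bi[1], bi[2]); b2 <- runif(n_out, bi[1], bi[2])
    c1 <- runif(n_out, ci[1], ci[2]); c2 <- runif(n_out, ci[1], ci[2])
    # (b1 sin + b2 cos)(1 - u) + (c1 sin + c2 cos) u, regrouped by sin and cos
    amp_sin <- b1 * (1 - u) + c1 * u
    amp_cos <- b2 * (1 - u) + c2 * u
    data[out_idx, ] <- outer(amp_sin, sin(phase)) + outer(amp_cos, cos(phase))
  }
  list(data = data + noise, true_outliers = out_idx)
}

set.seed(123)
sim <- sim_model9_sketch()
dim(sim$data)               # 100 50
length(sim$true_outliers)   # 5
```

With the default intervals, regular curves have amplitudes in [3, 8] while each outlier is either shrunk to [1.5, 2.5] ($u_i = 0$) or inflated to [9, 10.5] ($u_i = 1$), so the outliers differ from the bulk mainly in amplitude rather than in shape.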

Examples

dt <- simulation_model9(plot = TRUE)
dim(dt$data)
dt$true_outliers

  • Maintainer: Oluwasegun Taiwo Ojo
  • License: GPL-3
  • Last published: 2023-09-30