Convenience function for generating functional data
This model generates shape outliers that have a different shape over a portion of the domain. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + (-1)^u q + (-1)^{(1-u)}\left(\frac{1}{\sqrt{r\pi}}\right)\exp\left(-z(t-v)^w\right) + e_i(t)$$
where $t \in [0,1]$ and $e_i(t)$ is a Gaussian process with zero mean and covariance function of the form:
$$\gamma(s,t) = \alpha\exp\left(-\beta|t-s|^\nu\right),$$
$u$ follows a Bernoulli distribution with probability $P(u=1) = 0.5$ (set via kprob); $q$, $r$, $z$ and $w$ are constants; $v$ follows a uniform distribution on the interval $[a,b]$; and $\mu$ is a constant. Please see the simulation models vignette, vignette("simulation_models", package = "fdaoutlier"), for more details.
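To make the two models concrete, here is a minimal base-R sketch that simulates them directly from the formulas above. It is not the fdaoutlier implementation: the helpers exp_cov() and sim_model6_sketch() are hypothetical, the grid is assumed to be equally spaced on [0, 1], $e_i(t)$ is assumed to be drawn through a Cholesky factor of the covariance matrix, and the contaminated rows are assumed to be a fixed number of randomly sampled indices.

```r
# Minimal sketch of the main and contamination models; NOT fdaoutlier's code.
# exp_cov() and sim_model6_sketch() are hypothetical helper names.

# Covariance gamma(s, t) = alpha * exp(-beta * |t - s|^nu) over all grid pairs
exp_cov <- function(tt, alpha = 1, beta = 1, nu = 1) {
  alpha * exp(-beta * abs(outer(tt, tt, "-"))^nu)
}

sim_model6_sketch <- function(n = 100, p = 50, n_outliers = 10, mu = 4,
                              q = 1.8, kprob = 0.5, a = 0.25, b = 0.75,
                              alpha = 1, beta = 1, nu = 1,
                              r = 0.02, w = 2, z = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  tt <- seq(0, 1, length.out = p)             # assumed equally spaced grid
  L <- chol(exp_cov(tt, alpha, beta, nu))     # upper triangular: t(L) %*% L = Sigma
  # Each row of e is one zero-mean Gaussian process path e_i(t)
  e <- matrix(rnorm(n * p), nrow = n) %*% L
  # Main model: X_i(t) = mu * t + e_i(t)
  x <- matrix(mu * tt, nrow = n, ncol = p, byrow = TRUE) + e
  out <- sort(sample.int(n, n_outliers))      # rows to contaminate
  for (i in out) {
    u <- rbinom(1, size = 1, prob = kprob)    # P(u = 1) = kprob
    v <- runif(1, min = a, max = b)           # v ~ Uniform[a, b]
    # Contamination model: add the shift and the localized bump to the main model
    x[i, ] <- x[i, ] + (-1)^u * q +
      (-1)^(1 - u) * (1 / sqrt(r * pi)) * exp(-z * (tt - v)^w)
  }
  list(data = x, true_outliers = out)
}

sim <- sim_model6_sketch(seed = 1)
dim(sim$data)         # 100 50
```

With w = 2 the term exp(-z(t - v)^2) is a Gaussian-shaped peak of height 1/sqrt(rπ) (about 4 for r = 0.02) centred at v. Because the signs (-1)^u and (-1)^(1-u) are always opposite, each contaminated curve is shifted by ±q and carries a narrow peak or dip in the other direction.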
Usage
simulation_model6(n = 100, p = 50, outlier_rate = 0.1, mu = 4, q = 1.8,
                  kprob = 0.5, a = 0.25, b = 0.75, cov_alpha = 1, cov_beta = 1,
                  cov_nu = 1, pi_coeff = 0.02, exp_pow = 2, exp_coeff = 50,
                  deterministic = TRUE, seed = NULL, plot = F,
                  plot_title = "Simulation Model 6", title_cex = 1.5,
                  show_legend = T, ylabel = "", xlabel = "gridpoints")
Arguments
n: The number of curves to generate. Set to 100 by default.
p: The number of evaluation points of the curves. Curves are usually generated over the interval [0,1]. Set to 50 by default.
outlier_rate: A value in [0,1] indicating the proportion of outliers. A value of 0.06 means that about 6% of the observations will be outliers; whether the count is exact depends on the parameter deterministic. Set to 0.1 by default.
mu: The coefficient μ of the mean function μt in both the main and contamination models. Set to 4 by default.
q: The constant term q in the contamination model. Set to 1.8 by default.
kprob: The probability P(u=1). Set to 0.5 by default.
a, b: Values specifying the interval [a, b] from which v in the contamination model is drawn. Set to 0.25 and 0.75 by default, respectively.
cov_alpha: The coefficient of the exponential in the covariance function, i.e., the α in γ(s,t). Set to 1 by default.
cov_beta: The coefficient of the term inside the exponential of the covariance function, i.e., the β in γ(s,t). Set to 1 by default.
cov_nu: The power to which |t - s| is raised inside the exponential of the covariance function, i.e., the ν in γ(s,t). Set to 1 by default.
pi_coeff: The constant r in the contamination model, i.e., the coefficient of π. Set to 0.02 by default.
exp_pow: The constant w in the contamination model, i.e., the power of (t - v) inside the exponential function of the contamination model. Set to 2 by default.
exp_coeff: The constant z in the contamination model, i.e., the coefficient of the term inside the exponential function of the contamination model. Set to 50 by default.
deterministic: A logical value. If TRUE, the function always returns round(n*outlier_rate) outliers, so the number of outliers is fixed. If FALSE, the number of outliers is determined by n Bernoulli trials with probability outlier_rate, so the number of outliers returned is random. Set to TRUE by default.
seed: A seed to set for reproducibility. NULL by default in which case a seed is not set.
plot: A logical value indicating whether to plot the data. Set to FALSE by default.
plot_title: Title of the plot if plot = TRUE. Set to "Simulation Model 6" by default.
title_cex: Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.
show_legend: A logical value indicating whether to add a legend to the plot if plot = TRUE. Set to TRUE by default.
ylabel: The label of the y-axis if plot = TRUE. Set to "" by default.
xlabel: The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.
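The two counting rules described for deterministic can be sketched in a few lines of base R. This is a hypothetical helper written from the argument description above, not fdaoutlier's internal code; it only returns how many outliers would be generated.

```r
# Hypothetical helper mirroring the documented behaviour of `deterministic`;
# it is an illustration, not the package's own implementation.
n_outliers_sketch <- function(n, outlier_rate, deterministic = TRUE) {
  if (deterministic) {
    round(n * outlier_rate)                          # fixed count every call
  } else {
    sum(rbinom(n, size = 1, prob = outlier_rate))    # n Bernoulli trials
  }
}

n_outliers_sketch(100, 0.1)                          # always 10
set.seed(42)
replicate(5, n_outliers_sketch(100, 0.1, FALSE))     # varies between draws
```

Under the random rule the count follows a Binomial(n, outlier_rate) distribution, so its expected value is still n * outlier_rate even though individual draws differ.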
Returns
A list containing:
- data: a matrix of size n by p containing the simulated data set.
- true_outliers: a vector of integers indicating the row indices of the outliers in the generated data.
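A short usage sketch, assuming the fdaoutlier package is installed, shows how the two returned components fit together. It uses only the arguments and list elements documented on this page; the inlier/outlier split is an illustration, not part of the package.

```r
# Assumes fdaoutlier is installed; uses only documented arguments and outputs.
if (requireNamespace("fdaoutlier", quietly = TRUE)) {
  sim <- fdaoutlier::simulation_model6(n = 100, p = 50, outlier_rate = 0.1,
                                       seed = 50)
  print(dim(sim$data))          # n rows (curves) by p columns (grid points)
  print(sim$true_outliers)      # row indices of the contaminated curves
  idx <- sim$true_outliers
  # Split the rows using the returned indices (guard against zero outliers)
  outliers <- sim$data[idx, , drop = FALSE]
  inliers  <- if (length(idx) > 0) sim$data[-idx, , drop = FALSE] else sim$data
}
```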