makedata() R function from [rsae]

Synthetic Data Generation for the Basic Unit-Level SAE Model

This function generates synthetic data (possibly contaminated by outliers) for the basic unit-level SAE model.


makedata(seed = 1024, intercept = 1, beta = 1, n = 4, g = 20, areaID = NULL,
         ve = 1, ve.contam = 41, ve.epsilon = 0, vu = 1, vu.contam = 41,
         vu.epsilon = 0)

Arguments

seed: [integer] seed value used in set.seed (default seed = 1024).
intercept: [numeric] or [NULL] value of the intercept of the fixed-effects model or NULL for a model without intercept (default: intercept = 1).
beta: [numeric vector] value of the fixed-effect coefficients (without intercept; default: beta = 1). For each given coefficient, a vector of realizations is drawn from the standard normal distribution.
n: [integer] number of units per area in balanced-data setups (default: n = 4).
g: [integer] number of areas (default: g = 20).
areaID: [integer vector] or [NULL]. To generate synthetic unbalanced data, call makedata with a vector whose elements are (integer-valued) area identifiers, one per unit. The number of areas is set equal to the number of unique IDs.
ve: [numeric] nonnegative value of model/residual variance.
ve.contam: [numeric] nonnegative value of model variance of the outlier part in a mixture distribution (Tukey-Huber-type contamination model) $e = (1-h)*N(0, ve) + h*N(0, ve.contam)$.
ve.epsilon: [numeric] value in $[0,1]$ that defines the relative number of outliers (i.e., epsilon or h in the contamination mixture distribution). Typically, it takes values between 0 and 0.5 (but it is not restricted to this interval).
vu: [numeric] value of the (area-level) random-effect variance.
vu.contam: [numeric] nonnegative value of the (area-level) random-effect variance of the outlier part in the contamination mixture distribution.
vu.epsilon: [numeric] value in $[0,1]$ that defines the relative number of outliers in the contamination mixture distribution of the (area-level) random effects.
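
As an illustration of how the arguments combine, the following sketch requests an unbalanced design through areaID and contaminates the model errors. It assumes the rsae package is installed; the object name ids and the chosen area sizes are arbitrary, and n and g are presumably not needed once areaID is supplied.

```r
library(rsae)

# unbalanced design: 5 areas with 2, 3, 4, 5 and 6 units (sizes are arbitrary)
ids <- rep(1:5, times = c(2, 3, 4, 5, 6))

# 10% of the model errors come from the outlier part with variance 41
model <- makedata(areaID = ids, ve = 1, ve.contam = 41, ve.epsilon = 0.1)
model
```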

Details

Let $y[i]$ denote an area-specific $n[i]$-vector of the response variable for the areas $i = 1, ..., g$. Define an $(n[i] \times p)$-matrix $X[i]$ of realizations from the standard normal distribution, $N(0,1)$, and let $\beta$ denote a $p$-vector of regression coefficients. The $y[i]$ are then drawn according to the law $y[i] \sim N(X[i]\beta, v[e] I[i] + v[u] J[i])$, where $v[e]$ and $v[u]$ are the variances of the model error and the area-level random effect, respectively, and $I[i]$ and $J[i]$ denote the $(n[i] \times n[i])$ identity matrix and matrix of ones, respectively.
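
The sampling law can be mimicked by hand through the equivalent random-intercept form $y[i,j] = x[i,j]'\beta + u[i] + e[i,j]$, whose within-area covariance is $v[e] I[i] + v[u] J[i]$. The following base-R sketch illustrates this idea for balanced data; it is not the package's implementation, all variable names are illustrative, and the intercept is omitted for brevity.

```r
# Sketch only: draw balanced unit-level data by hand
# (names are illustrative, not taken from the rsae source code).
set.seed(1024)
g <- 20; n <- 4          # number of areas and units per area
beta <- 1                # single fixed-effect coefficient (p = 1)
ve <- 1; vu <- 1         # model-error and random-effect variances

area <- rep(seq_len(g), each = n)                 # area identifier per unit
X <- matrix(rnorm(g * n * length(beta)),
            ncol = length(beta))                  # N(0, 1) design matrix
u <- rnorm(g, mean = 0, sd = sqrt(vu))            # one random effect per area
e <- rnorm(g * n, mean = 0, sd = sqrt(ve))        # one model error per unit

# y = X beta + u + e implies Var(y[i]) = ve * I + vu * J within each area
y <- drop(X %*% beta) + u[area] + e

# implied covariance matrix of a single area
V <- ve * diag(n) + vu * matrix(1, n, n)
```

Units in the same area share the random effect u[i], which is what produces the off-diagonal entries $v[u]$ in V.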

In addition, we allow the distributions of the model/residual error and the area-level random effect to be contaminated (cf. Stahel and Welsh, 1997). Specifically, the laws of the model error $e[i,j]$ and the random effect $u[i]$ are replaced by the Tukey-Huber contamination mixture:

$e[i,j] \sim (1 - \epsilon[ve]) N(0, v[e]) + \epsilon[ve] N(0, v[e,\epsilon])$
$u[i] \sim (1 - \epsilon[vu]) N(0, v[u]) + \epsilon[vu] N(0, v[u,\epsilon])$

where $\epsilon[ve]$ and $\epsilon[vu]$ (arguments ve.epsilon and vu.epsilon) regulate the degree of contamination, and $v[e,\epsilon]$ and $v[u,\epsilon]$ (arguments ve.contam and vu.contam) define the variances of the contamination parts of the mixture distributions.
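
To make the mixture concrete, here is a minimal base-R sketch that draws from a two-component normal mixture of this form. The helper rmix() is hypothetical and not part of rsae; whether the package flags outliers at random, as done here, or fixes their number is not stated in this documentation.

```r
# Sketch only: draw from (1 - eps) * N(0, v) + eps * N(0, v_contam).
# rmix() is a hypothetical helper, not a function of the rsae package.
rmix <- function(m, v, v_contam, eps) {
  stopifnot(eps >= 0, eps <= 1, v >= 0, v_contam >= 0)
  is_outlier <- rbinom(m, size = 1, prob = eps) == 1   # mixture indicator
  sd_i <- sqrt(ifelse(is_outlier, v_contam, v))        # per-draw std. deviation
  rnorm(m, mean = 0, sd = sd_i)
}

set.seed(1024)
e <- rmix(80, v = 1, v_contam = 41, eps = 0.1)  # contaminated model errors
u <- rmix(20, v = 1, v_contam = 41, eps = 0)    # uncontaminated random effects
```

With these settings the model-error mixture has variance (1 - 0.1) * 1 + 0.1 * 41 = 5, five times the variance of the uncontaminated part.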

Four different contamination setups are possible:

no contamination (i.e., ve.epsilon = vu.epsilon = 0),
contaminated model error (i.e., ve.epsilon != 0 and vu.epsilon = 0),
contaminated random effect (i.e., ve.epsilon = 0 and vu.epsilon != 0),
both are contaminated (i.e., ve.epsilon != 0 and vu.epsilon != 0).
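
The four setups map directly onto the ve.epsilon and vu.epsilon arguments. The calls below are a sketch that assumes rsae is installed; the contamination level 0.1 and the object names are arbitrary choices, and all other arguments keep their defaults.

```r
library(rsae)

clean  <- makedata(ve.epsilon = 0,   vu.epsilon = 0)    # no contamination
e_only <- makedata(ve.epsilon = 0.1, vu.epsilon = 0)    # contaminated model error
u_only <- makedata(ve.epsilon = 0,   vu.epsilon = 0.1)  # contaminated random effect
both   <- makedata(ve.epsilon = 0.1, vu.epsilon = 0.1)  # both contaminated
```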

Returns

An instance of the class saemodel.

References

Schoch, T. (2012). Robust Unit-Level Small Area Estimation: A Fast Algorithm for Large Datasets. Austrian Journal of Statistics 41, 243-265. https://doi.org/10.17713/ajs.v41i4.1548

Stahel, W. A. and A. Welsh (1997). Approaches to robust estimation in the simplest variance components model. Journal of Statistical Planning and Inference 57, 295-319. https://doi.org/10.1016/S0378-3758(96)00050-X

Examples


# generate a model with synthetic data
model <- makedata()
model

# summary of the model
summary(model)

rsae package

Maintainer: Tobias Schoch
License: GPL-3
Last published: 2024-02-06
