Creates replicate factors for the generalized survey bootstrap
Creates replicate factors for the generalized survey bootstrap method. The generalized survey bootstrap is a method for forming bootstrap replicate weights from a textbook variance estimator, provided that the variance estimator can be represented as a quadratic form whose matrix is positive semidefinite (this covers a large class of variance estimators).
make_gen_boot_factors(Sigma, num_replicates, tau = 1, exact_vcov = FALSE)
Arguments
Sigma: The matrix of the quadratic form used to represent the variance estimator. Must be positive semidefinite.
num_replicates: The number of bootstrap replicates to create.
tau: Either "auto", or a single number; the default value is 1. This is the rescaling constant used to avoid negative weights through the transformation $(w + \tau - 1)/\tau$, where $w$ is the original weight and $\tau$ is the rescaling constant tau.
If tau="auto", the rescaling factor is determined automatically as follows: if all of the adjustment factors are nonnegative, then tau is set equal to 1; otherwise, tau is set to the smallest value needed to rescale the adjustment factors such that they are all at least 0.01. As an alternative to tau="auto", the user can call the function rescale_reps() to rescale the replicates later.
exact_vcov: If exact_vcov=TRUE, the replicate factors will be generated such that their variance-covariance matrix exactly matches the target variance estimator's quadratic form (within numeric precision). This is desirable as it causes variance estimates for totals to closely match the values from the target variance estimator. This requires that num_replicates exceeds the rank of Sigma. The replicate factors are generated by applying PCA-whitening to a collection of draws from a multivariate Normal distribution, then applying a coloring transformation to the whitened collection of draws.
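The whitening-then-coloring idea described for exact_vcov can be illustrated with a minimal base R sketch. This is not the package's implementation: the matrix Sigma, the sizes n and B, and every variable name are assumptions made up for illustration, and the sketch whitens the uncentered second-moment matrix of the draws so that the cross-product of the centered factors, divided by B, reproduces Sigma.

```r
set.seed(42)

n <- 5
B <- 10  # this sketch needs B to be at least the rank of Sigma

# Illustrative positive semidefinite matrix (rank n - 1); an assumption
Sigma <- (n / (n - 1)) * (diag(n) - matrix(1 / n, n, n))

# Keep only the eigenvectors with non-negligible eigenvalues
eig <- eigen(Sigma, symmetric = TRUE)
keep <- eig$values > 1e-8 * max(eig$values)
U <- eig$vectors[, keep, drop = FALSE]
lambda <- eig$values[keep]
r <- length(lambda)

# Step 1: B draws from an r-dimensional standard Normal distribution
Z <- matrix(rnorm(B * r), nrow = B, ncol = r)

# Step 2: PCA-whitening, so that crossprod(W) / B is the identity matrix
S <- crossprod(Z) / B
eig_S <- eigen(S, symmetric = TRUE)
W <- Z %*% eig_S$vectors %*% diag(1 / sqrt(eig_S$values), nrow = r)

# Step 3: coloring transformation, then shift so the factors are centered at 1
A <- 1 + U %*% diag(sqrt(lambda), nrow = r) %*% t(W)

# The implied quadratic form matrix matches Sigma up to rounding error
Sigma_B <- tcrossprod(A - 1) / B
max(abs(Sigma_B - Sigma))
```

Whitening removes the sampling noise in the draws' second moments, so the match holds for any B that is large enough for S to be invertible, not only as B grows.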
Returns
A matrix with the same number of rows as Sigma, and the number of columns equal to num_replicates. The object has an attribute named tau which can be retrieved by calling attr(which = 'tau') on the object. The value tau is a rescaling factor which was used to avoid negative weights.
In addition, the object has attributes named scale and rscales which can be passed directly to svrepdesign(). Note that the value of scale is $\tau^2/B$, while the value of rscales is a vector of length $B$, with every entry equal to 1.
Statistical Details
Let $v(\hat{T}_y)$ be the textbook variance estimator for an estimated population total $\hat{T}_y$ of some variable $y$. The base weight for case $i$ in our sample is $w_i$, and we let $\breve{y}_i$ denote the weighted value $w_i y_i$. Suppose we can represent our textbook variance estimator as a quadratic form: $v(\hat{T}_y) = \breve{y}' \Sigma \breve{y}$, for some $n \times n$ matrix $\Sigma$. The only constraint on $\Sigma$ is that, for our sample, it must be symmetric and positive semidefinite.
The bootstrapping process creates $B$ sets of replicate weights, where the $b$-th set of replicate weights is a vector of length $n$ denoted $\mathbf{a}^{(b)}$, whose $k$-th value is denoted $a_k^{(b)}$. This yields $B$ replicate estimates of the population total, $\hat{T}_y^{*(b)} = \sum_{k \in s} a_k^{(b)} \breve{y}_k$, for $b = 1, \ldots, B$, which can be used to estimate sampling variance:
$$v_B(\hat{T}_y) = \frac{1}{B} \sum_{b=1}^{B} \left(\hat{T}_y^{*(b)} - \hat{T}_y\right)^2$$
This bootstrap variance estimator can be written as a quadratic form:
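$$v_B(\hat{T}_y) = \breve{y}' \Sigma_B \breve{y}$$
where
$$\Sigma_B = \frac{1}{B} \sum_{b=1}^{B} \left(\mathbf{a}^{(b)} - \mathbf{1}_n\right)\left(\mathbf{a}^{(b)} - \mathbf{1}_n\right)'$$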
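The equivalence between the replicate-based formula and the quadratic form can be checked numerically. The following is a minimal base R sketch using a made-up sample of size 4 and arbitrary adjustment factors; the names y_breve, A, and Sigma_B are illustrative assumptions and not part of the package.

```r
set.seed(2024)

n <- 4   # sample size (illustrative)
B <- 5   # number of replicates (illustrative)

# Made-up weighted values w_k * y_k
y_breve <- c(10, 25, 5, 40)

# Arbitrary n x B matrix of adjustment factors; column b holds the b-th set
A <- matrix(rnorm(n * B, mean = 1, sd = 0.3), nrow = n, ncol = B)

# Full-sample estimate of the total and the B replicate estimates
T_hat <- sum(y_breve)
T_hat_reps <- as.vector(crossprod(A, y_breve))

# Bootstrap variance estimate computed from the replicate estimates
v_B_direct <- mean((T_hat_reps - T_hat)^2)

# The same quantity written as a quadratic form in y_breve
Sigma_B <- tcrossprod(A - 1) / B
v_B_quadratic <- as.numeric(t(y_breve) %*% Sigma_B %*% y_breve)

all.equal(v_B_direct, v_B_quadratic)
```

The identity holds for any matrix of adjustment factors, because each replicate deviation is the inner product of the weighted values with one centered column of A.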
Note that if the vector of adjustment factors $\mathbf{a}^{(b)}$ has expectation $\mathbf{1}_n$ and variance-covariance matrix $\Sigma$, then we have the bootstrap expectation $E_*(\Sigma_B) = \Sigma$. Since the bootstrap process takes the sample values $\breve{y}$ as fixed, the bootstrap expectation of the variance estimator is $E_*(\breve{y}' \Sigma_B \breve{y}) = \breve{y}' \Sigma \breve{y}$. Thus, we can produce a bootstrap variance estimator with the same expectation as the textbook variance estimator simply by randomly generating $\mathbf{a}^{(b)}$ from a distribution satisfying the following two conditions:
Condition 1: $E_*(\mathbf{a}) = \mathbf{1}_n$
Condition 2: $E_*\left[(\mathbf{a} - \mathbf{1}_n)(\mathbf{a} - \mathbf{1}_n)'\right] = \Sigma$
While there are multiple ways to generate adjustment factors satisfying these conditions, the simplest general method is to simulate from a multivariate normal distribution: $\mathbf{a} \sim MVN(\mathbf{1}_n, \Sigma)$. This is the method used by this function.
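A minimal base R sketch of simulating adjustment factors from a multivariate normal distribution with mean 1 and a given covariance matrix is shown below. It is not the package's code: the matrix Sigma is an illustrative assumption, all variable names are hypothetical, and an eigendecomposition is used for the matrix square root because it still works when Sigma is singular.

```r
set.seed(123)

n <- 5
B <- 20000  # deliberately large so the Monte Carlo check is tight

# Illustrative positive semidefinite matrix (rank n - 1); an assumption
Sigma <- (n / (n - 1)) * (diag(n) - matrix(1 / n, n, n))

# Symmetric square root of Sigma via its eigendecomposition
eig <- eigen(Sigma, symmetric = TRUE)
eigenvalues <- pmax(eig$values, 0)  # clip tiny negative values from rounding
Sigma_sqrt <- eig$vectors %*% diag(sqrt(eigenvalues)) %*% t(eig$vectors)

# Each column is one draw: a vector of ones plus Sigma_sqrt times N(0, I) noise
Z <- matrix(rnorm(n * B), nrow = n, ncol = B)
A <- 1 + Sigma_sqrt %*% Z

# Condition 1: the average adjustment factor for each case is close to 1
round(rowMeans(A), 2)

# Condition 2: the implied quadratic form matrix is close to Sigma
Sigma_B <- tcrossprod(A - 1) / B
max(abs(Sigma_B - Sigma))
```

With a realistic number of replicates, the match to Sigma is only approximate, which is the situation the exact_vcov argument is meant to address.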
Details on Rescaling to Avoid Negative Adjustment Factors
Let $A = \left[\mathbf{a}^{(1)} \cdots \mathbf{a}^{(b)} \cdots \mathbf{a}^{(B)}\right]$ denote the $(n \times B)$ matrix of bootstrap adjustment factors. To eliminate negative adjustment factors, Beaumont and Patak (2012) propose forming a rescaled matrix of nonnegative replicate factors $A^S$ by rescaling each adjustment factor $a_k^{(b)}$ as follows:
$$a_k^{S,(b)} = \frac{a_k^{(b)} + \tau - 1}{\tau}$$
where $\tau \geq 1$ and $\tau \geq 1 - a_k^{(b)}$ for all $k$ in $\{1, \ldots, n\}$ and all $b$ in $\{1, \ldots, B\}$.
The value of $\tau$ can be set based on the realized adjustment factor matrix $A$, or it can be chosen before generating $A$ so that $\tau$ is likely to be large enough to prevent negative bootstrap weights.
If the adjustment factors are rescaled in this manner, it is important to adjust the scale factor used in estimating the variance with the bootstrap replicates, which becomes $\tau^2/B$ instead of $1/B$.
Prior to rescaling:
$$v_B(\hat{T}_y) = \frac{1}{B} \sum_{b=1}^{B} \left(\hat{T}_y^{*(b)} - \hat{T}_y\right)^2$$
After rescaling:
$$v_B(\hat{T}_y) = \frac{\tau^2}{B} \sum_{b=1}^{B} \left(\hat{T}_y^{S*(b)} - \hat{T}_y\right)^2$$
When sharing a dataset that uses rescaled weights from a generalized survey bootstrap, the documentation for the dataset should instruct the user to use replication scale factor $\tau^2/B$ rather than $1/B$ when estimating sampling variances.
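The effect of rescaling on the variance estimate can be verified with a small base R sketch. The data, the adjustment factors, and all variable names below are made up for illustration, and the choice of tau follows the rule described for tau="auto" rather than reproducing the package's code.

```r
set.seed(7)

n <- 4
B <- 50
y_breve <- c(10, 25, 5, 40)   # made-up weighted values w_k * y_k
T_hat <- sum(y_breve)

# Made-up adjustment factors centered at 1, with one forced negative value
A <- matrix(rnorm(n * B, mean = 1, sd = 0.8), nrow = n, ncol = B)
A[1, 1] <- -0.5

# Rule described for tau = "auto": 1 if nothing is negative, otherwise the
# smallest tau for which every rescaled factor is at least 0.01
min_factor <- 0.01
tau <- if (min(A) >= 0) 1 else (1 - min(A)) / (1 - min_factor)

# Rescale every adjustment factor: (a + tau - 1) / tau
A_S <- (A + tau - 1) / tau

# Variance estimates before rescaling (scale 1/B) and after (scale tau^2/B)
v_before <- sum((as.vector(crossprod(A, y_breve)) - T_hat)^2) / B
v_after <- (tau^2 / B) * sum((as.vector(crossprod(A_S, y_breve)) - T_hat)^2)

min(A_S)                      # smallest rescaled factor
all.equal(v_before, v_after)  # the two variance estimates agree
```

Rescaling shrinks each factor's deviation from 1 by a factor of tau, so every replicate deviation shrinks by tau as well, and multiplying the sum of squares by tau squared restores the original estimate.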
Examples
## Not run:
library(survey)
library(svrep) # provides make_quad_form_matrix() and make_gen_boot_factors()

# Load an example dataset that uses unequal probability sampling ----
data('election', package = 'survey')

# Create matrix to represent the Horvitz-Thompson estimator as a quadratic form ----
n <- nrow(election_pps)
pi <- election_jointprob
horvitz_thompson_matrix <- matrix(nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    horvitz_thompson_matrix[i,j] <- 1 - (pi[i,i] * pi[j,j]) / pi[i,j]
  }
}

## Equivalently:
horvitz_thompson_matrix <- make_quad_form_matrix(
  variance_estimator = "Horvitz-Thompson",
  joint_probs = election_jointprob
)

# Make generalized bootstrap adjustment factors ----
bootstrap_adjustment_factors <- make_gen_boot_factors(
  Sigma = horvitz_thompson_matrix,
  num_replicates = 80,
  tau = 'auto'
)

# Determine replication scale factor for variance estimation ----
tau <- attr(bootstrap_adjustment_factors, 'tau')
B <- ncol(bootstrap_adjustment_factors)
replication_scaling_constant <- tau^2 / B

# Create a replicate design object ----
election_pps_bootstrap_design <- svrepdesign(
  data = election_pps,
  weights = 1 / diag(election_jointprob),
  repweights = bootstrap_adjustment_factors,
  combined.weights = FALSE,
  type = "other",
  scale = attr(bootstrap_adjustment_factors, 'scale'),
  rscales = attr(bootstrap_adjustment_factors, 'rscales')
)

# Compare estimates to Horvitz-Thompson estimator ----
election_pps_ht_design <- svydesign(
  id = ~1,
  fpc = ~p,
  data = election_pps,
  pps = ppsmat(election_jointprob),
  variance = "HT"
)

svytotal(x = ~ Bush + Kerry, design = election_pps_bootstrap_design)
svytotal(x = ~ Bush + Kerry, design = election_pps_ht_design)
## End(Not run)
References
The generalized survey bootstrap was first proposed by Bertail and Combris (1997). See Beaumont and Patak (2012) for a clear overview of the generalized survey bootstrap. The generalized survey bootstrap represents one strategy for forming replication variance estimators in the general framework proposed by Fay (1984) and Dippo, Fay, and Morganstein (1984).
Beaumont, Jean-François, and Zdenek Patak. 2012. “On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling: Generalized Bootstrap for Sample Surveys.” International Statistical Review 80 (1): 127–48. https://doi.org/10.1111/j.1751-5823.2011.00166.x.
Bertail and Combris. 1997. “Bootstrap Généralisé d’un Sondage.” Annales d’Économie Et de Statistique, no. 46: 49. https://doi.org/10.2307/20076068.
Dippo, Cathryn, Robert Fay, and David Morganstein. 1984. “Computing Variances from Complex Samples with Replicate Weights.” In, 489–94. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/papers/1984_094.pdf.
The function make_quad_form_matrix() can be used to represent several common variance estimators as the matrix of a quadratic form, which can then be passed to make_gen_boot_factors() as the Sigma argument.