Compare survey statistics calculated separately from different sets of replicate weights
Compare survey statistics calculated separately from different sets of replicate weights
A modified version of the svyby() function from the survey package. Whereas svyby() calculates statistics separately for each subset formed by a specified grouping variable, svyby_repwts() calculates statistics separately for each replicate design, in addition to any additional user-specified grouping variables.
rep_designs: The replicate-weights survey designs to be compared. Supplied either as:
A named list of replicate-weights survey design objects, for example list('nr' = nr_adjusted_design, 'ue' = ue_adjusted_design).
A 'stacked' replicate-weights survey design object created by stack_replicate_designs().
The designs must all have the same number of columns of replicate weights, of the same type (bootstrap, JKn, etc.)
formula: A formula specifying the variables to pass to FUN
by: A formula specifying factors that define subsets
FUN: A function taking a formula and survey design object as its first two arguments. Usually a function from the survey package, such as svytotal or svymean.
...: Other arguments to FUN
deff: A value of TRUE or FALSE, indicating whether design effects should be estimated if possible.
keep.var: A value of TRUE or FALSE. If FUN returns a svystat object, indicates whether to extract standard errors from it.
keep.names: Define row names based on the subsets
verbose: If TRUE, print a label for each subset as it is processed.
vartype: Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance
drop.empty.groups: If FALSE, report NA for empty groups, if TRUE drop them from the output
return.replicates: If TRUE, return all the replicates as the "replicates" attribute of the result. This can be useful if you want to produce custom summaries of the estimates from each replicate.
na.rm.by: If true, omit groups defined by NA values of the by variables
na.rm.all: If true, check for groups with no non-missing observations for variables defined by formula and treat these groups as empty
multicore: Use multicore package to distribute subsets over multiple processors?
Returns
An object of class "svyby": a data frame showing the grouping factors and results of FUN for each combination of the grouping factors. The first grouping factor always consists of indicators for which replicate design was used for an estimate.
Examples
## Not run:suppressPackageStartupMessages(library(survey))data(api)dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)dclus1$variables$response_status <- sample(x = c("Respondent","Nonrespondent","Ineligible","Unknown eligibility"), size = nrow(dclus1), replace =TRUE)orig_rep_design <- as.svrepdesign(dclus1)# Adjust weights for cases with unknown eligibilityue_adjusted_design <- redistribute_weights( design = orig_rep_design, reduce_if = response_status %in% c("Unknown eligibility"), increase_if =!response_status %in% c("Unknown eligibility"), by = c("stype"))# Adjust weights for nonresponsenr_adjusted_design <- redistribute_weights( design = ue_adjusted_design, reduce_if = response_status %in% c("Nonrespondent"), increase_if = response_status =="Respondent", by = c("stype"))# Compare estimates from the three sets of replicate weights list_of_designs <- list('original'= orig_rep_design,'unknown eligibility adjusted'= ue_adjusted_design,'nonresponse adjusted'= nr_adjusted_design)##_ First compare overall means for two variables means_by_design <- svyby_repwts(formula =~ api00 + api99, FUN = svymean, rep_design = list_of_designs) print(means_by_design)##_ Next compare domain means for two variables domain_means_by_design <- svyby_repwts(formula =~ api00 + api99, by =~ stype, FUN = svymean, rep_design = list_of_designs) print(domain_means_by_design)# Calculate confidence interval for difference between estimatesests_by_design <- svyby_repwts(rep_designs = list('NR-adjusted'= nr_adjusted_design,'Original'= orig_rep_design), FUN = svymean, formula =~ api00 + api99)differences_in_estimates <- svycontrast(stat = ests_by_design, contrasts = list('Mean of api00: NR-adjusted vs. Original'= c(1,-1,0,0),'Mean of api99: NR-adjusted vs. Original'= c(0,0,1,-1)))print(differences_in_estimates)confint(differences_in_estimates, level =0.95)## End(Not run)