t-test of differences in means/percentages relative to external estimates
Compare estimated means/percentages from the present survey to external estimates from a benchmark source. A t-test is used to evaluate whether the survey's estimates differ from the external estimates.
survey_design: A survey design object created with the survey package.
y_var: Name of dependent variable. For categorical variables, percentages of each category are tested.
ext_ests: A numeric vector containing the external estimate of the mean for the dependent variable. If the variable is categorical, a named vector must be provided, with one external proportion per category (for example, c('Has Email' = 0.85, 'No Email' = 0.15)).
ext_std_errors: (Optional) The standard errors of the external estimates. This is useful if the external data are estimated with an appreciable level of uncertainty, for instance if the external data come from a survey with a small-to-moderate sample size. If supplied, the variance of the difference between the survey and external estimates is estimated by adding the variance of the external estimates to the estimated variance of the survey's estimates.
na.rm: Whether to drop cases with missing values for y_var.
null_difference: The hypothesized difference between the estimate and the external mean. Default is 0.
alternative: Can be one of the following:
'unequal': two-sided test of whether difference in means is equal to null_difference
'less': one-sided test of whether difference is less than null_difference
'greater': one-sided test of whether difference is greater than null_difference
degrees_of_freedom: The degrees of freedom to use for the test's reference distribution. Unless specified otherwise, the default is the design degrees of freedom minus one, where the design degrees of freedom are estimated using the survey package's degf method.
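The arithmetic implied by the arguments above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: sketch_t_test and its argument names estimate and std_error are hypothetical, and it assumes the survey estimate and the external estimate are independent, so that their variances add.

```r
# Hypothetical helper for illustration; not part of the nrba package.
sketch_t_test <- function(estimate, std_error, ext_est, ext_std_error = 0,
                          null_difference = 0, alternative = "unequal",
                          degrees_of_freedom) {
  # Difference between the survey estimate and the external estimate
  difference <- estimate - ext_est

  # Assumption: the two estimates are independent, so variances add.
  # With ext_std_error = 0 this reduces to the survey standard error.
  se_difference <- sqrt(std_error^2 + ext_std_error^2)

  t_statistic <- (difference - null_difference) / se_difference

  # p-value from a t reference distribution with the supplied df
  p_value <- switch(
    alternative,
    unequal = 2 * pt(abs(t_statistic), df = degrees_of_freedom,
                     lower.tail = FALSE),
    less    = pt(t_statistic, df = degrees_of_freedom, lower.tail = TRUE),
    greater = pt(t_statistic, df = degrees_of_freedom, lower.tail = FALSE),
    stop("alternative must be 'unequal', 'less', or 'greater'")
  )

  data.frame(
    difference = difference,
    se_difference = se_difference,
    t_statistic = t_statistic,
    p_value = p_value,
    degrees_of_freedom = degrees_of_freedom
  )
}

# Made-up numbers, purely to show the call
sketch_t_test(
  estimate = 0.80, std_error = 0.03,
  ext_est = 0.85, ext_std_error = 0.04,
  degrees_of_freedom = 20
)
```

For a real design object, the default degrees of freedom described above would correspond to survey::degf(survey_design) - 1.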
Returns
A data frame describing the results of the t-tests, one row per mean being compared.
Examples
library(nrba)
library(survey)

# Create a survey design ----
data("involvement_survey_str2s", package = 'nrba')

involvement_survey_sample <- svydesign(
  data = involvement_survey_str2s,
  weights = ~ BASE_WEIGHT,
  strata = ~ SCHOOL_DISTRICT,
  ids = ~ SCHOOL_ID + UNIQUE_ID,
  fpc = ~ N_SCHOOLS_IN_DISTRICT + N_STUDENTS_IN_SCHOOL
)

# Subset to only include survey respondents ----
involvement_survey_respondents <- subset(
  involvement_survey_sample,
  RESPONSE_STATUS == "Respondent"
)

# Test whether percentages of categorical variable differ from benchmark ----
parent_email_benchmark <- c('Has Email' = 0.85, 'No Email' = 0.15)

t_test_vs_external_estimate(
  survey_design = involvement_survey_respondents,
  y_var = "PARENT_HAS_EMAIL",
  ext_ests = parent_email_benchmark
)

# Test whether the sample mean differs from the population benchmark ----
average_age_benchmark <- 11

t_test_vs_external_estimate(
  survey_design = involvement_survey_respondents,
  y_var = "STUDENT_AGE",
  ext_ests = average_age_benchmark,
  null_difference = 0
)
References
See Brick and Bose (2001) for an example of this analysis method and a discussion of its limitations.
Brick, M., and Bose, J. (2001). Analysis of Potential Nonresponse Bias. In Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association. http://www.asasrms.org/Proceedings/y2001/Proceed/00021.pdf