rake_to_benchmarks function

Re-weight data to match population benchmarks, using raking or post-stratification

Adjusts weights in the data to ensure that estimated population totals for grouping variables match known population benchmarks. If there is only one grouping variable, simple post-stratification is used. If there are multiple grouping variables, raking (also known as iterative post-stratification) is used.

rake_to_benchmarks( survey_design, group_vars, group_benchmark_vars, max_iterations = 100, epsilon = 5e-06 )

Arguments

  • survey_design: A survey design object created with the survey package.
  • group_vars: Names of grouping variables in the data that divide the sample into groups for which benchmark data are available. These variables cannot have any missing values.
  • group_benchmark_vars: Names of group benchmark variables in the data, corresponding to group_vars. For each category of a grouping variable, the group benchmark variable gives the population benchmark (i.e., the population size) for that category.
  • max_iterations: If there are multiple grouping variables, then raking is used rather than post-stratification. The parameter max_iterations controls the maximum number of iterations to use in raking.
  • epsilon: If raking is used, convergence for a given margin is declared if the maximum change in a re-weighted total is less than epsilon times the total sum of the original weights in the design.
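Written out, the `epsilon` rule above amounts to the following check for a single margin with categories $c$. Here $\hat{T}^{(k)}_c$ denotes the re-weighted total for category $c$ after iteration $k$, and $w^{(0)}_i$ are the original design weights of the sample $s$. This notation is introduced only for illustration and paraphrases the argument description rather than the package source. For instance, if the original weights summed to a hypothetical 20,000, the default `epsilon = 5e-06` would give a tolerance of 5e-06 × 20,000 = 0.1.

```latex
\max_{c} \left| \hat{T}^{(k)}_{c} - \hat{T}^{(k-1)}_{c} \right| < \epsilon \sum_{i \in s} w^{(0)}_{i}
```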

Returns

A survey design object with raked or post-stratified weights.

Details

Raking adjusts the weight assigned to each sample member so that, after reweighting, the weighted sample percentages for population subgroups match their known population percentages. In a sense, raking causes the sample to more closely resemble the population in terms of variables for which population sizes are known.
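To make the weight adjustment concrete, the base-R sketch below post-stratifies on a single grouping variable. Each weight is multiplied by its category's population size divided by that category's current weighted total, so the weighted totals, and therefore the weighted percentages, match the benchmarks exactly. The helper `post_stratify_weights()` and its arguments are hypothetical and not part of nrba; the sketch only adjusts weights and ignores variance estimation.

```r
# Hypothetical helper (not part of nrba): post-stratify weights on one
# grouping variable so that weighted totals equal known population sizes.
post_stratify_weights <- function(weights, group, benchmark) {
  group <- as.character(group)
  # Current weighted total for each category present in the sample
  current <- vapply(split(weights, group), sum, numeric(1))
  # Adjustment factor per category: population size / weighted total
  factors <- benchmark[names(current)] / current
  if (anyNA(factors)) stop("every sampled category needs a benchmark")
  # Multiply each unit's weight by its category's factor
  weights * unname(factors[group])
}

# Toy data: categories A and B with known population sizes 50 and 30
w <- post_stratify_weights(
  weights = c(1, 1, 1, 2),
  group = c("A", "A", "B", "B"),
  benchmark = c(A = 50, B = 30)
)
w                                              # 25 25 10 20
tapply(w, c("A", "A", "B", "B"), sum) / sum(w) # A = 0.625, B = 0.375
```

The weighted shares 0.625 and 0.375 equal the population shares 50/80 and 30/80, even though the unadjusted sample gave category B a larger share.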

Raking can help reduce nonresponse bias caused by groups that are over- or underrepresented in the responding sample relative to their population size. If the population subgroups systematically differ in terms of outcome variables of interest, raking can also help reduce sampling variances. However, when population subgroups do not differ in terms of outcome variables of interest, raking may increase sampling variances.
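One common way to gauge the variance cost of weighting adjustments, which is not described in this documentation and is swapped in here only for illustration, is Kish's approximate design effect due to unequal weights: n Σw² / (Σw)², which equals 1 + CV², where CV² is the squared coefficient of variation of the weights (variance computed with divisor n). It roughly approximates the variance inflation of a weighted mean when the outcome is unrelated to the weights, which is the situation described above. The function name `kish_deff()` is hypothetical.

```r
# Hypothetical helper: Kish's approximate design effect from unequal weighting,
# n * sum(w^2) / sum(w)^2, which equals 1 + CV^2 (variance with divisor n).
kish_deff <- function(w) {
  length(w) * sum(w^2) / sum(w)^2
}

# Equal weights carry no penalty
kish_deff(rep(1, 4))            # 1
# Four equal weights post-stratified so two categories total 50 and 30
kish_deff(c(25, 25, 15, 15))    # 1.0625
```

A value above 1 suggests that, absent any gain from the outcome differing across groups, the adjustment would inflate variance by roughly that factor.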

There are two basic requirements for raking.

  • Basic Requirement 1 - Values of the grouping variable(s) must be known for all respondents.
  • Basic Requirement 2 - The population size of each group must be known (or precisely estimated).
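Both requirements can be checked on the data before raking, at least as far as the data allow (whether a benchmark is accurate cannot be verified this way). The sketch below is a hypothetical helper, `check_raking_inputs()`, that is not part of nrba. It assumes the layout described under Arguments, where each row carries its category's population size in the matching benchmark variable, so it also checks that the benchmark is constant within each category. The variable names in the usage lines are made up.

```r
# Hypothetical helper (not part of nrba): check the two basic requirements
# for raking on a data frame laid out as described under Arguments.
check_raking_inputs <- function(data, group_vars, group_benchmark_vars) {
  stopifnot(length(group_vars) == length(group_benchmark_vars))
  for (i in seq_along(group_vars)) {
    g <- data[[group_vars[i]]]
    b <- data[[group_benchmark_vars[i]]]
    # Requirement 1: the grouping variable is known for every respondent
    if (anyNA(g)) {
      stop("Missing values in grouping variable ", group_vars[i])
    }
    # Requirement 2: every category has a known, positive population size
    if (anyNA(b) || any(b <= 0)) {
      stop("Missing or non-positive benchmark in ", group_benchmark_vars[i])
    }
    # Each category should map to exactly one benchmark value
    n_values <- tapply(b, g, function(x) length(unique(x)))
    if (any(n_values > 1)) {
      stop("Benchmark varies within a category of ", group_vars[i])
    }
  }
  invisible(TRUE)
}

respondents <- data.frame(
  SEX = c("F", "M", "F"),
  SEX_POP_BENCHMARK = c(60, 40, 60)
)
check_raking_inputs(respondents, "SEX", "SEX_POP_BENCHMARK")  # passes silently
```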

When there is effectively only one grouping variable (though this variable can be defined as a combination of other variables), raking amounts to simple post-stratification. For example, simple post-stratification would be used if the grouping variable is "Age x Sex x Race" and the population size of each combination of age, sex, and race is known. The method of "iterative post-stratification" (also known as "iterative proportional fitting") is used when there are multiple grouping variables and population sizes are known for each grouping variable, but not for combinations of grouping variables. For example, iterative proportional fitting would be necessary if population sizes are known for age groups and for gender categories, but not for combinations of age groups and gender categories.
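Iterative proportional fitting can be sketched in base R as repeated post-stratification: each sweep adjusts the weights to each grouping variable's benchmarks in turn, and the loop stops once the margin totals change by less than `epsilon` times the sum of the original weights, or after `max_iterations` sweeps. This is a minimal illustration under those assumptions, not the nrba implementation (which operates on survey design objects). The helper `rake_weights()` and its arguments are hypothetical, and its convergence check is one reasonable reading of the `epsilon` description under Arguments.

```r
# Hypothetical helper (not part of nrba): rake weights to the margins of
# several grouping variables by iterative proportional fitting.
rake_weights <- function(weights, groups, benchmarks,
                         max_iterations = 100, epsilon = 5e-06) {
  # groups:     list of vectors, one per grouping variable
  # benchmarks: list of named numeric vectors (population size per category),
  #             in the same order as `groups`
  groups <- lapply(groups, as.character)
  tol <- epsilon * sum(weights)  # tolerance relative to the original weights
  margin_totals <- function(w) {
    unlist(lapply(groups, function(g) vapply(split(w, g), sum, numeric(1))))
  }
  old_totals <- margin_totals(weights)
  for (iter in seq_len(max_iterations)) {
    # One sweep: post-stratify to each grouping variable in turn
    for (j in seq_along(groups)) {
      g <- groups[[j]]
      current <- vapply(split(weights, g), sum, numeric(1))
      factors <- benchmarks[[j]][names(current)] / current
      weights <- weights * unname(factors[g])
    }
    new_totals <- margin_totals(weights)
    # Stop once no margin total moved by more than the tolerance
    if (max(abs(new_totals - old_totals)) < tol) {
      return(list(weights = weights, iterations = iter, converged = TRUE))
    }
    old_totals <- new_totals
  }
  list(weights = weights, iterations = max_iterations, converged = FALSE)
}

# Toy data: sizes known for sex and for age, but not for sex x age cells
sex <- c("F", "F", "M", "M")
age <- c("Young", "Old", "Young", "Old")
res <- rake_weights(
  weights = rep(1, 4),
  groups = list(sex, age),
  benchmarks = list(c(F = 60, M = 40), c(Young = 30, Old = 70))
)
res$weights                    # 18 42 12 28
tapply(res$weights, sex, sum)  # F = 60, M = 40
tapply(res$weights, age, sum)  # Old = 70, Young = 30
```

Because the cell sizes are unknown, raking matches both sets of margins without forcing any particular sex x age cell total; starting from equal weights, this toy case converges after the second sweep.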

Examples

```r
# Load the nrba package and the survey data
library(nrba)
data(involvement_survey_srs, package = "nrba")

# Calculate population benchmarks
population_benchmarks <- list(
  "PARENT_HAS_EMAIL" = data.frame(
    PARENT_HAS_EMAIL = c("Has Email", "No Email"),
    PARENT_HAS_EMAIL_POP_BENCHMARK = c(17036, 2964)
  ),
  "STUDENT_RACE" = data.frame(
    STUDENT_RACE = c(
      "AM7 (American Indian or Alaska Native)",
      "AS7 (Asian)",
      "BL7 (Black or African American)",
      "HI7 (Hispanic or Latino Ethnicity)",
      "MU7 (Two or More Races)",
      "PI7 (Native Hawaiian or Other Pacific Islander)",
      "WH7 (White)"
    ),
    STUDENT_RACE_POP_BENCHMARK = c(206, 258, 3227, 1097, 595, 153, 14464)
  )
)

# Add the population benchmarks as variables in the data
involvement_survey_srs <- merge(
  x = involvement_survey_srs,
  y = population_benchmarks$PARENT_HAS_EMAIL,
  by = "PARENT_HAS_EMAIL"
)
involvement_survey_srs <- merge(
  x = involvement_survey_srs,
  y = population_benchmarks$STUDENT_RACE,
  by = "STUDENT_RACE"
)

# Create a survey design object
library(survey)
survey_design <- svydesign(
  weights = ~BASE_WEIGHT,
  id = ~UNIQUE_ID,
  fpc = ~N_STUDENTS,
  data = involvement_survey_srs
)

# Subset data to only include respondents
survey_respondents <- subset(
  survey_design,
  RESPONSE_STATUS == "Respondent"
)

# Rake to the benchmarks
raked_survey_design <- rake_to_benchmarks(
  survey_design = survey_respondents,
  group_vars = c("PARENT_HAS_EMAIL", "STUDENT_RACE"),
  group_benchmark_vars = c(
    "PARENT_HAS_EMAIL_POP_BENCHMARK",
    "STUDENT_RACE_POP_BENCHMARK"
  )
)

# Inspect estimates from respondents, before and after raking
svymean(x = ~PARENT_HAS_EMAIL, design = survey_respondents)
svymean(x = ~PARENT_HAS_EMAIL, design = raked_survey_design)

svymean(x = ~WHETHER_PARENT_AGREES, design = survey_respondents)
svymean(x = ~WHETHER_PARENT_AGREES, design = raked_survey_design)
```

  • Maintainer: Ben Schneider
  • License: GPL (>= 3)
  • Last published: 2023-11-21
