Weighting of cases through iterative proportional fitting (Raking)
Weighting of cases through iterative proportional fitting (Raking)
Computes and standardizes weights via raking to compensate for non-stratified samples. It is based on the implementation in the survey R package. It reduces data collection #' biases in the norm data by the means of post stratification, thus reducing the effect of unbalanced data in percentile estimation and norm data modeling.
population.margins: A data.frame including three columns, specifying the variable name in the original dataset used for data stratification, the factor level of the variable and the according population share. Please ensure, the original data does not include factor levels, not present in the population.margins. Additionally, summing up the shares of the different levels of a variable should result in a value near 1.0. The first column must specify the name of the stratification variable, the second the level and the third the proportion
standardized: If TRUE (default), the raking weights are scaled to weights/min(weights)
Returns
a vector with the standardized weights
Details
This function computes standardized raking weights to overcome biases in norm samples. It generates weights, by drawing on the information of population shares (e. g. for sex, ethnic group, region ...) and subsequently reduces the influence of over-represented groups or increases underrepresented cases. The returned weights are either raw or standardized and scaled to be larger than 0.
Raking in general has a number of advantages over post stratification and it additionally allows cNORM to draw on larger datasets, since less cases have to be removed during stratification. To use this function, additionally to the data, a data frame with stratification variables has to be specified. The data frame should include a row with (a) the variable name, (b) the level of the variable and (c) the according population proportion.
Examples
# cNORM features a dataset on vocabulary development (ppvt)# that includes variables like sex or migration. In order# to weight the data, we have to specify the population shares.# According to census, the population includes 52% boys# (factor level 1 in the ppvt dataset) and 70% / 30% of persons# without / with a a history of migration (= 0 / 1 in the dataset).# First we set up the popolation margins with all shares of the# different levels:margins <- data.frame(variables = c("sex","sex","migration","migration"), levels = c(1,2,0,1), share = c(.52,.48,.7,.3))head(margins)# Now we use the population margins to generate weights# through rakingweights <- computeWeights(ppvt, margins)# There are as many different weights as combinations of# factor levels, thus only four in this specific caseunique(weights)# To include the weights in the cNORM modelling, we have# to pass them as weights. They are then used to set up# weighted quantiles and as weights in the regession.model <- cnorm(raw = ppvt$raw, group=ppvt$group, weights = weights)