all_geog_synthetic_new_attribute function

Add a new attribute to a set (i.e., a list) of synthetic_micro datasets

Add a new attribute to a set (i.e., a list) of synthetic_micro datasets using conditional relationships between the new attribute and existing attributes (e.g., wage rate conditioned on age and education level). The same attribute is added to every synthetic_micro dataset, but each dataset is supplied its own relationship (symbol table) for creating that attribute.
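
The conditioning idea described above can be illustrated in base R. This is a minimal conceptual sketch, not the package's implementation: the toy data, the column name cond_p, and the merge-based approach are all assumptions made for illustration. It splits each synthetic row's probability across the levels of the new attribute in proportion to the counts observed for that row's conditioning group.

```r
# Toy synthetic dataset (hypothetical): one row per synthetic observation,
# with probability column p summing to 1.
df <- data.frame(
  gender = c("male", "female"),
  p      = c(0.4, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical symbol table: counts of the new attribute within each gender.
sym_tbl <- data.frame(
  gender = rep(c("male", "female"), each = 2),
  cnts   = c(30, 10, 20, 20),
  lvls   = rep(c("employed", "unemp"), 2),
  stringsAsFactors = FALSE
)

# Convert counts to conditional proportions within each conditioning group.
sym_tbl$cond_p <- sym_tbl$cnts / ave(sym_tbl$cnts, sym_tbl$gender, FUN = sum)

# Expand each synthetic row across the attribute levels and rescale p,
# so that the probabilities still sum to 1 afterwards.
out <- merge(df, sym_tbl[, c("gender", "lvls", "cond_p")], by = "gender")
out$p <- out$p * out$cond_p
out$cond_p <- NULL
names(out)[names(out) == "lvls"] <- "variable"

print(out)
```

Each original row is replaced by one row per attribute level, which is why the symbol table must cover every combination of the conditioning variables that appears in the data.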

all_geog_synthetic_new_attribute(
  df_list,
  prob_name = "p",
  attr_name = "variable",
  conditional_vars = NULL,
  st_list = NULL,
  leave_cores = 1L
)

Arguments

  • df_list: A list of R objects each of class "synthetic_micro".

  • prob_name: A string specifying the name of the column, in each data.frame in df_list, that contains the probabilities for each synthetic observation.

  • attr_name: A string specifying the desired name of the new attribute to be added to the data.

  • conditional_vars: A character vector specifying the existing variables, if any, on which the new attribute (variable) is to be conditioned for each dataset. Variables must be specified in order. Defaults to NULL, i.e., an unconditional new attribute.

  • st_list: A list of equal length to df_list. Each element of st_list is a data.frame symbol table with N + 2 columns, where N is the number of variables in conditional_vars. The first N columns must match the conditioning scheme imposed by the variables in conditional_vars. The last two columns must be, in order: (1) a vector containing the new attribute counts or percentages; (2) a vector of the new attribute levels. See synthetic_new_attribute and the examples.

  • leave_cores: An integer for the number of cores you wish to leave open for other processing.
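
The column layout required of each st_list element can be checked before calling the function. The sketch below is a hedged illustration in base R: check_symbol_table is a hypothetical helper, not part of the package, and the checks reflect only the layout described in the arguments above (N conditioning columns in order, then counts, then levels).

```r
cond_v <- c("gender", "pov")
lvls   <- c("employed", "unemp", "not_in_LF")

# expand.grid varies its first argument fastest, so levels cycle within
# each gender, and genders cycle within each poverty status.
grid <- expand.grid(
  lvls   = lvls,
  gender = c("male", "female"),
  pov    = c("lt_pov", "gt_eq_pov"),
  stringsAsFactors = FALSE
)

# Symbol table with N + 2 = 4 columns: conditioning variables first (in the
# same order as cond_v), then counts, then the new attribute's levels.
sym_tbl <- data.frame(
  gender = grid$gender,
  pov    = grid$pov,
  cnts   = c(52, 8, 268, 72, 12, 228, 1338, 93, 297, 921, 105, 554),
  lvls   = grid$lvls,
  stringsAsFactors = FALSE
)

# Hypothetical helper: verifies the documented layout, nothing more.
check_symbol_table <- function(st, conditional_vars = NULL) {
  cv <- as.character(conditional_vars)  # NULL becomes character(0)
  n  <- length(cv)
  stopifnot(
    is.data.frame(st),
    ncol(st) == n + 2L,                    # N conditioning columns + 2
    identical(names(st)[seq_len(n)], cv),  # same variables, same order
    is.numeric(st[[n + 1L]]),              # counts or percentages
    all(st[[n + 1L]] >= 0)
  )
  invisible(TRUE)
}

check_symbol_table(sym_tbl, cond_v)
```

Running the same check over every element, for example with lapply(st_list, check_symbol_table, conditional_vars = cond_v), catches a misordered table before any parallel work starts.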

Returns

A list of new synthetic_micro datasets each with class "synthetic_micro".

Examples

## Not run:
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 age= factor(sample(1:5, size= 100, replace= TRUE)),
                 pov= factor(sample(c("lt_pov", "gt_eq_pov"), size= 100, replace= TRUE, prob= c(.15,.85))),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")

# and example test elements
cond_v <- c("gender", "pov")
levels <- c("employed", "unemp", "not_in_LF")
sym_tbl <- data.frame(gender= rep(rep(c("male", "female"), each= 3), 2),
                      pov= rep(c("lt_pov", "gt_eq_pov"), each= 6),
                      cnts= c(52, 8, 268, 72, 12, 228, 1338, 93, 297, 921, 105, 554),
                      lvls= rep(levels, 4))

df_list <- replicate(10, df, simplify= FALSE)
st_list <- replicate(10, sym_tbl, simplify= FALSE)

# run
library(parallel)
syn <- all_geog_synthetic_new_attribute(df_list, prob_name= "p", attr_name= "variable",
                                        conditional_vars= cond_v, st_list= st_list)
## End(Not run)
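
The example replicates a single symbol table for every dataset, whereas in practice each dataset would typically receive its own. The following base R sketch shows one way such a list might be assembled. The counts are simulated purely for illustration (the Poisson perturbation is a stand-in for real geography-specific tabulations), and n_geog and base_tbl are hypothetical names.

```r
set.seed(567L)

lvls <- c("employed", "unemp", "not_in_LF")

# Hypothetical baseline table conditioned on a single variable (gender).
base_tbl <- data.frame(
  gender = rep(c("male", "female"), each = 3),
  cnts   = c(52, 8, 268, 72, 12, 228),
  lvls   = rep(lvls, 2),
  stringsAsFactors = FALSE
)

n_geog <- 3L  # hypothetical number of geographies / datasets

# One symbol table per geography: same layout, different counts.
st_list <- lapply(seq_len(n_geog), function(i) {
  st <- base_tbl
  # Simulated counts standing in for geography-specific data.
  st$cnts <- rpois(nrow(st), lambda = base_tbl$cnts)
  st
})

# st_list must have one element per dataset in df_list, in the same order.
stopifnot(length(st_list) == n_geog)
```

Because elements are matched by position, the i-th symbol table should describe the same geography as the i-th dataset in df_list.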

See Also

synthetic_new_attribute

  • Maintainer: Alex Whitworth
  • License: MIT + file LICENSE
  • Last published: 2022-10-26
