derive_synth_datasets() R function from [synthACS]

Derive synthetic micro datasets for a given geography.

Derive synthetic micro datasets for each sub-geography covered by a set of macro-level constraining tabulations. See Details. By default, micro dataset generation runs in parallel with load balancing. The macro data is assumed to have been pulled from the US Census API via the acs package.


derive_synth_datasets(macro_data, parallel = TRUE, leave_cores = 2)

Arguments

macro_data: A macro dataset list: the result of pull_synth_data.
parallel: Logical; defaults to TRUE. Whether to run the operation in parallel.
leave_cores: Number of cores to leave free for other processing. Defaults to 2.
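
As a rough illustration of how parallel and leave_cores might interact, the sketch below picks a worker count and distributes toy tasks with load balancing using the base parallel package. The helper n_workers, the tracts vector, and build_one are hypothetical names made up for this example; this is not the package's internal code, only one plausible way such settings could be used.

```r
# Hypothetical helper, not part of synthACS: choose a worker count while
# leaving `leave_cores` cores free for other processes.
n_workers <- function(leave_cores = 2, total_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(total_cores)) total_cores <- 1L
  # Never go below one worker, even if leave_cores exceeds the core count.
  max(1L, as.integer(total_cores) - as.integer(leave_cores))
}

# Toy stand-in for the per-sub-geography work (assumption: one task per tract).
tracts <- paste0("tract_", 1:6)
build_one <- function(tract) nchar(tract)

cl <- parallel::makeCluster(n_workers(leave_cores = 2))
# parLapplyLB() hands out tasks as workers become free (load balancing).
res <- parallel::parLapplyLB(cl, tracts, build_one)
parallel::stopCluster(cl)
```

Setting leave_cores = 0, as in the Examples section, would let such a scheme use every available core.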

Returns

A list containing the input macro datasets produced by pull_synth_data and a list of synthetic micro datasets, one for each geographical subset within the specified macro geography.

Details

In the absence of true micro level datasets for a given geographic area, synthetic datasets can be used. This function uses conditional and marginal probability distributions (at the aggregate level) to generate synthetic micro population datasets, which are built one constraint at a time. Taking as input the macro level data (class "macroACS"), this function builds synthetic micro datasets for each lower level geographical area within the area of study.

In simplest terms, the goal is to generate a joint probability distribution for an attribute vector and to create synthetic individuals from this distribution. However, information on the full joint distribution is typically not available, so it is constructed as a product of conditional and marginal probabilities. This is done one attribute at a time, under the assumption that attributes lie on a continuum of dependence. That is, some attributes (e.g. gender, age) are more important in 'determining' others (e.g. educational attainment, marital status). The more important attributes need to be assigned first, whereas less important attributes may be assigned later. Most of these distinctions are intuitive, but care must be taken in choosing the order in which attributes are constructed.
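
The idea of building a joint distribution from a marginal and a conditional can be sketched with two attributes. All probabilities below are invented for illustration, and the object names (p_gender, p_age_given_gender, joint, synthetic, people) are assumptions of this sketch, not objects created by the package.

```r
set.seed(1)

# Made-up marginal distribution for the first attribute (gender).
p_gender <- c(female = 0.52, male = 0.48)

# Made-up conditional distribution P(age | gender); each row sums to 1.
p_age_given_gender <- rbind(
  female = c(under18 = 0.20, age18_64 = 0.60, over64 = 0.20),
  male   = c(under18 = 0.24, age18_64 = 0.62, over64 = 0.14)
)

# Joint distribution as a product: P(gender, age) = P(gender) * P(age | gender).
# The vector is recycled down the columns, so row i is scaled by p_gender[i].
joint <- p_gender * p_age_given_gender

# One row per attribute combination, with its probability of inclusion p.
synthetic <- as.data.frame(as.table(joint), responseName = "p",
                           stringsAsFactors = FALSE)
names(synthetic)[1:2] <- c("gender", "age")

# Draw synthetic individuals from the joint distribution.
idx <- sample(nrow(synthetic), size = 1000, replace = TRUE, prob = synthetic$p)
people <- synthetic[idx, c("gender", "age")]
```

Because gender is assigned first, the age distribution is allowed to differ by gender; reversing the order would require the conditional P(gender | age) instead.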

This function provides a synthetic population in which each synthetic individual has a probability of inclusion and the following characteristics: age, gender, marital status, educational attainment, employment status, nativity, poverty status, geographic mobility in the prior year, individual income, and race. Additional attributes of interest to the user may be added in a similar manner via synthetic_new_attribute.
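
Conceptually, adding one more attribute splits each existing row across the levels of the new attribute and rescales its probability of inclusion. The sketch below shows that step with base R only; it does not call synthetic_new_attribute, and the data frames synthetic, cond, and expanded, along with all values in them, are made up for illustration.

```r
# Toy synthetic population: one row per attribute combination, with
# probability of inclusion p (made-up values that sum to 1).
synthetic <- data.frame(
  gender = c("female", "male"),
  p      = c(0.52, 0.48),
  stringsAsFactors = FALSE
)

# Made-up conditional table P(employment | gender), in long format.
cond <- data.frame(
  gender     = c("female", "female", "male", "male"),
  employment = c("employed", "not_employed", "employed", "not_employed"),
  p_cond     = c(0.58, 0.42, 0.66, 0.34),
  stringsAsFactors = FALSE
)

# Split each existing row across the levels of the new attribute and
# multiply its probability by the matching conditional probability.
expanded <- merge(synthetic, cond, by = "gender")
expanded$p <- expanded$p * expanded$p_cond
expanded$p_cond <- NULL
```

The probabilities in expanded still sum to 1, since each conditional distribution sums to 1 within its conditioning group.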

Note: INDIVIDUAL, not HOUSEHOLD level, synthetic population datasets are created.

Examples


## Not run:
library(synthACS)

# make geography: all census tracts in Los Angeles County, CA
la_geo <- acs::geo.make(state = "CA", county = "Los Angeles", tract = "*")
# pull data elements for creating synthetic data (end year 2014, 5-year span)
la_dat <- pull_synth_data(2014, 5, la_geo)
# derive synthetic data, using all available cores
la_synthetic <- derive_synth_datasets(la_dat, leave_cores = 0)
## End(Not run)

References

Birkin, M., and Clarke, M. "SYNTHESIS-a synthetic spatial information system for urban and regional analysis: methods and examples." Environment and Planning A 20.12 (1988): 1645-1671.
