Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control
Makes a codebook from a data frame
Compare model estimates based on synthesised and observed data
Comparison of synthesised and observed data
Compare univariate distributions of synthesised and observed data
Disclosure measures
Fitting (generalized) linear models to synthetic data
Merge levels of factors in a data frame
Multivariate comparison of synthesised and observed data
Disclosure measures for multiple target variables
Fitting multinomial models to synthetic data
Group numeric variables before synthesis
Fitting ordered logistic models to synthetic data
Importing original data sets from external files
Replications in synthetic data
Tools for statistical disclosure control (SDC)
Inference from synthetic data
Synthetic data object summaries
Synthesis with bagging
Synthesis with classification and regression trees (CART)
Synthesis of a group of categorical variables from a saturated model
Synthesis of a group of categorical variables by iterative proportiona...
Synthesis by linear regression after transformation of a dependent var...
Synthesis by logistic regression
Synthesis for a variable nested within another variable
Synthesis by linear regression
Synthesis by normal linear regression preserving the marginal distribu...
Passive synthesis
Synthesis by predictive mean matching
Synthesis by ordered polytomous regression
Synthesis by unordered polytomous regression
Synthesis with a fast implementation of random forests
Generating synthetic data sets
Synthesis with random forest
Synthesis by simple random sampling
Synthesis from a saturated model based on all combinations of the pred...
syn.smooth
Synthesis of survival time by classification and regression trees (CAR...
Check synthetic and original data if not produced by synthpop
Generating synthetic versions of sensitive microdata for statistical d...
Distributional comparison of synthesised and observed data
Tabular utility
Tables and plots of utility measures
Exporting synthetic data sets to external files
A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression tree (CART) models. Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>. Functions to assess identity and attribute disclosure for the original and for the synthetic data are included in the package, and their use is illustrated in a vignette on disclosure (Practical Privacy Metrics for Synthetic Data).
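The sequential modelling idea described above can be illustrated with a minimal, language-neutral sketch. The Python code below is not the package's implementation and does not call syn(); the function name synthesise_pair and the example variables age and income are hypothetical. It assumes two numeric variables: the first is synthesised by random sampling with replacement from the observed values, and the second is drawn from a normal linear regression fitted to the original data but evaluated at the synthetic values of the first. As a simplification, the fitted coefficients are treated as fixed rather than drawn from a distribution.

```python
import math
import random


def synthesise_pair(x_obs, y_obs, k=None, seed=1):
    """Sketch of sequential synthesis for two numeric variables.

    Hypothetical helper for illustration only; needs at least 3 records
    with at least two distinct x values.
    """
    rng = random.Random(seed)
    n = len(x_obs)
    k = n if k is None else k

    # Step 1: first variable in the visit sequence has no predictors,
    # so sample it with replacement from the observed values.
    x_syn = [rng.choice(x_obs) for _ in range(k)]

    # Step 2: fit y ~ x on the ORIGINAL data by ordinary least squares.
    mean_x = sum(x_obs) / n
    mean_y = sum(y_obs) / n
    sxx = sum((x - mean_x) ** 2 for x in x_obs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_obs, y_obs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_obs, y_obs))
    sigma = math.sqrt(rss / (n - 2))

    # Step 3: draw the second variable from the fitted conditional
    # distribution, using the SYNTHETIC values of the first as predictors.
    y_syn = [rng.gauss(intercept + slope * x, sigma) for x in x_syn]
    return x_syn, y_syn


# Usage with made-up numbers (not real data).
age = [23, 35, 41, 52, 60, 29, 47]
income = [1800, 2600, 3100, 3500, 3300, 2100, 3400]
age_syn, income_syn = synthesise_pair(age, income, seed=42)
```

With more variables the same pattern repeats: each variable is modelled on those earlier in the sequence, and its synthetic values are generated from the already synthesised predictors. Replacing the regression in step 2 with a tree-based model would give the non-parametric variant mentioned in the description.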