synthpop 1.9-0 package

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

codebook.syn

Makes a codebook from a data frame

compare.fit.synds

Compare model estimates based on synthesised and observed data

compare

Comparison of synthesised and observed data

compare.synds

Compare univariate distributions of synthesised and observed data

disclosure

Disclosure measures

glm.synds

Fitting (generalized) linear models to synthetic data

mergelevels.syn

Merge levels of factors in a data frame

multi.compare

Multivariate comparison of synthesised and observed data

multi.disclosure

Disclosure measures for multiple target variables

multinom.synds

Fitting multinomial models to synthetic data

numtocat.syn

Group numeric variables before synthesis

polr.synds

Fitting ordered logistic models to synthetic data

read.obs

Importing original data sets from external files

replicated.uniques

Replications in synthetic data

sdc

Tools for statistical disclosure control (sdc)

summary.fit.synds

Inference from synthetic data

summary.synds

Synthetic data object summaries

syn.bag

Synthesis with bagging

syn.cart

Synthesis with classification and regression trees (CART)

syn.catall

Synthesis of a group of categorical variables from a saturated model

syn.ipf

Synthesis of a group of categorical variables by iterative proportiona...

syn.lognorm

Synthesis by linear regression after transformation of a dependent var...

syn.logreg

Synthesis by logistic regression

syn.nested

Synthesis for a variable nested within another variable

syn.norm

Synthesis by linear regression

syn.normrank

Synthesis by normal linear regression preserving the marginal distribu...

syn.passive

Passive synthesis

syn.pmm

Synthesis by predictive mean matching

syn.polr

Synthesis by ordered polytomous regression

syn.polyreg

Synthesis by unordered polytomous regression

syn.ranger

Synthesis with a fast implementation of random forests

syn

Generating synthetic data sets

syn.rf

Synthesis with random forest

syn.sample

Synthesis by simple random sampling

syn.satcat

Synthesis from a saturated model based on all combinations of the pred...

syn.smooth

syn.smooth

syn.survctree

Synthesis of survival time by classification and regression trees (CAR...

synorig.compare

Check synthetic and original data if not produced by synthpop

synthpop-package

Generating synthetic versions of sensitive microdata for statistical d...

utility.gen

Distributional comparison of synthesised and observed data

utility.tab

Tabular utility

utility.tables

Tables and plots of utility measures

write.syn

Exporting synthetic data sets to external files

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression tree (CART) models. Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>. Functions to assess identity and attribute disclosure for the original and for the synthetic data are included in the package, and their use is illustrated in a vignette on disclosure (Practical Privacy Metrics for Synthetic Data).
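
The sequential idea described above can be illustrated with a small, language-neutral sketch. The Python code below is not synthpop's implementation and does not call the package. It is a toy, assuming categorical variables only, in which each variable is drawn from its empirical distribution in the observed data, conditional on the values already synthesised for the variables earlier in the sequence. The function name `synthesise`, the argument name `visit_sequence` and the example records are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def synthesise(records, visit_sequence, seed=0):
    """Toy sequential synthesis for categorical variables (illustrative only).

    Variables are synthesised one by one in the order given by
    visit_sequence. Each one is drawn from the observed values of records
    that share the already-synthesised values of all earlier variables.
    """
    rng = random.Random(seed)
    synthetic = [{} for _ in records]
    for i, var in enumerate(visit_sequence):
        predictors = visit_sequence[:i]
        # "Fit" the conditional distribution: group observed values of
        # `var` by the observed values of the predictors.
        donors = defaultdict(list)
        for rec in records:
            key = tuple(rec[p] for p in predictors)
            donors[key].append(rec[var])
        # Draw a replacement for each synthetic row given the values
        # synthesised so far. The key always exists because every earlier
        # value was itself drawn from a matching observed record.
        for row in synthetic:
            key = tuple(row[p] for p in predictors)
            row[var] = rng.choice(donors[key])
    return synthetic


# Hypothetical observed data, made up for this example.
observed = [
    {"sex": "F", "edu": "secondary", "smoke": "no"},
    {"sex": "F", "edu": "higher", "smoke": "no"},
    {"sex": "M", "edu": "secondary", "smoke": "yes"},
    {"sex": "M", "edu": "primary", "smoke": "yes"},
    {"sex": "F", "edu": "primary", "smoke": "yes"},
    {"sex": "M", "edu": "higher", "smoke": "no"},
]

for row in synthesise(observed, ["sex", "edu", "smoke"], seed=1):
    print(row)
```

Because this toy conditions exactly on every earlier variable, it behaves like a saturated model and can only reproduce combinations that occur in the observed data. Fitted models such as regression or CART typically pool records more coarsely, which is one reason the choice of method affects both disclosure risk and analytical quality.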