To be used on categorical data stored as factors. The algorithm randomly changes the values of variables in selected records (usually the risky ones) according to an invariant probability transition matrix or a custom-defined transition matrix.
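The idea can be illustrated with a minimal base-R sketch. This is not the sdcMicro implementation; the function name pram_sketch, the toy factor x and the matrix P are assumptions made up for illustration. Each record's category is redrawn using the row of the transition matrix that belongs to its current category.

```r
# Minimal base-R sketch of post-randomization (NOT the sdcMicro implementation).
# pram_sketch(), x and P are hypothetical names used only for illustration.
pram_sketch <- function(x, P) {
  stopifnot(
    is.factor(x),
    identical(rownames(P), levels(x)),
    identical(colnames(P), levels(x))
  )
  out <- x
  for (lev in levels(x)) {
    idx <- which(x == lev)
    if (length(idx) > 0) {
      # redraw a category for every record currently at 'lev', using
      # row 'lev' of the transition matrix as sampling probabilities
      out[idx] <- sample(levels(x), size = length(idx),
                         replace = TRUE, prob = P[lev, ])
    }
  }
  out
}

set.seed(1)
x <- factor(sample(c("a", "b", "c"), 200, replace = TRUE))

# toy transition matrix: keep a value with probability 0.8,
# otherwise move to one of the other two categories
P <- matrix(c(0.8, 0.1, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.1, 0.8),
            nrow = 3, byrow = TRUE,
            dimnames = list(levels(x), levels(x)))

x_pram <- pram_sketch(x, P)
table(original = x, perturbed = x_pram)
```

With an identity matrix as P, every record keeps its value, which mirrors the "nothing has changed" case in the examples further down.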
Arguments
obj: Input data. Allowed input data are objects of class data.frame, factor or sdcMicroObj.
variables: Names of variables in obj on which post-randomization should be applied. If obj is a factor, this argument is ignored. Please note that pram can only be applied to factor-variables.
strata_variables: Names of variables for stratification (set automatically for an object of class sdcMicroObj). One can also specify an integer vector or factor that defines the desired groups. The length of this vector must match the number of rows of the input data set. For a possible use case, have a look at the examples.
pd: minimum diagonal entries for the generated transition matrix P. Either a vector of length 1 (which is recycled) or a vector of the same length as the number of variables that should be post-randomized. It is also possible to set pd to a numeric matrix. This matrix will be used directly as the transition matrix. The matrix must be constructed as follows:
the matrix must be a square matrix
the rownames and colnames of the matrix must match the levels (in the same order) of the factor-variable that should be postrandomized.
the rowSums and colSums of the matrix need to equal 1
It is also possible to combine both approaches by supplying pd as a list that contains a matrix for some variables and a number for others. For details have a look at the examples.
alpha: amount of perturbation for the invariant PRAM method. This is a numeric vector of length 1 (recycled if necessary) or a vector of the same length as the number of variables. If a transition matrix is specified directly, alpha is ignored.
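The conditions listed for a custom transition matrix can be checked before the matrix is passed as pd. The helper below is a hedged sketch in base R; check_transition_matrix, its tol argument and the toy factor are hypothetical and not part of sdcMicro.

```r
# Hypothetical helper (not part of sdcMicro): checks the conditions listed
# above for a custom transition matrix that is to be passed via 'pd'.
check_transition_matrix <- function(mat, x, tol = 1e-8) {
  stopifnot(is.matrix(mat), is.numeric(mat), is.factor(x))
  lev <- levels(x)
  c(
    square    = nrow(mat) == ncol(mat),
    names_ok  = identical(rownames(mat), lev) && identical(colnames(mat), lev),
    rows_sum1 = all(abs(rowSums(mat) - 1) < tol),
    cols_sum1 = all(abs(colSums(mat) - 1) < tol)
  )
}

# toy factor standing in for a variable such as 'roof'
x <- factor(c("tile", "metal", "thatch", "metal", "tile"))
lev <- levels(x)

# symmetric matrix: 0.9 on the diagonal, 0.05 elsewhere,
# so every row and every column sums to 1
mat <- matrix(0.05, nrow = length(lev), ncol = length(lev),
              dimnames = list(lev, lev))
diag(mat) <- 0.9
check_transition_matrix(mat, x)

# a matrix whose first row no longer sums to 1 is flagged
bad <- mat
bad[1, ] <- c(0.9, 0.2, 0.05)
check_transition_matrix(bad, x)
```

Returning a named logical vector instead of stopping at the first failure makes it easier to see which of the conditions a candidate matrix violates.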
Returns
a modified sdcMicroObj object or a new object containing original and post-randomized variables (with suffix "_pram").
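When original and post-randomized variables sit side by side, the "_pram" suffix can be used to pair them up and count how many records changed. The sketch below uses a made-up plain data.frame as a stand-in for such a result; the column values and the names res, transitions and n_changed are assumptions, and the actual object returned by pram() may carry additional classes and attributes.

```r
# Stand-in for a result with original and post-randomized columns
# (suffix "_pram"). The values are made up for illustration.
res <- data.frame(
  roof      = factor(c("a", "a", "b", "b", "c", "c")),
  roof_pram = factor(c("a", "b", "b", "b", "c", "a"), levels = c("a", "b", "c"))
)

# pair each post-randomized column with its original variable
pram_cols <- grep("_pram$", names(res), value = TRUE)
orig_cols <- sub("_pram$", "", pram_cols)

# cross-tabulate original vs. post-randomized values
transitions <- table(
  original  = res[[orig_cols[1]]],
  perturbed = res[[pram_cols[1]]]
)

# number of records whose category was changed
n_changed <- sum(
  as.character(res[[orig_cols[1]]]) != as.character(res[[pram_cols[1]]])
)
```

The diagonal of the cross-tabulation holds the unchanged records, so off-diagonal counts show which transitions actually occurred.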
Note
Deprecated method 'pram_strata' is no longer available in sdcMicro > 4.5.0
Examples
library(sdcMicro)
data(testdata)

## donttest is necessary because of
## Examples with CPU time > 2.5 times elapsed time
## caused by using C++ code and/or data.table

## using a factor variable as input
res <- pram(as.factor(testdata$roof))
print(res)
summary(res)

## using a data.frame as input
## pram can only be applied to factors
## --> we have to recode to factors beforehand
testdata$roof <- factor(testdata$roof)
testdata$walls <- factor(testdata$walls)
testdata$water <- factor(testdata$water)

## pram() is applied within subgroups defined by
## variables "urbrur" and "sex"
res <- pram(
  obj = testdata,
  variables = "roof",
  strata_variables = c("urbrur", "sex")
)
print(res)
summary(res)

## default parameters (pd = 0.8 and alpha = 0.5) for the generation
## of the invariant transition matrix will be used for all variables
res1 <- pram(
  obj = testdata,
  variables = c("roof", "walls", "water")
)
print(res1)

## specific parameter settings for each variable
res2 <- pram(
  obj = testdata,
  variables = c("roof", "walls", "water"),
  pd = c(0.95, 0.8, 0.9),
  alpha = 0.5
)
print(res2)

## detailed information on pram-parameters (such as the transition matrix 'Rs')
## is stored in the output, e.g. for variable 'roof'
# attr(res2, "pram_params")$roof

## we can also specify a custom transition matrix directly
mat <- diag(length(levels(testdata$roof)))
rownames(mat) <- colnames(mat) <- levels(testdata$roof)
res3 <- pram(
  obj = testdata,
  variables = "roof",
  pd = mat
)
print(res3) # of course, nothing has changed!

## it is possible to use a transition matrix for one variable and the 'traditional' way
## of specifying a number for the minimal diagonal entries of the transition matrix
## for other variables. In this case we must supply `pd` as a list.
res4 <- pram(
  obj = testdata,
  variables = c("roof", "walls"),
  pd = list(mat, 0.5),
  alpha = c(NA, 0.5)
)
print(res4)
summary(res4)
attr(res4, "pram_params")

## application to objects of class sdcMicroObj with default parameters
data(testdata2)
testdata2$urbrur <- factor(testdata2$urbrur)
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"),
  w = "sampling_weight"
)
sdc <- pram(
  obj = sdc,
  variables = "urbrur"
)
print(sdc, type = "pram")

## this is equal to the previous application. If argument 'variables' is NULL,
## all variables from slot 'pramVars' will be used if possible.
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"),
  w = "sampling_weight",
  pramVars = "urbrur"
)
sdc <- pram(sdc)
print(sdc, type = "pram")

## we can specify transition matrices for sdcMicroObj-objects too
## (the variable must be a factor so that levels() returns its categories)
testdata2$roof <- factor(testdata2$roof)
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"),
  w = "sampling_weight"
)
mat <- diag(length(levels(testdata2$roof)))
rownames(mat) <- colnames(mat) <- levels(testdata2$roof)
mat[1, ] <- c(0.9, 0, 0, 0.05, 0.05)
sdc <- pram(
  obj = sdc,
  variables = "roof",
  pd = mat
)
print(sdc, type = "pram")

## we can also have a look at the transitions
get.sdcMicroObj(sdc, "pram")$transitions
References
Kowarik, A. and Templ, M. and Meindl, B. and Fonteneau, F. and Prantner, B.: Testing of IHSN Cpp Code and Inclusion of New Methods into sdcMicro, in: Lecture Notes in Computer Science, J. Domingo-Ferrer, I. Tinnirello (editors); Springer, Berlin, 2012, ISBN: 978-3-642-33626-3, pp. 63-77. doi:10.1007/978-3-642-33627-0_6
Templ, M. and Kowarik, A. and Meindl, B.: Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro, in: Journal of Statistical Software, 67 (4), 1--36, 2015. doi:10.18637/jss.v067.i04
Templ, M.: Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4
Author(s)
Alexander Kowarik, Matthias Templ, Bernhard Meindl