il_additional function

Additional Information-Loss measures

Measures IL_correl() and IL_variables() were proposed by Andrzej Mlodak and are (theoretically) bounded between 0 and 1.

IL_correl(x, xm)

## S3 method for class 'il_correl'
print(x, digits = 3, ...)

IL_variables(x, xm)

## S3 method for class 'il_variables'
print(x, digits = 3, ...)

Arguments

  • x: an object coercible to a data.frame representing the original dataset
  • xm: an object coercible to a data.frame representing the perturbed, modified dataset
  • digits: number of digits used for rounding when displaying results
  • ...: additional parameters for the print methods; currently ignored

Returns

the corresponding information-loss measure

Details

  • IL_correl(): an information-loss measure that can be applied to the numerically scaled variables common to x and xm. It is based on the diagonal entries of the inverse correlation matrices of the original and the perturbed data.
  • IL_variables(): for the variables common to x and xm, an individual distance function is applied that depends on the class of the variable; specifically, these functions differ for numeric variables, ordered factors and character/factor variables. The individual distances are summed up and scaled by n * m, with n being the number of records and m being the number of (common) variables.
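The structure of IL_variables() described above (a class-dependent distance per variable, summed and divided by n * m) can be illustrated with a minimal base-R sketch. The function il_variables_sketch() and its three distance functions (an arctan-scaled absolute difference for numeric variables, a scaled difference of level positions for ordered factors, a 0/1 mismatch for character/factor variables) are assumptions chosen for illustration only; they are not taken from the sdcMicro source and may differ from the distances defined in the references.

```r
# Hypothetical illustration; NOT the sdcMicro implementation of IL_variables().
il_variables_sketch <- function(x, xm) {
  x <- as.data.frame(x)
  xm <- as.data.frame(xm)
  stopifnot(nrow(x) == nrow(xm))
  common <- intersect(names(x), names(xm))
  stopifnot(length(common) >= 1)

  # assumed per-record distances, each bounded between 0 and 1
  dist_one <- function(a, b) {
    if (is.numeric(a)) {
      # numeric: arctan-scaled absolute difference (assumption)
      (2 / pi) * atan(abs(a - b))
    } else if (is.ordered(a)) {
      # ordered factor: difference in level position, scaled by k - 1 (assumption)
      k <- nlevels(a)
      if (k < 2) return(rep(0, length(a)))
      abs(as.integer(a) - as.integer(b)) / (k - 1)
    } else {
      # character/factor: 1 if the categories differ, 0 otherwise (assumption)
      as.numeric(as.character(a) != as.character(b))
    }
  }

  # sum all individual distances and scale by n * m
  total <- sum(vapply(
    common,
    function(v) sum(dist_one(x[[v]], xm[[v]])),
    numeric(1)
  ))
  total / (nrow(x) * length(common))
}

x_demo <- data.frame(
  num = c(1, 2, 3, 4),
  ord = ordered(c("A", "B", "C", "A"), levels = c("A", "B", "C")),
  nom = c("u", "v", "u", "v")
)
xm_demo <- x_demo
xm_demo$nom[1] <- "v"  # one nominal mismatch, distance 1
xm_demo$ord[2] <- "C"  # one step out of two on the ordinal scale, distance 0.5
res_demo <- il_variables_sketch(x_demo, xm_demo)
```

With n = 4 records and m = 3 common variables, the two modifications contribute 1 + 0.5 = 1.5, so the sketch returns 1.5 / 12 = 0.125. Because every assumed distance lies between 0 and 1, the scaled sum stays in that range as well.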

Details can be found in the references below.

The implementation of IL_correl() differs slightly from the original proposal in Mlodak, A. (2020): the constant multiplier was changed from 1/2 to 1 / sqrt(2) for better efficiency and interpretability of the measure.
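To make the role of the 1 / sqrt(2) multiplier more concrete, the following base-R sketch computes one plausible form of a measure built from the diagonal entries of the inverse correlation matrices. The function il_correl_sketch() and the exact formula (Euclidean distance between the two diagonal vectors after scaling each to unit length) are assumptions for illustration; the package code and the formula in Mlodak (2020) may differ in detail.

```r
# Hypothetical illustration; NOT the sdcMicro implementation of IL_correl().
il_correl_sketch <- function(x, xm) {
  x <- as.data.frame(x)
  xm <- as.data.frame(xm)

  # keep only variables that are numeric in both datasets
  common <- intersect(names(x), names(xm))
  is_num <- vapply(
    common,
    function(v) is.numeric(x[[v]]) && is.numeric(xm[[v]]),
    logical(1)
  )
  num <- common[is_num]
  stopifnot(length(num) >= 2)

  # diagonal entries of the inverse correlation matrix
  inv_diag <- function(d) diag(solve(cor(d[, num, drop = FALSE])))
  r <- inv_diag(x)
  r_m <- inv_diag(xm)

  # scale both vectors to unit length (assumed normalisation)
  u <- r / sqrt(sum(r^2))
  u_m <- r_m / sqrt(sum(r_m^2))

  # two unit vectors with positive entries are at most sqrt(2) apart,
  # so dividing by sqrt(2) keeps the result between 0 and 1
  sqrt(sum((u - u_m)^2)) / sqrt(2)
}

set.seed(1)
x_demo <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
xm_demo <- x_demo
xm_demo[] <- lapply(x_demo, function(col) col + rnorm(length(col), sd = 0.5))

il_same <- il_correl_sketch(x_demo, x_demo)    # identical data: no loss
il_noise <- il_correl_sketch(x_demo, xm_demo)  # perturbed data
```

Under this assumed form, the diagonal entries of an inverse correlation matrix are at least 1, so both scaled vectors lie in the positive orthant and their distance cannot exceed sqrt(2); this is why a multiplier of 1 / sqrt(2) maps the measure onto the full interval from 0 to 1.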

Examples

data("Tarragona", package = "sdcMicro")

# strong perturbation
res1 <- addNoise(obj = Tarragona, variables = colnames(Tarragona), noise = 100)
IL_correl(x = as.data.frame(res1$x), xm = as.data.frame(res1$xm))

# weaker perturbation
res2 <- addNoise(obj = Tarragona, variables = colnames(Tarragona), noise = 25)
IL_correl(x = as.data.frame(res2$x), xm = as.data.frame(res2$xm))

# creating test-inputs
n <- 150
x <- xm <- data.frame(
  v1 = factor(sample(letters[1:5], n, replace = TRUE), levels = letters[1:5]),
  v2 = rnorm(n),
  v3 = runif(n),
  v4 = ordered(sample(LETTERS[1:3], n, replace = TRUE), levels = c("A", "B", "C"))
)

# modify the factor, the numeric and the ordered variable; v3 stays unchanged
xm$v1[1:5] <- "a"
xm$v2 <- rnorm(n, mean = 5)
xm$v4[1:5] <- "A"
IL_variables(x, xm)

References

Mlodak, A. (2020). Information loss resulting from statistical disclosure control of output data. Wiadomości Statystyczne. The Polish Statistician, 65(9), 7-27. DOI: 10.5604/01.3001.0014.4121

Mlodak, A. (2019). Using the Complex Measure in an Assessment of the Information Loss Due to the Microdata Disclosure Control. Przegląd Statystyczny, 66(1), 7-26. DOI: 10.5604/01.3001.0013.8285

Author(s)

Bernhard Meindl bernhard.meindl@statistik.gv.at