SuppressDominantCells function

Suppress magnitude tables using dominance (n,k) or p% rule for primary suppression.

Suppress magnitude tables using dominance (n,k) or p% rule for primary suppression.

This function utilizes MagnitudeRule.

SuppressDominantCells( data, n = 1:length(k), k = NULL, pPercent = NULL, allDominance = FALSE, dominanceVar = NULL, numVar = NULL, dimVar = NULL, hierarchies = NULL, formula = NULL, contributorVar = NULL, sWeightVar = NULL, ..., candidatesVar = NULL, singletonZeros = FALSE, preAggregate = !is.null(contributorVar) & is.null(sWeightVar), spec = PackageSpecs("dominanceSpec") )

Arguments

  • data: Input data, typically a data frame, tibble, or data.table. If data is not a classic data frame, it will be coerced to one internally unless preAggregate is TRUE and aggregatePackage is "data.table".
  • n: Parameter n in dominance rule. Default is 1:length(k).
  • k: Parameter k in dominance rule.
  • pPercent: Parameter in the p% rule, when non-NULL. Parameters n and k will then be ignored. Technically, calculations are performed internally as if n = 1:2. The results of these intermediate calculations can be viewed by setting allDominance = TRUE.
  • allDominance: Logical. If TRUE, additional information is included in the output, as described in MagnitudeRule.
  • dominanceVar: Numerical variable to be used in dominance rule. The first numVar variable will be used if it is not specified.
  • numVar: Numerical variable to be aggregated. Any dominanceVar and candidatesVar that are specified and not included in numVar will be aggregated accordingly.
  • dimVar: The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
  • hierarchies: List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
  • formula: A model formula
  • contributorVar: Extra variables to be used as grouping elements in the dominance rule. Typically, the variable contains the contributor IDs.
  • sWeightVar: Name of variable which represents sampling weights to be used in dominance rule
  • ...: Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).
  • candidatesVar: Variable to be used in the candidate function to prioritize cells for publication and thus not suppression. If not specified, the same variable that is used for the dominance rule will be applied (see dominanceVar and numVar).
  • singletonZeros: When negative values cannot occur, one can determine from a non-suppressed marginal cell with the value 0 that all underlying cells also have the value 0. The use of singletonZeros = TRUE is intended to prevent this phenomenon from causing suppressed cells to be revealable. It is the zeros in the dominanceVar variable that are examined. Specifically, the ordinary singleton method is combined with a method that is actually designed for frequency tables. This approach also works for magnitude tables when SingletonUniqueContributor0 is utilized.
  • preAggregate: Parameter to GaussSuppressionFromData. Necessary to include here since the specification in spec cannot take sWeightVar into account.
  • spec: NULL or a named list of arguments that will act as default values.

Returns

data frame containing aggregated data and suppression information.

Examples

num <- c(100, 90, 10, 80, 20, 70, 30, 50, 25, 25, 40, 20, 20, 20, 25, 25, 25, 25) v1 <- c("v1", rep(c("v2", "v3", "v4"), each = 2), rep("v5", 3), rep(c("v6", "v7"), each = 4)) sweight <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1) d <- data.frame(v1 = v1, num = num, sweight = sweight) # basic use SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1) SuppressDominantCells(d, k = c(80,70), numVar = "num", formula = ~v1 -1) # same as above SuppressDominantCells(d, pPercent = 7, numVar = "num", formula = ~v1 -1) # with weights SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", dimVar = "v1", sWeightVar = "sweight") # overwriting some parameters in default spec SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", dimVar = "v1", sWeightVar = "sweight", domWeightMethod = "tauargus") # using dominance and few contributors rule together, see second example compared to first SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1, primary = c(DominanceRule, NContributorsRule), maxN = 3, allDominance = TRUE) SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1, primary = c(DominanceRule, NContributorsRule), maxN = 4, allDominance = TRUE) d2 <- SSBtoolsData("d2")[1:4] # Data considered as microdata set.seed(123) d2$v <- rnorm(nrow(d2))^2 # Hierarchical region variables are detected automatically -> same output column SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", dimVar = c("region", "county", "k_group"), allDominance = TRUE) # Formula. Hierarchical variables still detected automatically. SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", formula = ~main_income * k_group + region + county - k_group) # With hierarchies created manually ml <- data.frame(levels = c("@", "@@", "@@@", "@@@", "@@@", "@@"), codes = c("Total", "not_assistance", "other", "pensions", "wages", "assistance")) SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", hierarchies = list(main_income = ml, k_group = "Total_Norway")) # With contributorVar and p% rule SuppressDominantCells(data= SSBtoolsData("magnitude1"), numVar = "value", dimVar= c("sector4", "geo"), contributorVar = "company", pPercent = 10, allDominance = TRUE) # Using formula followed by FormulaSelection output <- SuppressDominantCells(data = SSBtoolsData("magnitude1"), numVar = "value", formula = ~sector2 * geo + sector4 * eu, contributorVar = "company", k = c(80, 99)) FormulaSelection(output, ~sector2 * geo) # This example is similar to the one in the documentation of tables_by_formulas, # but it uses SuppressDominantCells with the pPercent and contributorVar parameters. tables_by_formulas(SSBtoolsData("magnitude1"), table_fun = SuppressDominantCells, table_formulas = list(table_1 = ~region * sector2, table_2 = ~region1:sector4 - 1, table_3 = ~region + sector4 - 1), substitute_vars = list(region = c("geo", "eu"), region1 = "eu"), collapse_vars = list(sector = c("sector2", "sector4")), dominanceVar = "value", pPercent = 10, contributorVar = "company")

See Also

SSBtools::tables_by_formulas()