data: Input data, typically a data frame, tibble, or data.table. If data is not a classic data frame, it will be coerced to one internally unless preAggregate is TRUE and aggregatePackage is "data.table".
maxN: Suppression parameter. Cells with frequency <= maxN are set as primary suppressed. Using the default primary function, maxN is by default set to 3. See details.
freqVar: A single variable holding counts (name or number).
dimVar: The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
hierarchies: List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
formula: A model formula.
...: Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).
spec: NULL or a named list of arguments that will act as default values.
Returns
A data frame containing aggregated data and suppression information.
Details
The specs provided in the package (see PackageSpecs) offer common parameter setups for small count suppression. When further customization is needed, certain parameters of GaussSuppressionFromData may have to be changed from the values set by the package specs. Of particular interest are protectZeros (whether zeros are primary suppressed), extend0 (whether empty cells are added before primary suppression), and secondaryZeros (whether zero-frequency cells are candidates for secondary suppression). The examples below illustrate how to override parameters specified by a spec. Note that this is only possible if specLock = FALSE.
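As a rough illustration of how maxN and protectZeros interact, the primary suppression rule described above can be sketched in base R. The helper name primary_small_counts is hypothetical and this is a simplified sketch, not the package's implementation. It covers only primary flagging and does not show extend0, secondaryZeros, or the secondary suppression step.

```r
# Hypothetical helper for illustration only (not part of GaussSuppression).
# Flags cells with frequency <= maxN as primary suppressed; when
# protectZeros is FALSE, zero cells are left unflagged.
primary_small_counts <- function(freq, maxN = 3, protectZeros = TRUE) {
  primary <- freq <= maxN
  if (!protectZeros) {
    primary[freq == 0] <- FALSE
  }
  primary
}

freq <- c(0, 1, 3, 4, 10)

primary_small_counts(freq, maxN = 3)
# TRUE TRUE TRUE FALSE FALSE

primary_small_counts(freq, maxN = 3, protectZeros = FALSE)
# FALSE TRUE TRUE FALSE FALSE
```

Under these assumptions, setting protectZeros = FALSE changes only the flag for the zero cell; cells with counts from 1 to maxN remain primary suppressed.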
Examples
mun_accidents <- SSBtoolsData("mun_accidents")

SuppressSmallCounts(data = mun_accidents, maxN = 3, dimVar = 1:2, freqVar = 3)

# override default spec
SuppressSmallCounts(data = mun_accidents, maxN = 3, dimVar = 1:2, freqVar = 3,
                    protectZeros = FALSE)


d2 <- SSBtoolsData("d2")
d2$f <- round(d2$freq / 10)  # tenth as frequency in examples

# Hierarchical region variables are detected automatically -> same output column
SuppressSmallCounts(data = d2, maxN = 2, freqVar = "f",
                    dimVar = c("region", "county", "k_group"))

# Formula. Hierarchical variables still detected automatically.
SuppressSmallCounts(data = d2, maxN = 3, freqVar = "f",
                    formula = ~main_income * k_group + region + county - k_group)

# With hierarchies created manually
ml <- data.frame(levels = c("@", "@@", "@@@", "@@@", "@@@", "@@"),
                 codes = c("Total", "not_assistance", "other", "pensions", "wages", "assistance"))
SuppressSmallCounts(data = d2, maxN = 2, freqVar = "f",
                    hierarchies = list(main_income = ml, k_group = "Total_Norway"))


# Data without pensions in k_group 400
# And assume these are structural zeros (will not be suppressed)
SuppressSmallCounts(data = d2[1:41, ], maxN = 3, freqVar = "f",
                    hierarchies = list(main_income = ml, k_group = "Total_Norway"),
                    extend0 = FALSE, structuralEmpty = TRUE)
# -- Note for the example above --
# With protectZeros = FALSE
#   - No zeros suppressed
# With extend0 = FALSE and structuralEmpty = FALSE
#   - Primary suppression without protection (with warning)
# With extend0 = TRUE and structuralEmpty = TRUE
#   - As default behavior. Suppression/protection of all zeros (since nothing empty)
# With formula instead of hierarchies: Extra parameter needed when extend0 = FALSE.
#   - removeEmpty = FALSE, to include empty zeros in output.


# Using formula followed by FormulaSelection
output <- SuppressSmallCounts(data = SSBtoolsData("example1"),
                              formula = ~age * geo * year + eu * year,
                              freqVar = "freq",
                              maxN = 1)
FormulaSelection(output, ~(age + eu) * year)


# To illustrate hierarchical_extend0
# (parameter to underlying function, SSBtools::Extend0fromModelMatrixInput)
SuppressSmallCounts(data = SSBtoolsData("example1"),
                    formula = ~age * geo * eu, freqVar = "freq",
                    maxN = 0, avoidHierarchical = TRUE)
SuppressSmallCounts(data = SSBtoolsData("example1"),
                    formula = ~age * geo * eu, freqVar = "freq",
                    maxN = 0, avoidHierarchical = TRUE,
                    hierarchical_extend0 = TRUE)


# This example is similar to the one in the documentation of tables_by_formulas,
# but it uses SuppressSmallCounts, and the input data (SSBtoolsData("magnitude1"))
# is used to generate a frequency table by excluding the "value" variable.
tables_by_formulas(SSBtoolsData("magnitude1"),
                   table_fun = SuppressSmallCounts,
                   table_formulas = list(table_1 = ~region * sector2,
                                         table_2 = ~region1:sector4 - 1,
                                         table_3 = ~region + sector4 - 1),
                   substitute_vars = list(region = c("geo", "eu"), region1 = "eu"),
                   collapse_vars = list(sector = c("sector2", "sector4")),
                   maxN = 2)