GaussSuppressionTwoWay function

Two-way iteration variant of GaussSuppressionFromData

Internally, data is organized in a two-way table.

Use the parameter colVar to choose which hierarchies form the columns (the others will be rows). Iterations start with column-by-column suppression. The algorithm utilizes HierarchyCompute2.

With two-way iterations, larger data can be handled, but there is a residual risk. The method is a special form of linked-table iteration. Separately, the rows and columns are protected by GaussSuppression and they have common suppressed cells.
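To picture the two-way layout described above, the following base R sketch puts one variable in the rows and another in the columns and adds totals. It only illustrates the layout: the data values are invented, the variable names are borrowed from the examples, and the package itself builds its table from hierarchies (via HierarchyCompute2) rather than from xtabs.

```r
# Toy micro data; variable names borrowed from the examples, values invented.
d <- data.frame(
  region   = c("A", "A", "B", "B", "B"),
  hovedint = c("arbeid", "trygd", "arbeid", "trygd", "trygd"),
  ant      = c(2, 5, 1, 0, 4)
)

# Rows = region, columns = hovedint (the role played by colVar).
tab <- xtabs(ant ~ region + hovedint, data = d)

# Add row and column totals; a cell suppressed here is shared by
# the row-wise and the column-wise view of the table.
two_way <- addmargins(tab)
print(two_way)
```

Each row and each column of such a table, including its total, is what gets protected separately in the two-way iteration.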

GaussSuppressionTwoWay(
  data,
  dimVar = NULL,
  freqVar = NULL,
  numVar = NULL,
  weightVar = NULL,
  charVar = NULL,
  hierarchies,
  formula = NULL,
  maxN = suppressWarnings(formals(c(primary)[[1]])$maxN),
  protectZeros = suppressWarnings(formals(c(primary)[[1]])$protectZeros),
  secondaryZeros = suppressWarnings(formals(candidates)$secondaryZeros),
  candidates = CandidatesDefault,
  primary = PrimaryDefault,
  forced = NULL,
  hidden = NULL,
  singleton = SingletonDefault,
  singletonMethod = ifelse(secondaryZeros, "anySumNOTprimary", "anySum"),
  printInc = TRUE,
  output = "publish",
  preAggregate = is.null(freqVar),
  colVar = names(hierarchies)[1],
  removeEmpty = TRUE,
  inputInOutput = TRUE,
  candidatesFromTotal = TRUE,
  structuralEmpty = FALSE,
  freqVarNew = rev(make.unique(c(names(data), "freq")))[1],
  ...
)
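The defaults for maxN and protectZeros in the signature are read from the formal arguments of the primary function. A minimal base R sketch of that lookup follows; my_primary is a hypothetical stand-in and not a function from the package.

```r
# Hypothetical primary-style function; only its default arguments matter here.
my_primary <- function(freq, maxN = 3, protectZeros = TRUE, ...) {
  freq <= maxN
}

# c() wraps a single function in a list, so [[1]] works whether `primary`
# is one function or a list of functions (the first element is used).
default_maxN  <- formals(c(my_primary)[[1]])$maxN
default_zeros <- formals(c(list(my_primary, is.na))[[1]])$protectZeros

print(default_maxN)
print(default_zeros)
```

A consequence of this pattern is that supplying a different primary function can change the default maxN and protectZeros unless they are passed explicitly.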

Arguments

  • data: Input data as a data frame
  • dimVar: The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
  • freqVar: A single variable holding counts (name or number).
  • numVar: Other numerical variables to be aggregated
  • weightVar: Weights (costs) to be used to order candidates for secondary suppression
  • charVar: Other variables possibly to be used within the supplied functions
  • hierarchies: List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
  • formula: A model formula
  • maxN: Suppression parameter. See GaussSuppressionFromData.
  • protectZeros: Suppression parameter. See GaussSuppressionFromData.
  • secondaryZeros: Suppression parameter. See GaussSuppressionFromData.
  • candidates: GaussSuppression input or a function generating it (see details). Default: CandidatesDefault
  • primary: GaussSuppression input or a function generating it (see details). Default: PrimaryDefault
  • forced: GaussSuppression input or a function generating it (see details)
  • hidden: GaussSuppression input or a function generating it (see details)
  • singleton: NULL or a function generating GaussSuppression input (logical vector not possible). Default: SingletonDefault
  • singletonMethod: GaussSuppression input
  • printInc: GaussSuppression input
  • output: One of "publish" (default), "inner". Here "inner" means input data (possibly pre-aggregated).
  • preAggregate: When TRUE, the data will be aggregated within the function to an appropriate level. This is defined by the dimensional variables according to dimVar, hierarchies or formula and in addition charVar.
  • colVar: Hierarchy variables for the column groups (others in row group).
  • removeEmpty: When TRUE (default) empty output corresponding to empty input is removed. When NULL, removal only within the algorithm (x matrices) so that such empty outputs are never secondary suppressed.
  • inputInOutput: Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to "rowFactor" or "" are ignored.
  • candidatesFromTotal: When TRUE (default), same candidates for all rows and for all columns, computed from row/column totals.
  • structuralEmpty: See GaussSuppressionFromData.
  • freqVarNew: Name of new frequency variable generated when input freqVar is NULL and preAggregate is TRUE. Default is "freq" provided this is not found in names(data).
  • ...: Further arguments to be passed to the supplied functions.
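The default expression for freqVarNew can be tried on its own in base R. The sketch below wraps it in a hypothetical helper, new_freq_name, and applies it to two invented data frames, one without and one with an existing "freq" column.

```r
# Hypothetical helper wrapping the default expression from the signature.
new_freq_name <- function(data) {
  rev(make.unique(c(names(data), "freq")))[1]
}

d1 <- data.frame(region = "A", ant = 1)   # no column named "freq"
d2 <- data.frame(region = "A", freq = 1)  # "freq" already taken

name1 <- new_freq_name(d1)
name2 <- new_freq_name(d2)

print(name1)
print(name2)
```

When "freq" is already a column name, make.unique appends a suffix, so the generated variable never overwrites an existing one.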

Returns

Aggregated data with suppression information

Details

The supplied functions for generating GaussSuppression input behave as in GaussSuppressionFromData, with some exceptions. When candidatesFromTotal is TRUE (default), the candidate function is run locally once for rows and once for columns, each time based on column or row totals. The global x-matrix is generated only if one of the supplied functions needs it. A non-NULL singleton can only be supplied as a function. This function is run locally within the algorithm before each call to GaussSuppression.
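As an illustration of a supplied function that needs the x-matrix, the sketch below runs a hidden-style function, of the kind used in the examples, on a toy dummy matrix. It assumes that x has one row per inner (input) cell and one column per output cell. The matrix, the data and the name hidden_if_no_flag are invented, and a dense base R matrix stands in for the sparse matrix the package would pass.

```r
# Toy dummy matrix: 4 inner rows, 3 output cells (A, B, Total).
x <- matrix(c(1, 0, 1,
              1, 0, 1,
              0, 1, 1,
              0, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "Total")))

# Invented inner data with a character variable to test against.
inner <- data.frame(grp  = c("A", "A", "B", "B"),
                    flag = c("M06M12", "other", "other", "other"))

# Hypothetical hidden-style function: an output cell is hidden when
# none of its contributing inner rows has flag == "M06M12".
hidden_if_no_flag <- function(x, data, charVar, ...) {
  as.vector((t(x) %*% as.numeric(data[[charVar]] == "M06M12")) == 0)
}

res <- hidden_if_no_flag(x, inner, "flag")
print(setNames(res, colnames(x)))
```

The product t(x) %*% v counts, for each output cell, how many contributing inner rows satisfy the condition, which is why comparing it with zero yields one logical value per output cell.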

Note that a difference from GaussSuppressionFromData is that parameter removeEmpty is set to TRUE by default.

Another difference is that duplicated combinations are not allowed. Normally, duplicates are avoided by setting preAggregate to TRUE. When the charVar parameter is used, duplicates can still be a problem. See the examples for a possible workaround.
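To check input for duplicated combinations before calling the function, something like the following base R sketch may help. The data are invented, and the aggregation step only mimics the idea behind preAggregate; it is not the package's internal implementation.

```r
# Invented input where the combination (A, x) occurs twice.
d <- data.frame(region   = c("A", "A", "B"),
                hovedint = c("x", "x", "y"),
                ant      = c(1, 2, 3))
dim_vars <- c("region", "hovedint")

# TRUE when at least one combination of the dimensional variables repeats.
has_dups <- anyDuplicated(d[dim_vars]) > 0

# Sum the counts within each combination, leaving one row per combination.
agg <- aggregate(d["ant"], by = d[dim_vars], FUN = sum)
has_dups_after <- anyDuplicated(agg[dim_vars]) > 0

print(has_dups)
print(agg)
```

If a charVar variable has to be kept, it becomes part of what defines a row, so rows that differ only in that variable remain duplicated with respect to the dimensional variables.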

Examples

z3 <- SSBtoolsData("z3")

dimListsA <- SSBtools::FindDimLists(z3[, 1:6])
dimListsB <- SSBtools::FindDimLists(z3[, c(1, 4, 5)])

set.seed(123)
z <- z3[sample(nrow(z3), 250), ]

## Not run:
out1 <- GaussSuppressionTwoWay(z, freqVar = "ant", hierarchies = dimListsA,
                               colVar = c("hovedint"))
## End(Not run)

out2 <- GaussSuppressionTwoWay(z, freqVar = "ant", hierarchies = dimListsA,
                               colVar = c("hovedint", "mnd"))
out3 <- GaussSuppressionTwoWay(z, freqVar = "ant", hierarchies = dimListsB,
                               colVar = c("region"))
out4 <- GaussSuppressionTwoWay(z, freqVar = "ant", hierarchies = dimListsB,
                               colVar = c("hovedint", "region"))

# "mnd" not in hierarchies -> duplicated combinations in input
# Error when preAggregate is FALSE: Index method failed. Duplicated combinations?
out5 <- GaussSuppressionTwoWay(z, freqVar = "ant", hierarchies = dimListsA[1:3],
                               protectZeros = FALSE, colVar = c("hovedint"),
                               preAggregate = TRUE)

# charVar needed -> Still problem when preAggregate is TRUE
# Possible workaround by extra hierarchy
out6 <- GaussSuppressionTwoWay(z, freqVar = "ant", charVar = "mnd2",
                               hierarchies = c(dimListsA[1:3], mnd2 = "Total"), # include charVar
                               inputInOutput = c(TRUE, TRUE, FALSE),            # FALSE -> only Total
                               protectZeros = FALSE, colVar = c("hovedint"),
                               preAggregate = TRUE,
                               hidden = function(x, data, charVar, ...)
                                 as.vector((Matrix::t(x) %*% as.numeric(data[[charVar]] == "M06M12")) == 0))