Local recoding via Edmonds' maximum weighted matching algorithm
The method can be applied to both categorical and numeric input variables, although the software was developed primarily with categorical variables in mind.
methods
ancestors: Names of ancestors of the categorical variables
ancestor_setting: For each ancestor the corresponding categorical variable
k_level: Level for k-anonymity
FindLowestK: Requests that the program search for the smallest k that results in a complete matching of the data.
weight: A weight for each variable (Default=1)
lowMemory: Slower algorithm with less memory consumption
missingValue: The output value for a suppressed value.
...: see arguments below
categorical: Names of categorical variables
numerical: Names of numerical variables
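A call on a data frame might look like the following minimal sketch. It assumes that the function documented here is sdcMicro's LocalRecProg, that the first argument is named obj, and that the testdata2 data set shipped with sdcMicro contains the columns named below; none of this is stated on this page, so adjust the names to your own data.

```r
library(sdcMicro)

# Assumption: testdata2 ships with sdcMicro and has these categorical columns.
data(testdata2)
cat_vars <- c("urbrur", "roof", "walls", "water", "sex", "relat")

# Local recoding on the categorical variables only.
# Suppressed values are written as -99 instead of the default missing value.
r1 <- LocalRecProg(
  obj = testdata2,
  categorical = cat_vars,
  k_level = 2,
  missingValue = -99
)

# The recoded variables are expected to carry the suffix _lr.
recoded <- grep("_lr$", names(r1), value = TRUE)
print(recoded)
```

The original columns are kept next to the recoded ones, so the two versions of each variable can be compared directly.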
Returns
A data frame with the original variables and the suppressed variables (suffix _lr), or the modified sdcMicroObj-class object.
Details
Each record in the data represents a category of the original data, and hence all records in the input data should be unique with respect to the N input variables. To achieve larger category sizes (k-anonymity), one can form new categories based on the recoding result and apply this algorithm repeatedly.
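The uniqueness requirement can be met by collapsing the microdata to one record per combination of the input variables before recoding. The following base R sketch illustrates this on a small made-up data frame; the column names region, sex and count are hypothetical and are not part of the package.

```r
# Hypothetical microdata with repeated combinations of the input variables.
df <- data.frame(
  region = c("A", "A", "B", "B", "B"),
  sex    = c("m", "m", "f", "m", "f"),
  stringsAsFactors = FALSE
)
input_vars <- c("region", "sex")

# Collapse to one record per category and keep the category size.
cats <- aggregate(
  list(count = rep(1L, nrow(df))),
  by = df[input_vars],
  FUN = sum
)

# Every record is now unique by the input variables.
stopifnot(anyDuplicated(cats[input_vars]) == 0)
print(cats)
```

The count column records how many original records each category represents, which is useful when new categories are formed from the recoding result and the algorithm is applied again.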
Kowarik, A., Templ, M., Meindl, B., Fonteneau, F. and Prantner, B.: Testing of IHSN Cpp Code and Inclusion of New Methods into sdcMicro, in: Lecture Notes in Computer Science, J. Domingo-Ferrer, I. Tinnirello (eds.); Springer, Berlin, 2012, ISBN: 978-3-642-33626-3, pp. 63-77. doi:10.1007/978-3-642-33627-0_6
Author(s)
Alexander Kowarik, Bernd Prantner, IHSN C++ source, Akimichi Takemura