deidentify_data function

Deidentify a dataset

Two operations are performed on the dataset:

  1. All ID numbers are randomized from the range 1 to n
  2. All columns containing dates will have the year changed
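The first operation can be pictured with a short sketch. This is not the package's implementation; `randomize_ids` is a hypothetical helper, and it assumes that n is the number of distinct IDs and that each original ID maps to exactly one new ID.

```r
# Hypothetical helper (not part of the package): replace each distinct ID
# with a randomly chosen number from 1..n, where n is assumed to be the
# number of distinct IDs.
randomize_ids <- function(ids) {
  unique_ids <- unique(ids)
  new_ids <- sample(length(unique_ids))   # random permutation of 1..n
  new_ids[match(ids, unique_ids)]         # rows sharing an ID stay together
}

ids <- c(101, 101, 205, 205, 205, 317)
randomize_ids(ids)
```

Because the mapping is a permutation, rows that belonged to the same individual still share an ID afterwards, while the original ID values are no longer present.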

The year change uses the earliest year in the dataset as a reference and preserves leap years. The reference year is 1901, 1902, 1903, or 1904, depending on how far the earliest year is from the closest preceding leap year.
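One way to read this rule is sketched below. It is an interpretation, not the package's code: `reference_year` and `shift_years` are hypothetical helpers, a leap year is assumed to map to 1904, and all dates are assumed to fall between 1901 and 2099, where every fourth year is a leap year.

```r
# Hypothetical helper: pick the year in 1901..1904 that sits at the same
# position in the four-year leap cycle as `year`.
reference_year <- function(year) {
  offset <- as.integer(year) %% 4L   # years since the closest preceding leap year
  if (offset == 0L) 1904L else 1900L + offset
}

# Hypothetical helper: move every date back by the same number of years so
# that the earliest year lands on its reference year.
shift_years <- function(dates) {
  dates <- as.Date(dates)
  years <- as.integer(format(dates, "%Y"))
  shift <- min(years) - reference_year(min(years))   # always a multiple of 4
  as.Date(sprintf("%04d-%s", as.integer(years - shift), format(dates, "%m-%d")))
}

shift_years(c("2019-03-15", "2020-02-29", "2021-12-31"))
# "1903-03-15" "1904-02-29" "1905-12-31"
```

Since the shift is a multiple of four, a leap year in the original data is still a leap year after the shift, so a date such as 29 February remains valid.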

deidentify_data(df, id_column = "ID", date_columns = NULL)

Arguments

  • df: (data.frame) A dataset
  • id_column: (str) Name of the ID column
  • date_columns: (array(str) (optional)) Names of all date columns

Returns

(data.frame) Deidentified dataset
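A call might look like the following sketch. The data frame and its column names (`ID`, `DATE`, `DV`) are made up for illustration, and the page does not state whether date columns must be `Date` objects or character strings, so the use of `as.Date` here is an assumption. The call is guarded so that the snippet also runs when the package providing `deidentify_data` is not attached.

```r
# Hypothetical example data; column names are chosen for illustration only.
df <- data.frame(
  ID   = c(101, 101, 205),
  DATE = as.Date(c("2019-03-15", "2019-04-01", "2020-02-29")),
  DV   = c(1.2, 0.8, 2.4)
)

# Only call the function if it is available in the current session.
if (exists("deidentify_data", mode = "function")) {
  deidentified <- deidentify_data(df, id_column = "ID", date_columns = c("DATE"))
  head(deidentified)
}
```

Since `date_columns` defaults to `NULL`, any date column that should be shifted is named explicitly in this sketch.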

  • Maintainer: Rikard Nordgren
  • License: LGPL (>= 3)
  • Last published: 2024-12-04
