redact_columns function

Redact columns from a dataframe with the default redactors

This function redacts the columns named in columns from the dataframe given in data, using dittodb's standard redactors.

redact_columns(data, columns, ignore.case = TRUE, ...)

Arguments

  • data: a dataframe to redact
  • columns: character, the columns to redact
  • ignore.case: should case be ignored? (default: TRUE)
  • ...: additional options to pass on to grep() when matching the column names

Returns

data, with the columns specified in columns duly redacted

Details

The column names given in the columns argument are treated as regular expressions; however, ^ and $ are always added to the beginning and end of each string. So if you would like to match any column that starts with the string sensitive (e.g. sensitive_name, sensitive_date), you could use "sensitive.*", which would catch all of those columns (though it would not catch a column called most_sensitive_name).
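The anchoring behaviour described above can be sketched with base R's grep(). This is only an illustration of the matching rule, not dittodb's actual implementation: match_columns() and the column names are hypothetical and exist only for this example.

```r
# Hypothetical column names, used only for illustration
col_names <- c("sensitive_name", "sensitive_date", "most_sensitive_name", "Sensitive_id")

# Hypothetical helper (not part of dittodb): wrap the pattern in ^ and $
# before matching, as the text above describes
match_columns <- function(pattern, names, ignore.case = TRUE) {
  grep(paste0("^", pattern, "$"), names, ignore.case = ignore.case, value = TRUE)
}

# "sensitive.*" matches every name that starts with "sensitive"
# (case-insensitively), but not "most_sensitive_name"
matched <- match_columns("sensitive.*", col_names)

# A bare "sensitive" only matches a column named exactly that,
# so nothing in col_names matches
exact <- match_columns("sensitive", col_names)

print(matched)
print(exact)
```

Because of the added anchors, a pattern has to describe the whole column name; a leading or trailing .* is needed to match on a prefix or suffix.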

The standard redactors replace all values in the column with the following values based on the column's type:

  • integer -- 9L
  • numeric -- 9
  • character -- "[redacted]"
  • POSIXct (date times) -- as.POSIXct("1988-10-11T17:00:00", tz = tzone)
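The type-based replacement listed above might look roughly like the following in base R. This is a minimal sketch, not dittodb's own code: redact_value() is a hypothetical helper, the sample dataframe is made up, and leaving other column types untouched is an assumption of this sketch.

```r
# Hypothetical helper (not part of dittodb): replace every value in a
# vector with the standard replacement for its type
redact_value <- function(x) {
  if (inherits(x, "POSIXct")) {
    # Reuse the column's own time zone, falling back to the local zone
    tzone <- attr(x, "tzone")
    if (is.null(tzone)) tzone <- ""
    fixed <- as.POSIXct("1988-10-11T17:00:00", format = "%Y-%m-%dT%H:%M:%S", tz = tzone[1])
    rep(fixed, length(x))
  } else if (is.integer(x)) {
    # Check integer before numeric, since is.numeric() is also TRUE for integers
    rep(9L, length(x))
  } else if (is.numeric(x)) {
    rep(9, length(x))
  } else if (is.character(x)) {
    rep("[redacted]", length(x))
  } else {
    # Assumption of this sketch: other types are returned unchanged
    x
  }
}

# Made-up data with one column of each type
df <- data.frame(
  id = 1:3,
  score = c(1.5, 2.5, 3.5),
  name = c("a", "b", "c"),
  seen = as.POSIXct(
    c("2020-01-01 10:00:00", "2020-01-02 11:00:00", "2020-01-03 12:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

redacted <- df
redacted[] <- lapply(df, redact_value)
print(redacted)
```

Every value in a redacted column becomes the same constant, so the shape and types of the data are preserved while the contents are not.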

Examples

if (check_for_pkg("nycflights13", message)) {
  small_flights <- head(nycflights13::flights)

  # with no columns specified, redacting does nothing
  redact_columns(small_flights, columns = NULL)

  # integer
  redact_columns(small_flights, columns = c("arr_time"))

  # numeric
  redact_columns(small_flights, columns = c("arr_delay"))

  # characters
  redact_columns(small_flights, columns = c("origin", "dest"))

  # datetimes
  redact_columns(small_flights, columns = c("time_hour"))
}

  • Maintainer: Jonathan Keane
  • License: Apache License (>= 2.0)
  • Last published: 2024-04-09