hash_names function

Anonymise data using scrypt

This function uses the scrypt algorithm from libsodium to anonymise data, based on user-indicated data fields. Data fields are concatenated first, then each entry is hashed. The function can return either a full detailed output or short labels ready to use as anonymised data. Before the fields are concatenated (using "_" as a separator) to form labels, inputs are cleaned using [clean_labels()].
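The steps described above (clean, concatenate with "_", hash each entry, truncate) can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `simple_clean()` and `sketch_hash()` are hypothetical helpers, the cleaning rule is a simplified stand-in for [clean_labels()], the output column names are assumed, and [sodium::sha256()] is used instead of scrypt only to keep the sketch fast.

```r
library(sodium)

# Hypothetical, simplified stand-in for the label cleaning step:
# lower-case the input and replace runs of non-alphanumerics with "_".
simple_clean <- function(x) {
  x <- tolower(trimws(as.character(x)))
  gsub("[^a-z0-9]+", "_", x)
}

# Hypothetical sketch of the overall flow; column names are assumptions.
sketch_hash <- function(..., size = 6) {
  fields <- lapply(list(...), simple_clean)

  # Concatenate the fields entry-wise, using "_" as the separator
  labels <- do.call(paste, c(fields, sep = "_"))

  # Hash each label and render the digest as a hexadecimal string
  full_hash <- vapply(
    labels,
    function(lab) bin2hex(sha256(charToRaw(lab))),
    character(1),
    USE.NAMES = FALSE
  )

  data.frame(
    label = labels,
    hash_short = substr(full_hash, 1, size),  # first `size` characters
    hash = full_hash,
    stringsAsFactors = FALSE
  )
}

out <- sketch_hash(c("Jane", "Joe"), c("Doe", "Smith"), c(25, 69))
print(out)
```

Truncating to `size` characters gives compact labels, but shorter hashes make accidental collisions between different entries more likely, so the full hash is kept alongside in this sketch.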

hash_names(..., size = 6, full = TRUE, hashfun = "secure", salt = NULL, clean_labels = TRUE)

Arguments

  • ...: Data fields to be hashed.
  • size: The number of characters retained in the hash.
  • full: A logical indicating if a full output should be returned as a data.frame, including original labels, shortened hash, and full hash.
  • hashfun: This defines the hashing function to be used. If you specify "secure" (default), it will use [sodium::scrypt()], which is secure but slow for large data sets. For fast hashing with no collisions, you can specify "fast", and it will use [sodium::sha256()], which is several orders of magnitude faster than [sodium::scrypt()]. You can also specify a hashing function that takes and returns a [raw][base::raw] vector of bytes that can be converted to character with [rawToChar()].
  • salt: An optional object that can be coerced to a character to be used to 'salt' the hashing algorithm (see details). Ignored if NULL.
  • clean_labels: A logical indicating if labels of variables should be standardized; defaults to TRUE.
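As an illustration of the custom `hashfun` option, the sketch below defines a function that takes a raw vector and returns a raw vector, which is the contract described above. The name `blake2b_16` is hypothetical; it wraps [sodium::hash()] with a 16-byte output, and the snippet only checks the function on its own rather than inside hash_names().

```r
library(sodium)

# Hypothetical custom hashing function: raw vector in, raw vector out.
# sodium::hash() is libsodium's generic hash; size = 16 gives a 16-byte digest.
blake2b_16 <- function(x) hash(x, size = 16)

digest <- blake2b_16(charToRaw("jane_doe_25"))

is.raw(digest)    # the result is a raw vector
length(digest)    # number of bytes in the digest
bin2hex(digest)   # hexadecimal rendering of the digest
```

A function of this shape would presumably be passed as `hashfun = blake2b_16`; whether a given custom function gives enough protection for your data is a separate question from whether it fits the interface.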

Details

The argument salt should be used for salting the algorithm, i.e. adding an extra input to the input fields (the 'salt') to change the resulting hash and prevent identification of individuals via pre-computed hash tables.

It is highly recommended to choose a secret, random salt in order to make it harder for an attacker to recover the original data from the hash.
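One way to obtain such a salt, and to see its effect, is sketched below. It assumes the salt is appended to the label with "_" before hashing, which may differ from what hash_names() does internally; `hash_label()` is a hypothetical helper, and [sodium::sha256()] is used only to keep the example fast.

```r
library(sodium)

# Random 16-byte salt rendered as 32 hexadecimal characters.
# Keep it secret, and store it if the hashes must be reproducible later.
salt <- bin2hex(random(16))

# Hypothetical helper: hash a single label, optionally salted.
hash_label <- function(label, salt = NULL) {
  # Assumption: the salt is appended to the label with "_" as separator
  if (!is.null(salt)) {
    label <- paste(label, salt, sep = "_")
  }
  bin2hex(sha256(charToRaw(label)))
}

unsalted <- hash_label("jane_doe")
salted <- hash_label("jane_doe", salt)

# The same input yields a different hash once salted, so a table of hashes
# pre-computed without the salt no longer matches.
unsalted == salted
```

The salted value would then be supplied through the `salt` argument, for example `hash_names(first_name, last_name, salt = salt)`.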

Examples

first_name <- c("Jane", "Joe", "Raoul")
last_name <- c("Doe", "Smith", "Dupont")
age <- c(25, 69, 36)

# secure hashing
hash_names(first_name, last_name, age, hashfun = "secure")

# fast hashing
hash_names(first_name, last_name, age, size = 8, full = FALSE, hashfun = "fast")

## salting the hashing (more secure!)
hash_names(first_name, last_name) # unsalted - less secure
hash_names(first_name, last_name, salt = 123) # salted with an integer
hash_names(first_name, last_name, salt = "foobar") # salted with a character

## using a different hash algorithm if you want things to run faster
hash_names(first_name, last_name, hashfun = "fast") # use sha256 algorithm

See Also

[clean_labels()], used to clean labels prior to hashing

[sodium::hash()] for available hashing functions.

Author(s)

Thibaut Jombart thibautjombart@gmail.com, Dirk Schumacher mail@dirk-schumacher.net, Zhian N. Kamvar zkamvar@gmail.com

  • Maintainer: Thibaut Jombart
  • License: MIT + file LICENSE
  • Last published: 2023-01-13