h2o.stringdist function

Compute element-wise string distances between two H2OFrames


Compute element-wise string distances between two H2OFrames. Both frames must have the same shape (N x M) and contain only string or factor columns. Returns an H2OFrame of shape N x M holding one distance per cell.
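
The element-wise contract can be illustrated without an H2O cluster. The following is a minimal base-R sketch, assuming plain data frames stand in for H2OFrames and using utils::adist() (Levenshtein distance) as the only measure. The helper stringdist_sketch is hypothetical and not part of the h2o package; it only shows the shape check, the string/factor column check, and the cell-by-cell comparison.

```r
# Hypothetical helper: a base-R sketch of the element-wise contract of
# h2o.stringdist(), with data frames standing in for H2OFrames and
# utils::adist() (Levenshtein distance) standing in for the chosen method.
stringdist_sketch <- function(x, y) {
  # Both frames must have the same N x M shape
  stopifnot(identical(dim(x), dim(y)))
  # Every column must be character (string) or factor
  is_text <- function(col) is.character(col) || is.factor(col)
  stopifnot(all(vapply(x, is_text, logical(1))),
            all(vapply(y, is_text, logical(1))))
  out <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(x))
  for (j in seq_len(ncol(x))) {
    a <- as.character(x[[j]])
    b <- as.character(y[[j]])
    # Cell (i, j) of x is compared only with cell (i, j) of y
    out[, j] <- mapply(function(s, t) utils::adist(s, t)[1, 1], a, b,
                       USE.NAMES = FALSE)
  }
  out
}

x <- data.frame(name = c("kitten", "flaw"), stringsAsFactors = TRUE)
y <- data.frame(name = c("sitting", "lawn"), stringsAsFactors = FALSE)
stringdist_sketch(x, y)
# a 2 x 1 matrix: 3 (kitten/sitting) and 2 (flaw/lawn)
```

Mixing a factor column in x with a character column in y is deliberate: both kinds are accepted, and both are compared as plain strings.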

h2o.stringdist( x, y, method = c("lv", "lcs", "qgram", "jaccard", "jw", "soundex"), compare_empty = TRUE )

Arguments

  • x: An H2OFrame
  • y: A comparison H2OFrame with the same shape as x
  • method: A string identifying the string distance measure to use. Must be one of:
      ◦ "lv": Levenshtein distance
      ◦ "lcs": Longest common substring distance
      ◦ "qgram": q-gram distance
      ◦ "jaccard": Jaccard distance between q-gram profiles
      ◦ "jw": Jaro or Jaro-Winkler distance
      ◦ "soundex": Distance based on Soundex encoding
  • compare_empty: If set to FALSE, empty strings are handled as NaNs
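
Three of the measures above ("lv", "qgram" and "jaccard") can be sketched in base R for a single pair of strings, together with the documented compare_empty = FALSE behaviour. The helpers qgrams, qgram_dist, jaccard_dist and pair_dist are hypothetical, q = 2 is an assumed q-gram length, and h2o may scale or normalise these values differently, so treat the numbers as illustrations of each measure rather than as expected h2o output. "lcs", "jw" and "soundex" are not covered here.

```r
# Hypothetical helpers illustrating three of the listed measures on single
# strings. Assumption: q = 2 (bigrams); the q used by h2o is not stated here.
qgrams <- function(s, q = 2) {
  n <- nchar(s)
  if (n < q) return(character(0))
  substring(s, 1:(n - q + 1), q:n)
}

# q-gram distance: sum of absolute differences of q-gram counts
qgram_dist <- function(a, b, q = 2) {
  ga <- qgrams(a, q)
  gb <- qgrams(b, q)
  keys <- union(ga, gb)
  ca <- as.integer(table(factor(ga, levels = keys)))
  cb <- as.integer(table(factor(gb, levels = keys)))
  sum(abs(ca - cb))
}

# Jaccard distance: 1 - |A intersect B| / |A union B| on the q-gram sets
jaccard_dist <- function(a, b, q = 2) {
  ga <- unique(qgrams(a, q))
  gb <- unique(qgrams(b, q))
  u <- length(union(ga, gb))
  if (u == 0) return(0)
  1 - length(intersect(ga, gb)) / u
}

# Dispatch on method; compare_empty = FALSE turns any pair involving an
# empty string into NaN, mirroring the documented behaviour
pair_dist <- function(a, b, method = c("lv", "qgram", "jaccard"),
                      compare_empty = TRUE) {
  method <- match.arg(method)
  if (!compare_empty && (a == "" || b == "")) return(NaN)
  switch(method,
         lv = utils::adist(a, b)[1, 1],
         qgram = qgram_dist(a, b),
         jaccard = jaccard_dist(a, b))
}

pair_dist("Martha", "Marhta", "qgram")             # 6
pair_dist("Martha", "Marhta", "jaccard")           # 0.75
pair_dist("", "abc", "lv", compare_empty = FALSE)  # NaN
```

With compare_empty left at TRUE, the same empty-string pair is compared normally, so pair_dist("", "abc", "lv") gives 3 (three insertions).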

Examples

## Not run:
h2o.init()
x <- as.h2o(c("Martha", "Dwayne", "Dixon"))
y <- as.character(as.h2o(c("Marhta", "Duane", "Dicksonx")))
h2o.stringdist(x, y, method = "jw")
## End(Not run)
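
For reference, the Jaro-Winkler similarity for the three pairs in the example can be computed in plain R. This sketch uses the textbook definitions with the commonly used Winkler settings (prefix scale p = 0.1, common prefix capped at 4 characters); jaro and jaro_winkler are hypothetical helpers. Whether h2o.stringdist(method = "jw") reports this similarity, 1 minus it, or a variant with other settings is not stated on this page, so compare against the actual output.

```r
# Hypothetical helpers: textbook Jaro and Jaro-Winkler similarity.
# Assumption: Winkler settings p = 0.1 and a common prefix of at most 4.
jaro <- function(s, t) {
  la <- nchar(s)
  lb <- nchar(t)
  if (la == 0 && lb == 0) return(1)
  if (la == 0 || lb == 0) return(0)
  a <- strsplit(s, "")[[1]]
  b <- strsplit(t, "")[[1]]
  # Characters match if equal and no further apart than this window
  window <- max(floor(max(la, lb) / 2) - 1, 0)
  ma <- logical(la)
  mb <- logical(lb)
  for (i in seq_len(la)) {
    lo <- max(1, i - window)
    hi <- min(lb, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!mb[j] && a[i] == b[j]) {
        ma[i] <- TRUE
        mb[j] <- TRUE
        break
      }
    }
  }
  m <- sum(ma)
  if (m == 0) return(0)
  # Half the number of matched characters that appear in a different order
  half_t <- sum(a[ma] != b[mb]) / 2
  (m / la + m / lb + (m - half_t) / m) / 3
}

jaro_winkler <- function(s, t, p = 0.1) {
  j <- jaro(s, t)
  # Length of the common prefix, capped at 4 characters
  l <- 0
  for (k in seq_len(min(4, nchar(s), nchar(t)))) {
    if (substr(s, k, k) != substr(t, k, k)) break
    l <- l + 1
  }
  j + l * p * (1 - j)
}

x <- c("Martha", "Dwayne", "Dixon")
y <- c("Marhta", "Duane", "Dicksonx")
round(mapply(jaro_winkler, x, y, USE.NAMES = FALSE), 4)
# [1] 0.9611 0.8400 0.8133
```

A distance is commonly derived from these values as 1 - jaro_winkler(s, t).
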
  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11