h2o.tokenize() R function from [h2o]

Tokenize String

h2o.tokenize is similar to h2o.strsplit. The difference is that h2o.tokenize stores the tokenized text in a single column, which makes additional processing (filtering stop words, the word2vec algorithm, ...) easier.


h2o.tokenize(x, split)

Arguments

x: The column or columns whose strings to tokenize.
split: The regular expression to split on.

Returns

An H2OFrame with a single column containing the tokenized strings. The tokens from each original row of the input frame are separated by NA.
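
The single-column layout described above can be pictured with a small base-R sketch. It does not call H2O. The helper tokenize_layout is hypothetical, and the sketch assumes that an NA marker follows the tokens of every row, including the last one. The exact placement of NA in the real H2OFrame should be checked against actual h2o.tokenize output.

```r
# Base-R illustration only: tokenize_layout is a hypothetical helper,
# not part of the h2o package.
tokenize_layout <- function(strings, split) {
  # Split each input string on the regular expression.
  pieces <- strsplit(strings, split)
  # Append NA after each row's tokens to mark the row boundary
  # (assumption: every row, including the last, is followed by NA).
  tokens <- unlist(lapply(pieces, function(p) c(p, NA_character_)))
  # Return all tokens stacked in a single column.
  data.frame(token = tokens, stringsAsFactors = FALSE)
}

out <- tokenize_layout(c("the quick fox", "lazy dog"), " ")
print(out)
```

Because all tokens sit in one column, a row-wise filter (for example, dropping stop words while keeping the NA rows) can preserve the boundaries between the original strings.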

Examples


## Not run:

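library(h2o)
h2o.init()  # start or connect to a local H2O cluster
# Upload the string as a one-cell H2OFrame
string_to_tokenize <- as.h2o("Split at every character and tokenize.")
# An empty split pattern splits the string at every character
tokenize_string <- h2o.tokenize(as.character(string_to_tokenize), "")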
## End(Not run)

h2o package

Maintainer: Tomas Fryda
License: Apache License (== 2.0)
Last published: 2024-01-11

Useful links

https://github.com/h2oai/h2o-3/issues
https://github.com/h2oai/h2o-3
