h2o.drop_duplicates function

Drops duplicated rows.

Drops duplicated rows across specified columns.


Usage

h2o.drop_duplicates(frame, columns, keep = "first")

Arguments

frame: An H2OFrame object to drop duplicates on.
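columns: Columns to compare when detecting duplicates. Rows that match on all of these columns are treated as duplicates of one another.
keep: Which row to keep from each group of duplicates. "first" (default) keeps the first occurrence and drops the rest; "last" keeps the last occurrence and drops the rest.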
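
To illustrate the difference between keep = "first" and keep = "last", here is a minimal sketch in base R that needs no H2O cluster. It uses base R's duplicated() on a small, made-up data.frame as an analogy for the behavior described above. The toy data and the variable names (df, key, keep_first, keep_last) are assumptions for illustration only, and the sketch does not show how H2O implements the operation or what row order it returns.

```r
# Toy data (hypothetical): rows 1 and 3 share the same
# (Species, Sepal.Length) key, and so do rows 2 and 4.
df <- data.frame(
  Species      = c("setosa", "setosa", "setosa", "setosa", "virginica"),
  Sepal.Length = c(5.1, 4.9, 5.1, 4.9, 6.3),
  Sepal.Width  = c(3.5, 3.0, 3.8, 3.1, 3.3)
)
key <- c("Species", "Sepal.Length")

# Analogue of keep = "first": drop a row if its key already appeared above it.
keep_first <- df[!duplicated(df[, key]), ]

# Analogue of keep = "last": scan from the bottom so the last occurrence survives.
keep_last <- df[!duplicated(df[, key], fromLast = TRUE), ]

print(keep_first)  # rows 1, 2 and 5 of df
print(keep_last)   # rows 3, 4 and 5 of df
```

Both results contain one row per unique key. They differ only in which of the duplicated rows supplies the values of the columns that were not compared (here, Sepal.Width).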

Examples


## Not run:

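library(h2o)
h2o.init()  # start or connect to a local H2O cluster

# Upload the built-in iris data set as an H2OFrame
data <- as.h2o(iris)
# Keep the first row for each unique (Species, Sepal.Length) combination
deduplicated_data <- h2o.drop_duplicates(data, c("Species", "Sepal.Length"), keep = "first")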
## End(Not run)

h2o package

Maintainer: Tomas Fryda
License: Apache License (== 2.0)
Last published: 2024-01-11
