h2o.drop_duplicates function

Drops duplicated rows.

Drops duplicated rows across specified columns.

h2o.drop_duplicates(frame, columns, keep = "first")

Arguments

  • frame: An H2OFrame object to drop duplicates on.
  • columns: Columns to compare during the duplicate detection process.
  • keep: Which row to keep from each set of duplicates. "first" (the default) keeps the first row and deletes the rest; "last" keeps the last row and deletes the rest.

Examples

## Not run: 
library(h2o)
h2o.init()
data <- as.h2o(iris)
deduplicated_data <- h2o.drop_duplicates(data, c("Species", "Sepal.Length"), keep = "first")
## End(Not run)
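The difference between keep = "first" and keep = "last" can be illustrated without a running H2O cluster. The sketch below is a plain-R analogy built on base R's duplicated(); it does not call H2O and is not H2O's implementation. The data frame df and the names cols, keep_first, and keep_last are made up for illustration. Whether h2o.drop_duplicates returns rows in their original order is not stated here, so the analogy only shows which rows survive.

```r
# Plain-R analogy only: this does not call H2O.
# Rows 1 and 3 share (group, value); so do rows 2 and 5.
df <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "c", "b"),
  value = c(10, 20, 10, 30, 20)
)

# Columns compared when looking for duplicates (like the `columns` argument).
cols <- c("group", "value")

# Analogue of keep = "first": drop rows whose key was already seen earlier.
keep_first <- df[!duplicated(df[, cols]), ]

# Analogue of keep = "last": scan from the end, so the last occurrence survives.
keep_last <- df[!duplicated(df[, cols], fromLast = TRUE), ]

print(keep_first$id)  # 1 2 4
print(keep_last$id)   # 3 4 5
```

Row 4 has a unique key, so it appears in both results; only the choice between rows 1 and 3, and between rows 2 and 5, depends on keep.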
  • Maintainer: Tomas Fryda
  • License: Apache License (== 2.0)
  • Last published: 2024-01-11