funique function

Fast Unique Elements / Rows

funique is an efficient alternative to unique (or unique.data.table, kit::funique, dplyr::distinct).

fnunique is an alternative to NROW(unique(x)) (or data.table::uniqueN, kit::uniqLen, dplyr::n_distinct).

fduplicated is an alternative to duplicated (or duplicated.data.table, kit::fduplicated).

The collapse versions are versatile and highly competitive.

any_duplicated(x) is faster than any(fduplicated(x)). Note that for atomic vectors, anyDuplicated is currently more efficient if there are duplicates at the beginning of the vector.

funique(x, ...)

## Default S3 method:
funique(x, sort = FALSE, method = "auto", ...)

## S3 method for class 'data.frame'
funique(x, cols = NULL, sort = FALSE, method = "auto", ...)

## S3 method for class 'sf'
funique(x, cols = NULL, sort = FALSE, method = "auto", ...)

# Methods for indexed data / compatibility with plm:

## S3 method for class 'pseries'
funique(x, sort = FALSE, method = "auto", drop.index.levels = "id", ...)

## S3 method for class 'pdata.frame'
funique(x, cols = NULL, sort = FALSE, method = "auto", drop.index.levels = "id", ...)

fnunique(x)                  # Fast NROW(unique(x)), for vectors and lists
fduplicated(x, all = FALSE)  # Fast duplicated(x), for vectors and lists
any_duplicated(x)            # Simple logical TRUE|FALSE duplicates check

Arguments

  • x: an atomic vector or data frame / list of equal-length columns.

  • sort: logical. TRUE orders the unique elements / rows. FALSE returns unique values in order of first occurrence.

  • method: an integer or character string specifying the method of computation:

    Int.  String   Description
    1     "auto"   automatic selection: hash if sort = FALSE else radix.
    2     "radix"  use radix ordering to determine unique values. Supports sort = FALSE but only for character data.
    3     "hash"   use index hashing to determine unique values. Supports sort = TRUE but only for atomic vectors (default method).

  • cols: compute unique rows according to a subset of columns. Columns can be selected using column names, indices, a logical vector or a selector function (e.g. is.character). Note: All columns are returned.

  • ...: arguments passed to radixorder, e.g. decreasing or na.last. Only applicable if method = "radix".

  • drop.index.levels: character. Either "id", "time", "all" or "none". See indexing.

  • all: logical. TRUE returns all duplicated values, including the first occurrence.
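The effect of the sort and cols arguments can be seen in a minimal sketch. The vector x and the data frame df below are hypothetical illustration data, not part of the package.

```r
library(collapse)

# Hypothetical integer vector with repeated values
x <- c(3L, 1L, 3L, 2L, 1L)

u_first  <- funique(x)               # unique values in order of first occurrence
u_sorted <- funique(x, sort = TRUE)  # unique values in sorted order

# Hypothetical data frame: uniqueness is judged on column "g" only,
# but all columns are returned for the retained rows
df <- data.frame(g = c("b", "a", "b", "a"), v = 1:4)
u_rows <- funique(df, cols = "g")

print(u_first)
print(u_sorted)
print(u_rows)
```

With cols supplied, the remaining columns take no part in the comparison; they are simply carried along for the rows that are kept.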

Details

If all values/rows are already unique, then x is returned. Otherwise a copy of x with duplicate rows removed is returned. See group for some additional computational details.

The sf method simply ignores the geometry column when determining unique values.

Methods for indexed data also subset the index accordingly.

any_duplicated is currently implemented simply as fnunique(x) < NROW(x), so it cannot terminate early. Users are advised to use anyDuplicated with atomic vectors if duplicates are likely to occur near the beginning of the vector. When there are no duplicate values, or when the input is a data frame, any_duplicated is considerably faster than anyDuplicated.
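As a rough illustration of the relationship described here, the sketch below compares any_duplicated with the equivalent fnunique expression and with base R's anyDuplicated. The inputs x and df are made-up data, and no timings are claimed.

```r
library(collapse)

# Hypothetical vector whose duplicate sits at the very beginning,
# the case where base::anyDuplicated can stop early
x <- c(1L, 1L, 2:1000)

a1 <- any_duplicated(x)        # logical TRUE/FALSE
a2 <- fnunique(x) < NROW(x)    # the equivalent expression stated in the text
a3 <- anyDuplicated(x)         # base R: index of the first duplicate, 0 if none

# Hypothetical data frame input: rows 2 and 3 are identical
df <- data.frame(a = c(1, 2, 2), b = c("u", "v", "v"))
a4 <- any_duplicated(df)

print(c(a1, a2, a3 > 0L, a4))
```

Note that anyDuplicated returns an integer position rather than a logical, so it needs a comparison such as > 0L to be used as a TRUE/FALSE check.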

Note

These functions treat lists like data frames, unlike unique which has a list method to determine uniqueness of (non-atomic/heterogeneous) elements in a list.
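The difference from base R can be sketched with a small list; l is a hypothetical example. Base unique compares the list elements with each other, whereas fnunique and fduplicated compare the "rows" formed across the equal-length columns.

```r
library(collapse)

# Hypothetical list with two identical columns
l <- list(a = 1:3, b = 1:3)

n_base <- length(unique(l))  # base R: elements a and b are identical, so 1 remains
n_rows <- fnunique(l)        # collapse: rows (1,1), (2,2), (3,3) are all distinct
dup    <- fduplicated(l)     # one logical per row, not per list element

print(n_base)
print(n_rows)
print(dup)
```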

No matrix method is provided. Please use the alternatives provided in package kit with matrices.

Returns

funique returns x with duplicate elements/rows removed, fnunique returns an integer giving the number of unique values/rows, fduplicated gives a logical vector with TRUE indicating duplicated elements/rows.
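A short sketch of the three return types on a hypothetical character vector x:

```r
library(collapse)

# Hypothetical character vector
x <- c("a", "b", "a", "c", "b")

u  <- funique(x)                  # same type as x, duplicates removed
n  <- fnunique(x)                 # number of unique values
d  <- fduplicated(x)              # TRUE marks the second and later occurrences
da <- fduplicated(x, all = TRUE)  # TRUE also marks the first occurrence of duplicated values

print(u)
print(n)
print(d)
print(da)
```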

See Also

fndistinct, group, Fast Grouping and Ordering, Collapse Overview.

Examples

funique(mtcars$cyl)
funique(gv(mtcars, c(2,8,9)))
funique(mtcars, cols = c(2,8,9))
fnunique(gv(mtcars, c(2,8,9)))
fduplicated(gv(mtcars, c(2,8,9)))
fduplicated(gv(mtcars, c(2,8,9)), all = TRUE)
any_duplicated(gv(mtcars, c(2,8,9)))
any_duplicated(mtcars)
  • Maintainer: Sebastian Krantz
  • License: GPL (>= 2) | file LICENSE
  • Last published: 2025-03-10