dbplyr-slice function

Subset rows using their positions

Subset rows using their positions

These are methods for the dplyr generics slice_min(), slice_max(), and slice_sample(). They are translated to SQL using filter() and window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on arguments). slice(), slice_head(), and slice_tail() are not supported since database tables have no intrinsic order.

If data is grouped, the operation will be performed on each group so that (e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest value of x in each group.

## S3 method for class 'tbl_lazy' slice_min( .data, order_by, ..., n, prop, by = NULL, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_max( .data, order_by, ..., n, by = NULL, prop, with_ties = TRUE, na_rm = TRUE ) ## S3 method for class 'tbl_lazy' slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)

Arguments

  • .data: A lazy data frame backed by a database query.

  • order_by: Variable or function of variables to order by.

  • ...: Not used.

  • n, prop: Provide either n, the number of rows, or prop, the proportion of rows to select. If neither are supplied, n = 1 will be used.

    If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. If the proportion of a group size is not an integer, it is rounded down.

  • by: <tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by .

  • with_ties: Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows.

  • na_rm: Should missing values in order_by be removed from the result? If FALSE, NA values are sorted to the end (like in arrange()), so they will only be included if there are insufficient non-missing values to reach n/prop.

  • weight_by, replace: Not supported for database backends.

Examples

library(dplyr, warn.conflicts = FALSE) db <- memdb_frame(x = 1:3, y = c(1, 1, 2)) db %>% slice_min(x) %>% show_query() db %>% slice_max(x) %>% show_query() db %>% slice_sample() %>% show_query() db %>% group_by(y) %>% slice_min(x) %>% show_query() # By default, ties are includes so you may get more rows # than you expect db %>% slice_min(y, n = 1) db %>% slice_min(y, n = 1, with_ties = FALSE) # Non-integer group sizes are rounded down db %>% slice_min(x, prop = 0.5)
  • Maintainer: Hadley Wickham
  • License: MIT + file LICENSE
  • Last published: 2024-03-19