count function

Count the observations in each group

count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).
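The equivalences described above can be checked on a small table. This is a minimal sketch: the tibble `df` and its columns `a`, `b`, and `w` are made up for illustration and are not part of the documentation.

```r
library(dplyr)

# Hypothetical toy data (assumption, for illustration only)
df <- tibble(
  a = c("x", "x", "y"),
  b = c(1, 1, 2),
  w = c(2, 3, 5)
)

# count() groups by a and b, counts rows, then drops the grouping again
counted <- df %>% count(a, b)

# The longer spelling; .groups = "drop" removes the leftover grouping
manual <- df %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

# With wt, the summary becomes n = sum(w) instead of n = n()
weighted <- df %>% count(a, wt = w)
manual_weighted <- df %>%
  group_by(a) %>%
  summarise(n = sum(w), .groups = "drop")

# tally() on an ungrouped data frame returns a single row with the row count
total <- df %>% tally()
```

Here `counted` and `manual` should hold the same groups and counts, as should `weighted` and `manual_weighted`.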

add_count() and add_tally() are equivalent to count() and tally(), but use mutate() instead of summarise() so that they add a new column with group-wise counts.
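As a rough illustration of the mutate()-style behaviour, the sketch below compares add_count() with an explicit group_by() and mutate(). The tibble `df` and its columns `g` and `v` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (assumption, for illustration only)
df <- tibble(g = c("a", "a", "b"), v = c(1, 2, 3))

# add_count() keeps every row and appends the size of each row's group
with_counts <- df %>% add_count(g)

# Roughly the same result written out by hand
manual <- df %>%
  group_by(g) %>%
  mutate(n = n()) %>%
  ungroup()

# add_tally() does no grouping of its own, so on ungrouped data
# every row receives the total number of rows
with_total <- df %>% add_tally()
```

Unlike count(), which collapses `df` to one row per group, both results keep all three input rows.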

Usage

count(x, ..., wt = NULL, sort = FALSE, name = NULL)

## S3 method for class 'data.frame'
count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

Arguments

  • x: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

  • ...: <data-masking> Variables to group by.

  • wt: <data-masking> Frequency weights. Can be NULL or a variable:

    • If NULL (the default), counts the number of rows in each group.
    • If a variable, computes sum(wt) for each group.
  • sort: If TRUE, will show the largest groups at the top.

  • name: The name of the new column in the output.

    If omitted, it will default to n. If there's already a column called n, it will use nn. If there's a column called n and nn, it'll use nnn, and so on, adding ns until it gets a new name.

  • .drop: Handling of factor levels that don't appear in the data, passed on to group_by().

    For count(): if FALSE will include counts for empty groups (i.e. for levels of factors that don't exist in the data).

    For add_count(): deprecated since it can't actually affect the output.
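The `name` and `sort` arguments can be illustrated with a data frame that already has a column called `n`. This is a small sketch on invented data; the column name `rows` is an arbitrary choice.

```r
library(dplyr)

# Hypothetical data whose only column is already called n (assumption)
df <- tibble(n = c(1, 1, 2))

# With name omitted, the count column falls back to nn because n is taken
# (dplyr also prints a message about the chosen name)
default_name <- df %>% count(n)

# Supplying name avoids the fallback
explicit_name <- df %>% count(n, name = "rows")

# sort = TRUE puts the largest groups first
sorted <- df %>% count(n, name = "rows", sort = TRUE)
```

Setting `name` explicitly is usually the clearer option when the input may already contain a column named `n`.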

Returns

An object of the same type as x. count() and add_count() group transiently, so the output has the same groups as the input.
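To see what transient grouping means in practice, the sketch below inspects the grouping variables of the result with group_vars(). The tibble `df` and its columns `g` and `h` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (assumption, for illustration only)
df <- tibble(g = c("a", "a", "b"), h = c("u", "v", "v"))

# Ungrouped input: count() groups by g internally, then ungroups
ungrouped_out <- df %>% count(g)
group_vars(ungrouped_out)

# Grouped input: the existing grouping by g survives, and the
# extra grouping by h that count() adds is dropped afterwards
grouped_out <- df %>% group_by(g) %>% count(h)
group_vars(grouped_out)
```

In the first case the result has no grouping variables; in the second it is still grouped by `g` only.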

Examples

library(dplyr)

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars %>% count(species)
starwars %>% count(species, sort = TRUE)
starwars %>% count(sex, gender, sort = TRUE)
starwars %>% count(birth_decade = round(birth_year, -1))

# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,  ~runs,
  "Max",    "male",   10,
  "Sandra", "female", 1,
  "Susan",  "female", 4
)
# counts rows:
df %>% count(gender)
# counts runs:
df %>% count(gender, wt = runs)

# When factors are involved, `.drop = FALSE` can be used to retain factor
# levels that don't appear in the data
df2 <- tibble(
  id = 1:5,
  type = factor(c("a", "c", "a", NA, "a"), levels = c("a", "b", "c"))
)
df2 %>% count(type)
df2 %>% count(type, .drop = FALSE)

# Or, using `group_by()`:
df2 %>% group_by(type, .drop = FALSE) %>% count()

# tally() is a lower-level function that assumes you've done the grouping
starwars %>% tally()
starwars %>% group_by(species) %>% tally()

# both count() and tally() have add_ variants that work like
# mutate() instead of summarise()
df %>% add_count(gender, wt = runs)
df %>% add_tally(wt = runs)

  • Maintainer: Hadley Wickham
  • License: MIT + file LICENSE
  • Last published: 2023-11-17