.data: a (grouped) data frame or named list of columns. Grouped data can be created with fgroup_by or dplyr::group_by.
...: name-value pairs of summary functions, across statements, or arbitrary expressions resulting in a list. See Examples. For fast performance use the Fast Statistical Functions.
keep.group_vars: logical. FALSE removes grouping variables after computation.
.cols: for expressions involving .data, .cols can be used to subset columns, e.g. mtcars |> gby(cyl) |> smr(mctl(cor(.data), TRUE), .cols = 5:7). Can pass column names, indices, a logical vector or a selector function (e.g. is.numeric).
Returns
If .data is grouped by fgroup_by or dplyr::group_by, the result is a data frame of the same class and attributes with rows reduced to the number of groups. If .data is not grouped, the result is a data frame of the same class and attributes with 1 row.
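The row counts described above can be checked with a short sketch. It assumes the collapse package is installed and R >= 4.1 (for the native pipe); the object names grouped, ungrouped and no_keys are illustrative only.

```r
library(collapse)

# Grouped input: one row per distinct value of cyl
grouped <- mtcars |> fgroup_by(cyl) |> fsummarise(mean_mpg = fmean(mpg))

# Ungrouped input: a single summary row
ungrouped <- fsummarise(mtcars, mean_mpg = fmean(mpg))

# keep.group_vars = FALSE drops the grouping column from the result
no_keys <- mtcars |>
  fgroup_by(cyl) |>
  fsummarise(mean_mpg = fmean(mpg), keep.group_vars = FALSE)

nrow(grouped)    # number of groups
nrow(ungrouped)  # 1
names(no_keys)   # no cyl column
```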
Note
Since v1.7, fsummarise is fully featured, allowing expressions using functions and columns of the data as well as external scalar values (just like dplyr::summarise). Note, however, that once a Fast Statistical Function is used in an expression, the whole expression is executed in a vectorized way instead of by split-apply-combine computing over groups. Any base R function in the same expression is then evaluated over the entire sample rather than within each group. Please see the first Example.
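The difference between the two execution modes can be illustrated in base R alone. This is only an analogy for the behaviour described in the note, not how collapse implements it; the names g, by_group, group_means and vectorized are hypothetical.

```r
g <- mtcars$cyl

# Split-apply-combine: the whole expression is evaluated inside each group,
# so min(qsec) is the within-group minimum
by_group <- sapply(split(mtcars, g), function(d) mean(d$mpg) + min(d$qsec))

# Vectorized analogue: only the grouped mean is computed per group,
# while min(qsec) is evaluated once over the entire sample
group_means <- sapply(split(mtcars$mpg, g), mean)
vectorized  <- group_means + min(mtcars$qsec)

# Side-by-side comparison, one row per cyl group
cbind(by_group, vectorized)
```

The two columns agree only for groups whose minimum qsec equals the overall minimum, which is why mixing fmean with base min gives a different result from a fully grouped computation.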
See Also
across, collap, Data Frame Manipulation, Fast Statistical Functions, Collapse Overview
Examples
## Since v1.7, fsummarise supports arbitrary expressions, and expressions
## containing fast statistical functions receive vectorized execution:

# (a) This is an expression using base R functions which is executed by groups
mtcars |> fgroup_by(cyl) |> fsummarise(res = mean(mpg) + min(qsec))

# (b) Here, the use of fmean causes the whole expression to be executed
# in a vectorized way, i.e. the expression is translated to something like
# fmean(mpg, g = cyl) + min(qsec) and executed, thus the result is different
# from (a), because the minimum is calculated over the entire sample
mtcars |> fgroup_by(cyl) |> fsummarise(mpg = fmean(mpg) + min(qsec))

# (c) For fully vectorized execution, use fmin. This yields the same as (a)
mtcars |> fgroup_by(cyl) |> fsummarise(mpg = fmean(mpg) + fmin(qsec))

# More advanced use: vectorized grouped regression slopes: mpg ~ carb
mtcars |>
  fgroup_by(cyl) |>
  fmutate(dm_carb = fwithin(carb)) |>
  fsummarise(beta = fsum(mpg, dm_carb) %/=% fsum(dm_carb^2))

# In across() statements it is fine to mix different functions, each will
# be executed on its own terms (i.e. vectorized for fmean and standard for sum)
mtcars |> fgroup_by(cyl) |> fsummarise(across(mpg:hp, list(fmean, sum)))

# Note that this still detects fmean as a fast function. The names of the list
# are irrelevant, but the function name must be typed or passed as a character vector,
# otherwise functions will be executed by groups, e.g. function(x) fmean(x) won't vectorize
mtcars |> fgroup_by(cyl) |> fsummarise(across(mpg:hp, list(mu = fmean, sum = sum)))

# We can force non-vectorized execution by setting .apply = TRUE
mtcars |> fgroup_by(cyl) |> fsummarise(across(mpg:hp, list(mu = fmean, sum = sum), .apply = TRUE))

# Another argument of across(): Order the result first by function, then by column
mtcars |> fgroup_by(cyl) |> fsummarise(across(mpg:hp, list(mu = fmean, sum = sum), .transpose = FALSE))

# Since v1.9.0, can also evaluate arbitrary expressions
mtcars |> fgroup_by(cyl, vs, am) |> fsummarise(mctl(cor(cbind(mpg, wt, carb)), names = TRUE))

# This can also be achieved using across():
corfun <- function(x) mctl(cor(x), names = TRUE)
mtcars |> fgroup_by(cyl, vs, am) |> fsummarise(across(c(mpg, wt, carb), corfun, .apply = FALSE))

#----------------------------------------------------------------------------
# Examples that also work for pre 1.7 versions

# Simple use
fsummarise(mtcars, mean_mpg = fmean(mpg), sd_mpg = fsd(mpg))

# Using base functions (not a big difference without groups)
fsummarise(mtcars, mean_mpg = mean(mpg), sd_mpg = sd(mpg))

# Grouped use
mtcars |> fgroup_by(cyl) |> fsummarise(mean_mpg = fmean(mpg), sd_mpg = fsd(mpg))

# This is still efficient but quite a bit slower on large data (many groups)
mtcars |> fgroup_by(cyl) |> fsummarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))

# Weighted aggregation
mtcars |> fgroup_by(cyl) |> fsummarise(w_mean_mpg = fmean(mpg, wt), w_sd_mpg = fsd(mpg, wt))

## Can also group with dplyr::group_by, but at a conversion cost, see ?GRP
library(dplyr)
mtcars |> group_by(cyl) |> fsummarise(mean_mpg = fmean(mpg), sd_mpg = fsd(mpg))

# Again less efficient...
mtcars |> group_by(cyl) |> fsummarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))