dapply function

Data Apply

dapply efficiently applies functions to columns or rows of matrix-like objects and, by default, returns an object of the same type and with the same attributes (unless the result is scalar and drop = TRUE). Alternatively, the result can be returned as a plain matrix or data.frame. Simple parallel execution is also available.

dapply(X, FUN, ..., MARGIN = 2, parallel = FALSE, mc.cores = 1L, return = c("same", "matrix", "data.frame"), drop = TRUE)

Arguments

  • X: a matrix, data frame or similar object.
  • FUN: a function, which can be scalar- or vector-valued.
  • ...: further arguments to FUN.
  • MARGIN: integer. The margin over which FUN is applied. The default 2 indicates columns, while 1 indicates rows. See also Details.
  • parallel: logical. TRUE implements simple parallel execution by internally calling mclapply instead of lapply.
  • mc.cores: integer. Argument to mclapply indicating the number of cores to use for parallel execution. detectCores() can be used to select all available cores.
  • return: an integer or string indicating the type of object to return. The default, 1 - "same", returns the same object type (i.e. class and other attributes are retained; only the dimension names are adjusted). 2 - "matrix" always returns the output as a matrix, and 3 - "data.frame" always returns a data frame.
  • drop: logical. If the result has only one row or one column, drop = TRUE will drop dimensions and return a (named) atomic vector.
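To make the interaction of return and drop concrete, here is a minimal base-R sketch of how a list of per-column results could be shaped into a vector, a matrix or a data frame. The helper shape_result() and its type argument are hypothetical and not part of collapse; the sketch leaves out the "same" option, which additionally requires reattaching the attributes of X.

```r
# Hypothetical helper (not part of collapse): shape a list of per-column
# results according to a requested output type and a drop flag.
shape_result <- function(res, type = c("matrix", "data.frame"), drop = TRUE) {
  type <- match.arg(type)
  # One value per column and drop = TRUE: return a named atomic vector
  if (drop && all(lengths(res) == 1L)) {
    return(unlist(res))
  }
  if (type == "matrix") {
    do.call(cbind, res)   # one column per input column
  } else {
    as.data.frame(res)    # list names become column names
  }
}

res <- lapply(mtcars[1:3], sum)            # column sums of mpg, cyl, disp
shape_result(res)                          # named vector of length 3
shape_result(res, "matrix", drop = FALSE)  # 1 x 3 matrix
shape_result(lapply(mtcars[1:3], quantile), "data.frame")  # 5 x 3 data frame
```

With drop = FALSE, a scalar FUN still yields a one-row object, mirroring the dapply(mtcars, sum, drop = FALSE) example below.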

Details

dapply is an efficient command to apply functions to rows or columns of data without losing information (attributes) about the data or changing the class or format of the data. It is essentially an efficient wrapper around lapply and works as follows:

  • Save the attributes of X.
  • If MARGIN = 2 (columns), convert matrices to plain lists of columns using mctl and remove all attributes from data frames.
  • If MARGIN = 1 (rows), convert matrices to plain lists of rows using mrtl. For data frames, remove all attributes, efficiently convert to a matrix using do.call(cbind, X), and then convert to a list of rows using mrtl.
  • Call lapply or mclapply on these plain lists (which is faster than calling lapply on an object with attributes).
  • Depending on the requested output type, use matrix, unlist or do.call(cbind, ...) to convert the result back to a matrix or list of columns.
  • Modify the relevant attributes accordingly and efficiently attach them to the object again (no further checks).
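The steps above can be sketched in base R for the column-wise case. This is an illustration under stated assumptions, not collapse's implementation: dapply_cols_sketch() is a hypothetical name, plain matrix indexing stands in for mctl, and only a FUN returning one value per row (such as log or round) is handled, so that the saved attributes can be reattached unchanged.

```r
# Hypothetical base-R sketch of dapply's column-wise steps (MARGIN = 2).
# Not collapse's implementation: collapse uses mctl and C-level code.
dapply_cols_sketch <- function(X, FUN, ..., parallel = FALSE, mc.cores = 1L) {
  ax <- attributes(X)                              # save the attributes of X
  if (is.matrix(X)) {
    cols <- lapply(seq_len(ncol(X)), function(j) X[, j])  # list of columns
  } else {
    cols <- X
    attributes(cols) <- NULL                       # plain, attribute-free list
  }
  res <- if (parallel) {
    parallel::mclapply(cols, FUN, ..., mc.cores = mc.cores)
  } else {
    lapply(cols, FUN, ...)
  }
  # Assumption of this sketch: FUN returns one value per row of X
  if (!all(lengths(res) == NROW(X))) {
    stop("sketch only handles FUN returning one value per row")
  }
  out <- if (is.matrix(X)) unlist(res, use.names = FALSE) else res
  attributes(out) <- ax                            # reattach without checks
  out
}

head(dapply_cols_sketch(mtcars, log))              # still a data frame
head(dapply_cols_sketch(as.matrix(mtcars), round, digits = 1))  # still a matrix
```

Reattaching the saved attribute list in one step is what keeps the class, row names and dimension names intact without per-attribute checks.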

The performance gain from working with plain lists makes dapply not much slower than calling lapply itself on a data frame. Because of the conversions involved, row operations require additional memory, but they are still faster than apply.
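A corresponding base-R sketch of the row-wise path for a data frame is given below. row_apply_sketch() is hypothetical; it assumes an all-numeric data frame (do.call(cbind, X) would otherwise coerce column types), and it extracts rows by indexing since mrtl is a collapse function. The intermediate n x k matrix it builds is where the extra memory for row operations goes.

```r
# Hypothetical base-R sketch of the row-wise path (MARGIN = 1) for an
# all-numeric data frame. Rows are extracted by indexing instead of mrtl.
row_apply_sketch <- function(X, FUN, ...) {
  rn <- row.names(X)
  attributes(X) <- NULL                  # plain list of columns
  m <- do.call(cbind, X)                 # n x k matrix (extra memory)
  rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])  # list of rows
  res <- lapply(rows, FUN, ...)
  if (all(lengths(res) == 1L)) {         # scalar FUN: named vector
    out <- unlist(res)
    names(out) <- rn
    return(out)
  }
  out <- do.call(rbind, res)             # vector FUN: one row per input row
  rownames(out) <- rn
  out
}

head(row_apply_sketch(mtcars, sum))
head(row_apply_sketch(mtcars, quantile))
```

For example, row_apply_sketch(mtcars, sum) should match apply(mtcars, 1, sum), since both sum the same values in the same order.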

Returns

X where FUN was applied to every row or column.

See Also

BY, collap, Fast Statistical Functions, Data Transformations, Collapse Overview

Examples

head(dapply(mtcars, log))                      # Take natural log of each variable
head(dapply(mtcars, log, return = "matrix"))   # Return as matrix
m <- as.matrix(mtcars)
head(dapply(m, log))                           # Same thing
head(dapply(m, log, return = "data.frame"))    # Return data frame from matrix
dapply(mtcars, sum); dapply(m, sum)            # Compute sum of each column, return as vector
dapply(mtcars, sum, drop = FALSE)              # This returns a data frame of 1 row
dapply(mtcars, sum, MARGIN = 1)                # Compute row sums, return as vector
dapply(m, sum, MARGIN = 1)                     # Same thing for matrices, faster than apply(m, 1, sum)
head(dapply(m, sum, MARGIN = 1, drop = FALSE)) # Gives matrix with one column
head(dapply(m, quantile, MARGIN = 1))          # Compute row quantiles
dapply(m, quantile)                            # Column quantiles
head(dapply(mtcars, quantile, MARGIN = 1))     # Same for data frames, output is also a data.frame
dapply(mtcars, quantile)

# With classed objects, we have to be a bit careful
## Not run:
dapply(EuStockMarkets, quantile)  # This gives an error because the tsp attribute is misspecified
## End(Not run)
dapply(EuStockMarkets, quantile, return = "matrix")      # These both work fine
dapply(EuStockMarkets, quantile, return = "data.frame")

# Similarly for grouped tibbles and other data frame based classes
library(dplyr)
gmtcars <- group_by(mtcars, cyl, vs, am)
head(dapply(gmtcars, log))              # Still gives a grouped tibble back
dapply(gmtcars, quantile, MARGIN = 1)   # Here it makes sense to keep the groups attribute
dapply(gmtcars, quantile)               # This does not make much sense, ...
dapply(gmtcars, quantile,               # better convert to plain data.frame:
       return = "data.frame")
  • Maintainer: Sebastian Krantz
  • License: GPL (>= 2) | file LICENSE
  • Last published: 2025-03-10