add_chunk function

Add a chunk field to a data frame

This auxiliary function adds a field to a data frame, if necessary, so that each subset of rows corresponding to a unique combination of the chunk fields stays below a given size threshold. The resulting data frame can then be safely used in dbAppendTable(), because Presto limits the size of any single INSERT INTO statement.
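To make the idea concrete, here is a minimal base R sketch of how a chunk index could be assigned so that every chunk stays under a byte limit. It is an illustration only, not the package's implementation: the helper name sketch_chunk_idx, its max_bytes argument, and the way row size is estimated (byte length of the comma-joined values) are all assumptions made for this example.

```r
# Hypothetical helper, not part of the package: assign a chunk index to each
# row so that the estimated size of every chunk stays within max_bytes.
sketch_chunk_idx <- function(df, max_bytes = 1e6) {
  # Assumption: approximate a row's encoded size by the byte length of its
  # values joined with commas.
  row_text <- do.call(paste, c(unname(lapply(df, as.character)), sep = ","))
  row_bytes <- nchar(row_text, type = "bytes")

  idx <- integer(nrow(df))
  current <- 1L
  running <- 0
  for (i in seq_len(nrow(df))) {
    # Start a new chunk when adding this row would exceed the limit.
    if (running > 0 && running + row_bytes[i] > max_bytes) {
      current <- current + 1L
      running <- 0
    }
    running <- running + row_bytes[i]
    idx[i] <- current
  }
  list(idx = idx, row_bytes = row_bytes)
}

res <- sketch_chunk_idx(iris, max_bytes = 2000)
# Estimated size of each chunk, in bytes.
chunk_bytes <- tapply(res$row_bytes, res$idx, sum)
```

A greedy pass like this keeps rows in their original order and only opens a new chunk when the running total would cross the limit. A single row larger than the limit would still end up in its own oversized chunk.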

add_chunk(
  value,
  base_chunk_fields = NULL,
  chunk_size = 1e+06,
  new_chunk_field_name = "aux_chunk_idx"
)

Arguments

  • value: The original data frame.
  • base_chunk_fields: A character vector of existing field names that are used to split the data frame before checking the chunk size.
  • chunk_size: Maximum size (in bytes) of the VALUES statement encoding each unique chunk. Defaults to 1,000,000 bytes (i.e. 1 MB).
  • new_chunk_field_name: A string indicating the new chunk field name. Defaults to "aux_chunk_idx".

Examples

## Not run:
# returns the original data frame because it's within size
add_chunk(iris)
# add a new aux_chunk_idx field
add_chunk(iris, chunk_size = 2000)
# the new aux_chunk_idx field is added on top of Species
add_chunk(iris, chunk_size = 2000, base_chunk_fields = c("Species"))
## End(Not run)
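As a rough illustration of why the chunk fields matter downstream, the sketch below splits a chunked data frame by its chunk fields and hands each piece to a write function, so that each piece could go out as its own statement. The helper append_by_chunk, the made-up chunk index, and the stand-in write function are all hypothetical and exist only for this example. With a live connection, the stand-in could be replaced by a call such as DBI::dbAppendTable(con, "iris", piece), where con and the table name are likewise assumed.

```r
# Hypothetical helper, not part of RPresto or DBI: split a data frame by its
# chunk fields and pass each piece to a caller-supplied append function.
append_by_chunk <- function(df, chunk_fields, append_fn) {
  # drop = TRUE skips combinations of field values that have no rows.
  pieces <- split(df, df[chunk_fields], drop = TRUE)
  for (piece in pieces) {
    append_fn(piece)
  }
  invisible(length(pieces))
}

# Stand-in data: iris with a made-up chunk index (two chunks per species).
chunked <- iris
chunked$aux_chunk_idx <- rep(rep(1:2, each = 25), times = 3)

# Stand-in for a real write: record how many rows each piece contains.
rows_written <- integer(0)
n_pieces <- append_by_chunk(
  chunked,
  c("Species", "aux_chunk_idx"),
  function(piece) rows_written <<- c(rows_written, nrow(piece))
)
```

Here each of the six pieces holds 25 rows. In a real write, the helper index column would probably need to be dropped from each piece unless the target table has a matching column.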
  • Maintainer: Jarod G.R. Meng
  • License: BSD_3_clause + file LICENSE
  • Last published: 2025-01-08