bq_table_download function

Download table data

This retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (say, less than 100 MB). For larger queries, it is better to export the results to a CSV file stored on Google Cloud Storage and use the bq command line tool to download it locally.

bq_table_download(
  x,
  n_max = Inf,
  page_size = NULL,
  start_index = 0L,
  max_connections = 6L,
  quiet = NA,
  bigint = c("integer", "integer64", "numeric", "character"),
  max_results = deprecated()
)

Arguments

  • x: A bq_table

  • n_max: Maximum number of results to retrieve. Use Inf to retrieve all rows.

  • page_size: The number of rows requested per chunk. It is recommended to leave this unspecified until you have evidence that the page_size selected automatically by bq_table_download() is problematic.

    When page_size = NULL, bigrquery determines a conservative, natural chunk size empirically. If you specify the page_size, it is important that each chunk fits on one page, i.e. that the requested row limit is low enough to prevent the API from paginating based on response size.

  • start_index: Starting row index (zero-based).

  • max_connections: Maximum number of simultaneous connections to BigQuery servers.

  • quiet: If FALSE, displays a progress bar; if TRUE, is silent; if NA, picks based on whether or not you're in an interactive context.

  • bigint: The R type that BigQuery's 64-bit integer types should be mapped to. The default is "integer", which returns R's integer type but results in NA for values above/below +/- 2147483647. "integer64" returns a bit64::integer64, which allows the full range of 64-bit integers (see the sketch after this list).

  • max_results: Deprecated. Please use n_max instead.
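
As a brief illustration of these arguments, here is a minimal sketch (the table reference and row cap are illustrative only, and it assumes you have already authenticated) that caps the number of rows and maps 64-bit integers to bit64::integer64:

library(bigrquery)

# Illustrative only: the natality sample table and the 10,000-row cap are
# placeholders; substitute your own table reference.
tb <- bq_table("publicdata", "samples", "natality")

df <- bq_table_download(
  tb,
  n_max = 10000,        # stop after 10,000 rows
  bigint = "integer64"  # keep the full 64-bit integer range via bit64
)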

Returns

Because data retrieval may generate list-columns and the data.frame print method can have problems with list-columns, this method returns a tibble. If you need a data.frame, coerce the results with as.data.frame().
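
A minimal sketch of that coercion, reusing the table from the example at the bottom of this page:

df <- bq_table_download("publicdata.samples.natality", n_max = 10)
class(df)  # "tbl_df" "tbl" "data.frame"
natality_df <- as.data.frame(df)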

Complex data

bigrquery will retrieve nested and repeated columns into list-columns as follows (see the sketch after this list):

  • Repeated values (arrays) will become a list-column of vectors.
  • Records will become list-columns of named lists.
  • Repeated records will become list-columns of data frames.
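
As a rough sketch of what this looks like in practice (the github_nested sample table is an assumption here; any table with record or repeated columns behaves the same way):

library(bigrquery)

# Assumed example table containing record (nested) columns.
tb <- bq_table("publicdata", "samples", "github_nested")

df <- bq_table_download(tb, n_max = 100)

# Records arrive as list-columns of named lists, repeated values as
# list-columns of vectors; inspect one element to see the structure.
str(df$repository[[1]])

# A repeated-record column (a list-column of data frames) can be expanded
# to one row per element with tidyr::unnest() if that is more convenient.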

Larger datasets

In my timings, this code takes around 1 minute per 100 MB of data. If you need to download considerably more than this, I recommend:

  • Export a .csv file to Cloud Storage using bq_table_save().
  • Use the gsutil command line utility to download it.
  • Read the csv file into R with readr::read_csv() or data.table::fread().
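
A rough sketch of that workflow, assuming a Cloud Storage bucket you control (the bucket name and file names below are placeholders, and the gsutil step runs in a shell, not in R):

library(bigrquery)

tb <- bq_table("publicdata", "samples", "natality")

# 1. Export the table to CSV on Cloud Storage. Large tables are sharded
#    across several files, hence the wildcard in the placeholder URI.
bq_table_save(tb, "gs://my-bucket/natality-*.csv")

# 2. In a shell, download the file(s) with gsutil:
#    gsutil cp "gs://my-bucket/natality-*.csv" .

# 3. Read the downloaded CSV into R (the shard name is a placeholder).
df <- readr::read_csv("natality-000000000000.csv")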

Unfortunately you cannot export nested or repeated formats into CSV, and the formats that BigQuery supports (Avro and NDJSON) that allow for nested/repeated values are not well supported in R.

Google BigQuery API documentation

Examples

df <- bq_table_download("publicdata.samples.natality", n_max = 35000)