api-perform function

BigQuery jobs: perform a job

These functions are low-level functions designed to be used by experts. Each is paired with a high-level function that you should use instead (see the sketch after this list):

  • bq_perform_copy(): bq_table_copy().
  • bq_perform_query(): bq_dataset_query(), bq_project_query().
  • bq_perform_upload(): bq_table_upload().
  • bq_perform_load(): bq_table_load().
  • bq_perform_extract(): bq_table_save().
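
As an illustration of the pairing, here is a minimal sketch (the project id and table name are hypothetical placeholders, and you need billable credentials): the high-level bq_project_query() submits the query, waits for it, and returns a bq_table, while the low-level bq_perform_query() returns a bq_job that you manage yourself.

library(bigrquery)

billing <- "my-billing-project"   # hypothetical project id
sql <- "SELECT count(*) AS n FROM `my-project.my_dataset.my_table`"

# High-level: submits the job, waits for it to finish, and returns a bq_table
# pointing at the results.
tb <- bq_project_query(billing, sql)

# Low-level pair: submits the job and returns a bq_job immediately; waiting,
# error handling, and result retrieval are up to you.
job <- bq_perform_query(sql, billing = billing)
bq_job_wait(job)
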
Usage

bq_perform_extract(
  x,
  destination_uris,
  destination_format = "NEWLINE_DELIMITED_JSON",
  compression = "NONE",
  ...,
  print_header = TRUE,
  billing = x$project
)

bq_perform_upload(
  x,
  values,
  fields = NULL,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...,
  billing = x$project
)

bq_perform_load(
  x,
  source_uris,
  billing = x$project,
  source_format = "NEWLINE_DELIMITED_JSON",
  fields = NULL,
  nskip = 0,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...
)

bq_perform_query(
  query,
  billing,
  ...,
  parameters = NULL,
  destination_table = NULL,
  default_dataset = NULL,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  use_legacy_sql = FALSE,
  priority = "INTERACTIVE"
)

bq_perform_query_dry_run(
  query,
  billing,
  ...,
  default_dataset = NULL,
  parameters = NULL,
  use_legacy_sql = FALSE
)

bq_perform_copy(
  src,
  dest,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...,
  billing = NULL
)

Arguments

  • x: A bq_table

  • destination_uris: A character vector of fully-qualified Google Cloud Storage URIs where the extracted table should be written. Can export up to 1 GB of data per file. Use a wildcard URI (e.g. gs://[YOUR_BUCKET]/file-name-*.json) to automatically create any number of files (see the extract/load sketch after this argument list).

  • destination_format: The exported file format. Possible values include "CSV", "NEWLINE_DELIMITED_JSON" and "AVRO". Tables with nested or repeated fields cannot be exported as CSV.

  • compression: The compression type to use for exported files. Possible values include "GZIP", "DEFLATE", "SNAPPY", and "NONE". "DEFLATE" and "SNAPPY" are only supported for Avro.

  • ...: Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.

  • print_header: Whether to print out a header row in the results.

  • billing: Identifier of project to bill.

  • values: Data frame of values to insert.

  • fields: A bq_fields specification, or something coercible to it (like a data frame). Leave as NULL to allow BigQuery to auto-detect the fields.

  • create_disposition: Specifies whether the job is allowed to create new tables.

    The following values are supported:

    • "CREATE_IF_NEEDED": If the table does not exist, BigQuery creates the table.
    • "CREATE_NEVER": The table must already exist. If it does not, a 'notFound' error is returned in the job result.
  • write_disposition: Specifies the action that occurs if the destination table already exists. The following values are supported:

    • "WRITE_TRUNCATE": If the table already exists, BigQuery overwrites the table data.
    • "WRITE_APPEND": If the table already exists, BigQuery appends the data to the table.
    • "WRITE_EMPTY": If the table already exists and contains data, a 'duplicate' error is returned in the job result.
  • source_uris: The fully-qualified URIs that point to your data in Google Cloud.

    For Google Cloud Storage URIs: Each URI can contain one '*' wildcard character and it must come after the 'bucket' name. Size limits related to load jobs apply to external data sources.

    For Google Cloud Bigtable URIs: Exactly one URI can be specified, and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table.

    For Google Cloud Datastore backups: Exactly one URI can be specified, and the '*' wildcard character is not allowed.

  • source_format: The format of the data files:

    • For CSV files, specify "CSV".
    • For datastore backups, specify "DATASTORE_BACKUP".
    • For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON".
    • For Avro, specify "AVRO".
    • For Parquet, specify "PARQUET".
    • For ORC, specify "ORC".
  • nskip: For source_format = "CSV", the number of header rows to skip.

  • query: SQL query string.

  • parameters: Named list of parameters matched to query parameters. Parameter x will be matched to placeholder @x.

    Generally, you can supply R vectors and they will be automatically converted to the correct type. If you need greater control, you can call bq_param_scalar() or bq_param_array() explicitly.

    See https://cloud.google.com/bigquery/docs/parameterized-queries for more details, and the parameterized-query sketch under Examples below.

  • destination_table: A bq_table where results should be stored. If not supplied, results will be saved to a temporary table that lives in a special dataset. You must supply this parameter for large queries (> 128 MB compressed).

  • default_dataset: A bq_dataset used to automatically qualify table names.

  • use_legacy_sql: If TRUE, will use BigQuery's legacy SQL format.

  • priority: Specifies a priority for the query. Possible values include "INTERACTIVE" and "BATCH". Batch queries do not start immediately, but are not rate-limited in the same way as interactive queries.
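
To make the destination_uris, compression, and disposition arguments concrete, here is a hedged sketch of an extract-then-load round trip. The project, dataset, table, and bucket names are hypothetical, and the calls assume you have write access to the GCS bucket.

library(bigrquery)

billing <- "my-billing-project"                               # hypothetical
src  <- bq_table("my-project", "my_dataset", "events")        # hypothetical source table
dest <- bq_table("my-project", "my_dataset", "events_copy")   # hypothetical target table
bucket_uri <- "gs://my-bucket/events-*.json"                  # hypothetical wildcard URI

# Extract the table to GCS as gzipped newline-delimited JSON; the wildcard
# lets BigQuery shard the output across as many files as it needs.
extract_job <- bq_perform_extract(
  src,
  destination_uris = bucket_uri,
  destination_format = "NEWLINE_DELIMITED_JSON",
  compression = "GZIP",
  billing = billing
)
bq_job_wait(extract_job)

# Load those files into another table; CREATE_IF_NEEDED creates the table if
# it does not exist yet, and WRITE_TRUNCATE replaces any existing data.
load_job <- bq_perform_load(
  dest,
  source_uris = bucket_uri,
  source_format = "NEWLINE_DELIMITED_JSON",
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_TRUNCATE",
  billing = billing
)
bq_job_wait(load_job)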

Returns

A bq_job.
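
For instance (a small sketch, assuming job is the bq_job returned by one of the calls above):

bq_job_wait(job)    # block until the job completes, erroring if it failed
bq_job_status(job)  # inspect the raw job status, including any errors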

Google BigQuery API documentation

Additional information at https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs

Examples

ds <- bq_test_dataset()
bq_mtcars <- bq_table(ds, "mtcars")
job <- bq_perform_upload(bq_mtcars, mtcars)
bq_table_exists(bq_mtcars)

bq_job_wait(job)
bq_table_exists(bq_mtcars)
head(bq_table_download(bq_mtcars))
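
Building on the objects above, a further hedged sketch showing a dry run followed by a parameterized query; bq_test_project() is assumed to return the billing project used by the testing helpers.

billing <- bq_test_project()   # assumed helper: the project used for testing

sql <- "SELECT count(*) AS n FROM mtcars WHERE cyl = @cyl"

# Dry run: estimates how much data the query would process, without running it.
bq_perform_query_dry_run(
  sql,
  billing = billing,
  default_dataset = ds,
  parameters = list(cyl = 4)
)

# Real run: the list element `cyl` is matched to the @cyl placeholder and
# converted to the appropriate BigQuery type automatically.
job <- bq_perform_query(
  sql,
  billing = billing,
  default_dataset = ds,
  parameters = list(cyl = 4)
)
bq_job_wait(job)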