query_documents function

Query an Azure Cosmos DB container

Query an Azure Cosmos DB container

query_documents(container, ...) ## S3 method for class 'cosmos_container' query_documents( container, query, parameters = list(), cross_partition = TRUE, partition_key = NULL, by_pkrange = FALSE, as_data_frame = TRUE, metadata = TRUE, headers = list(), ... )

Arguments

  • container: A Cosmos DB container object, as obtained by get_cosmos_container or create_cosmos_container.
  • query: A string containing the query text.
  • parameters: A named list of parameters to pass to a parameterised query, if required.
  • cross_partition, partition_key, by_pkrange: Arguments that control how to handle cross-partition queries. See 'Details' below.
  • as_data_frame: Whether to return the query result as a data frame, or a list of Cosmos DB document objects.
  • metadata: Whether to include Cosmos DB document metadata in the query result.
  • headers, ...: Optional arguments passed to lower-level functions.

Returns

query_documents returns the results of the query. Most of the time this will be a data frame, or list of data frames if by_pkrange=TRUE.

Details

This is the primary function for querying the contents of a Cosmos DB container (table). The query argument should contain the text of a SQL query, optionally parameterised. if the query contains parameters, pass them in the parameters argument as a named list.

Cosmos DB is a partitioned key-value store under the hood, with documents stored in separate physical databases according to their value of the partition key. The Cosmos DB REST API has limited support for cross-partition queries: basic SELECTs should work, but aggregates and more complex queries may require some hand-hacking.

The default cross_partition=TRUE runs the query for all partition key values and then attempts to stitch the results together. To run the query for only one key value, set cross_partition=FALSE and partition_key to the desired value. You can obtain all the values of the key with the list_partition_key_values function.

The by_pkrange argument allows running the query separately across all partition key ranges. Each partition key range corresponds to a separate physical partition, and contains the documents for one or more key values. You can set this to TRUE to run a query that fails when run across partitions; the returned object will be a list containing the individual query results from each pkrange.

As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like array_contains() cannot be used, and attempts to reference arrays and objects may return incorrect results.

Examples

## Not run: endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey") # importing the Star Wars data from dplyr cont <- endp %>% get_cosmos_database(endp, "mydatabase") %>% create_cosmos_container(db, "mycontainer", partition_key="sex") bulk_import(cont, dplyr::starwars) query_documents(cont, "select * from mycontainer") # removing the Cosmos DB metadata cruft query_documents(cont, "select * from mycontainer", metadata=FALSE) # a simple filter query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'") # run query for one partition key -- zero rows returned query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'", partition_key="female") # aggregates will fail -- API does not fully support cross-partition queries try(query_documents(cont, "select avg(c.height) avgheight from mycontainer c")) # Error in process_cosmos_response.response(response, simplify = as_data_frame) : # Bad Request (HTTP 400). Failed to complete Cosmos DB operation. Message: # ... # run query separately by pkrange and combine the results manually query_documents( cont, "select avg(c.height) avgheight, count(1) n from mycontainer c", by_pkrange=TRUE ) ## End(Not run)

See Also

cosmos_container , cosmos_document , list_partition_key_values , list_partition_key_ranges

  • Maintainer: Hong Ooi
  • License: MIT + file LICENSE
  • Last published: 2021-01-18

Downloads (last 30 days):