container: A Cosmos DB container object, as obtained by get_cosmos_container or create_cosmos_container.
query: A string containing the query text.
parameters: A named list of parameters to pass to a parameterised query, if required.
cross_partition, partition_key, by_pkrange: Arguments that control how to handle cross-partition queries. See 'Details' below.
as_data_frame: Whether to return the query result as a data frame, or a list of Cosmos DB document objects.
metadata: Whether to include Cosmos DB document metadata in the query result.
headers, ...: Optional arguments passed to lower-level functions.
Returns
query_documents returns the results of the query. Most of the time this will be a data frame, or list of data frames if by_pkrange=TRUE.
Details
This is the primary function for querying the contents of a Cosmos DB container (table). The query argument should contain the text of a SQL query, optionally parameterised. if the query contains parameters, pass them in the parameters argument as a named list.
Cosmos DB is a partitioned key-value store under the hood, with documents stored in separate physical databases according to their value of the partition key. The Cosmos DB REST API has limited support for cross-partition queries: basic SELECTs should work, but aggregates and more complex queries may require some hand-hacking.
The default cross_partition=TRUE runs the query for all partition key values and then attempts to stitch the results together. To run the query for only one key value, set cross_partition=FALSE and partition_key to the desired value. You can obtain all the values of the key with the list_partition_key_values function.
The by_pkrange argument allows running the query separately across all partition key ranges. Each partition key range corresponds to a separate physical partition, and contains the documents for one or more key values. You can set this to TRUE to run a query that fails when run across partitions; the returned object will be a list containing the individual query results from each pkrange.
As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like array_contains() cannot be used, and attempts to reference arrays and objects may return incorrect results.
Examples
## Not run:endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey")# importing the Star Wars data from dplyrcont <- endp %>% get_cosmos_database(endp,"mydatabase")%>% create_cosmos_container(db,"mycontainer", partition_key="sex")bulk_import(cont, dplyr::starwars)query_documents(cont,"select * from mycontainer")# removing the Cosmos DB metadata cruftquery_documents(cont,"select * from mycontainer", metadata=FALSE)# a simple filterquery_documents(cont,"select * from mycontainer c where c.gender = 'masculine'")# run query for one partition key -- zero rows returnedquery_documents(cont,"select * from mycontainer c where c.gender = 'masculine'", partition_key="female")# aggregates will fail -- API does not fully support cross-partition queriestry(query_documents(cont,"select avg(c.height) avgheight from mycontainer c"))# Error in process_cosmos_response.response(response, simplify = as_data_frame) :# Bad Request (HTTP 400). Failed to complete Cosmos DB operation. Message:# ...# run query separately by pkrange and combine the results manuallyquery_documents( cont,"select avg(c.height) avgheight, count(1) n from mycontainer c", by_pkrange=TRUE)## End(Not run)