Write an agent, informant, multiagent, or table scan to disk
Writing an agent, informant, multiagent, or even a table scan to disk with x_write_disk() can be useful for keeping data validation intel or table information close at hand for later retrieval (with x_read_disk()). By default, any data table that the agent or informant may have held before being committed to disk will be expunged (this does not apply to table scans, since they never hold a table object). This behavior can be changed by setting keep_tbl to TRUE, but this only works when the table is not of the tbl_dbi or the tbl_spark class.
An agent object of class ptblank_agent, an informant of class ptblank_informant, or a table scan of class ptblank_tbl_scan.
filename: File name
scalar<character> // required
The filename to create on disk for the agent, informant, or table scan.
path: File path
scalar<character> // default:NULL (optional)
An optional path to which the file should be saved (this is automatically combined with filename).
keep_tbl: Keep data table inside object
scalar<logical> // default:FALSE
An option to keep a data table that is associated with the agent or informant (which is the case when the agent, for example, is created using create_agent(tbl = <data table>, ...)). The default is FALSE, where the data table is removed before writing to disk. For database tables of the class tbl_dbi and for Spark DataFrames (tbl_spark) the table is always removed (even if keep_tbl is set to TRUE).
keep_extracts: Keep data extracts inside object
scalar<logical> // default:FALSE
An option to keep any collected extract data for failing rows. Only applies to agent objects. By default, this is FALSE (i.e., extract data is removed).
quiet: Inform (or not) upon file writing
scalar<logical> // default:FALSE
Should the function be quiet (i.e., not inform) when the file is written?
Returns
Invisibly returns TRUE if the file has been written.
Details
It is recommended to set up a table-prep formula so that the agent and informant can access refreshed data after being read from disk through x_read_disk(). This can be done initially with the tbl argument of create_agent()/create_informant() by passing in a table-prep formula or a function that can obtain the target table when invoked. Alternatively, we can use set_tbl() with a similarly crafted tbl expression to ensure that an agent or informant can retrieve a table at a later time.
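As a minimal sketch of both approaches (assuming the small_table dataset that ships with pointblank; the object names agent and agent_2 are illustrative only):

```r
library(pointblank)

# Approach 1: supply a table-prep formula up front. The leading `~` defers
# evaluation, so the table is obtained only when the agent needs it
# (for example, after the agent has been restored with `x_read_disk()`).
agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table"
  )

# Approach 2: start with a table held in memory, then swap in a
# table-prep formula afterwards with `set_tbl()`.
agent_2 <- create_agent(tbl = small_table, tbl_name = "small_table")
agent_2 <- set_tbl(agent_2, tbl = ~ small_table)
```

Either way, the object written to disk carries instructions for getting the table rather than depending on the table itself being kept.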
Examples
A: Writing an agent to disk
Let's go through the process of (1) developing an agent with a validation plan (to be used for the data quality analysis of the small_table
dataset), (2) interrogating the agent with the interrogate() function, and (3) writing the agent and all its intel to a file.
Creating an action_levels object is a common workflow step when creating a pointblank agent. We designate failure thresholds to the warn, stop, and notify states using action_levels().
Now, let's create a pointblank agent object and give it the al object (which serves as a default for all validation steps which can be overridden). The data will be referenced in the tbl argument with a leading ~.
Then, as with any agent object, we can add steps to the validation plan by
using as many validation functions as we want. After that, use
interrogate().
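The three steps described above might look like the following. This is a sketch rather than a canonical plan: the threshold values, the particular validation steps, the file name, and the use of tempdir() as the output location are all assumptions chosen for illustration.

```r
library(pointblank)

# Failure thresholds for the warn, stop, and notify states
# (illustrative fractions of failing test units)
al <-
  action_levels(
    warn_at = 0.10,
    stop_at = 0.25,
    notify_at = 0.35
  )

# Create the agent, referencing the data with a leading `~`,
# add a few validation steps, and then interrogate
agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "`x_write_disk()`",
    actions = al
  ) %>%
  col_exists(columns = date_time) %>%
  rows_distinct() %>%
  col_vals_gt(columns = d, value = 100) %>%
  col_vals_lte(columns = c, value = 5) %>%
  interrogate()

# Write the agent and its intel to a file (here, in a temporary directory)
res <-
  x_write_disk(
    agent,
    filename = "agent-small_table.rds",
    path = tempdir()
  )
```

Since keep_tbl is left at its default of FALSE, the file holds the validation plan and results but not the table.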
We can read the file back as an agent with the x_read_disk() function and
we'll get all of the intel along with the restored agent.
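A round trip could be sketched as follows. The agent here is deliberately small so that the block stands alone, and the file name, the tempdir() location, and the name agent_restored are assumptions for this example.

```r
library(pointblank)

# A small interrogated agent, just to have something to write
agent <-
  create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  col_vals_gt(columns = d, value = 100) %>%
  interrogate()

# Write the agent to a temporary directory
x_write_disk(
  agent,
  filename = "agent-small_table.rds",
  path = tempdir(),
  quiet = TRUE
)

# Read the file back; the result is again an agent object,
# with the interrogation results intact
agent_restored <-
  x_read_disk(
    filename = "agent-small_table.rds",
    path = tempdir()
  )
```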
If you're consistently writing agent reports when periodically checking data,
you could make use of affix_date() or affix_datetime(), depending on
the granularity you need. Here's an example that writes the file with the
format: "<filename>-YYYY-mm-dd_HH-MM-SS.rds".
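A short sketch of how the timestamped file name can be built before writing. The base file name is an assumption, and the commented-out call presumes an agent object already exists in the session.

```r
library(pointblank)

# Build a file name carrying the current date and time; the timestamp
# is placed before the `.rds` extension
fname <- affix_datetime("agent-small_table.rds")
fname

# The result is an ordinary string, so it can be passed straight through:
# x_write_disk(agent, filename = fname)
```

Each run yields a different file name, so earlier reports are kept rather than overwritten.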
B: Writing an informant to disk
Let's go through the process of (1) creating an informant object that minimally describes the small_table dataset, (2) ensuring that data is captured from the target table using the incorporate() function, and (3) writing the informant to a file.
Create a pointblank informant object with create_informant() and the small_table dataset. Use incorporate() so that info snippets are integrated into the text.
library(pointblank)

informant <-
  create_informant(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "`x_write_disk()`"
  ) %>%
  info_snippet(
    snippet_name = "high_a",
    fn = snip_highest(column = "a")
  ) %>%
  info_snippet(
    snippet_name = "low_a",
    fn = snip_lowest(column = "a")
  ) %>%
  info_columns(
    columns = a,
    info = "From {low_a} to {high_a}."
  ) %>%
  info_columns(
    columns = starts_with("date"),
    info = "Time-based values."
  ) %>%
  info_columns(
    columns = date,
    info = "The date part of `date_time`."
  ) %>%
  incorporate()
The informant can be written to a file with x_write_disk(). Let's do this with affix_date() so that the filename has a datestamp.
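One way this could look is shown below. To keep the block self-contained it builds a bare-bones informant; the base file name and the tempdir() location are assumptions for illustration.

```r
library(pointblank)

# A bare-bones informant for the sketch
informant <-
  create_informant(
    tbl = ~ small_table,
    tbl_name = "small_table"
  ) %>%
  incorporate()

# Add a datestamp to the file name, then write the informant
fname <- affix_date("informant-small_table.rds")

res <-
  x_write_disk(
    informant,
    filename = fname,
    path = tempdir()
  )
```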
C: Writing a multiagent to disk
We can read the file back as a multiagent with the x_read_disk() function
and we'll get all of the constituent agents and their associated intel back
as well.
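A hedged sketch of the whole multiagent round trip follows. The two agents, their validation steps, the file name, and the tempdir() location are invented for this example; only create_multiagent(), x_write_disk(), and x_read_disk() are the functions under discussion.

```r
library(pointblank)

# Two small, interrogated agents (illustrative validation steps)
agent_1 <-
  create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  col_vals_gt(columns = d, value = 100) %>%
  interrogate()

agent_2 <-
  create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  col_vals_lte(columns = c, value = 5) %>%
  interrogate()

# Combine the agents into a multiagent and write it to disk
multiagent <- create_multiagent(agent_1, agent_2)

x_write_disk(
  multiagent,
  filename = "multiagent-small_table.rds",
  path = tempdir()
)

# Read the multiagent back from the file
multiagent_restored <-
  x_read_disk(
    filename = "multiagent-small_table.rds",
    path = tempdir()
  )
```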
D: Writing a table scan to disk
We can get a report that describes all of the data in the storms dataset.
tbl_scan <- scan_data(tbl = dplyr::storms)
The table scan object can be written to a file with x_write_disk().
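For instance, the call might be written like this (the file name and the tempdir() location are assumptions; the scan itself is repeated so the block runs on its own, and it can take a while on a table of this size):

```r
library(pointblank)

# Scan the `storms` dataset (requires the dplyr package for the data)
tbl_scan <- scan_data(tbl = dplyr::storms)

# Write the table scan to a file in a temporary directory
res <-
  x_write_disk(
    tbl_scan,
    filename = "tbl_scan-storms.rds",
    path = tempdir()
  )
```

Because a table scan never holds the table object, keep_tbl has no bearing here.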