With yaml_write() we can take different pointblank objects (the ptblank_agent, the ptblank_informant, and the tbl_store) and write them to YAML. With an agent, for example, yaml_write() will write everything that is needed to specify the agent and its validation plan to a YAML file. We can then modify the YAML markup if so desired, or use it as is to create a new agent with the yaml_read_agent() function. That agent will have a validation plan and be ready to interrogate() the data. We can go a step further and perform an interrogation directly from the YAML file with the yaml_agent_interrogate() function, which returns an agent with intel (having already interrogated the target data table). An informant object can also be written to YAML with yaml_write().
One requirement for writing an agent or an informant to YAML is that we need to have a table-prep formula specified (it's an R formula that is used to read the target table when interrogate() or incorporate() is called). This option can be set when using create_agent()/create_informant() or with set_tbl() (useful with an existing agent or informant object).
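As a minimal sketch of the second route, the snippet below starts with an agent whose table was passed in directly and then uses set_tbl() to swap in a table-prep formula. The object name `agent` is only illustrative.

```r
library(pointblank)

# An agent created with the table passed in directly (no table-prep formula)
agent <- create_agent(tbl = small_table, tbl_name = "small_table")

# Swap in a table-prep formula so that the agent can later be written to YAML
agent <- set_tbl(agent, tbl = ~ small_table)
```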
...: Pointblank agents, informants, table stores
<series of obj:<ptblank_agent|ptblank_informant|tbl_store>>
// required
Any mix of pointblank objects such as the agent (ptblank_agent), the informant (ptblank_informant), or the table store (tbl_store). The agent and informant can be combined into a single YAML file (so long as both objects refer to the same table). A table store cannot be combined with either an agent or an informant so it must undergo conversion alone.
.list: Alternative to ...
<list of multiple expressions> // required (or, use ...)
Allows for the use of a list as an input alternative to ....
filename: File name
scalar<character> // default:NULL (optional)
The name of the YAML file to create on disk. It is recommended that either the .yaml or .yml extension be used for this file. If not provided, default names will be used: "tbl_store.yml" for a table store, and a name to the effect of "<object>-<tbl_name>.yml" for the other objects.
path: File path
scalar<character> // default:NULL (optional)
An optional path to which the YAML file should be saved (combined with filename).
expanded: Expand validation when repeating across multiple columns
scalar<logical> // default:FALSE
Should the written validation expressions for an agent be expanded such that tidyselect expressions for columns are evaluated, yielding a validation function per column? By default, this is FALSE so expressions as written will be retained in the YAML representation.
quiet: Inform (or not) upon file writing
scalar<logical> // default:FALSE
Should the function not inform when the file is written?
Returns
Invisibly returns TRUE if the YAML file has been written.
Examples
Writing an agent object to a YAML file
Let's go through the process of developing an agent with a validation plan. We'll use the small_table dataset in the following examples, which end with the developed validation plan being written to a YAML file.
small_table
#> # A tibble: 13 x 8
#>    date_time           date           a b             c      d e     f
#>    <dttm>              <date>     <int> <chr>     <dbl>  <dbl> <lgl> <chr>
#>  1 2016-01-04 11:00:00 2016-01-04     2 1-bcd-345     3  3423. TRUE  high
#>  2 2016-01-04 00:32:00 2016-01-04     3 5-egh-163     8 10000. TRUE  low
#>  3 2016-01-05 13:32:00 2016-01-05     6 8-kdg-938     3  2343. TRUE  high
#>  4 2016-01-06 17:23:00 2016-01-06     2 5-jdo-903    NA  3892. FALSE mid
#>  5 2016-01-09 12:36:00 2016-01-09     8 3-ldm-038     7   284. TRUE  low
#>  6 2016-01-11 06:15:00 2016-01-11     4 2-dhe-923     4  3291. TRUE  mid
#>  7 2016-01-15 18:46:00 2016-01-15     7 1-knw-093     3   843. TRUE  high
#>  8 2016-01-17 11:27:00 2016-01-17     4 5-boe-639     2  1036. FALSE low
#>  9 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
#> 10 2016-01-20 04:30:00 2016-01-20     3 5-bce-642     9   838. FALSE high
#> 11 2016-01-26 20:07:00 2016-01-26     4 2-dmx-010     7   834. TRUE  low
#> 12 2016-01-28 02:51:00 2016-01-28     2 7-dmx-010     8   108. FALSE low
#> 13 2016-01-30 11:23:00 2016-01-30     1 3-dka-303    NA  2230. TRUE  high
Creating an action_levels object is a common workflow step when creating a pointblank agent. We designate failure thresholds to the warn, stop, and notify states using action_levels().
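The code for this step is not shown here. The sketch below is one way to write it, assuming the thresholds are the ones that appear under `actions` in the YAML output further down (0.1, 0.25, and 0.35).

```r
library(pointblank)

# Failure thresholds, given as fractions of failing test units;
# the values are taken from the `actions` block of the YAML output
al <-
  action_levels(
    warn_at = 0.10,
    stop_at = 0.25,
    notify_at = 0.35
  )
```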
Now let's create the agent and pass it the al object, which serves as a default for all validation steps (and can be overridden in individual steps). The data will be referenced in tbl with a leading ~, which is a requirement for writing to YAML since the preparation of the target table must be self-contained.
agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A simple example with the `small_table`.",
    actions = al
  )
Then, as with any agent object, we can add steps to the validation plan by
using as many validation functions as we want.
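The validation steps themselves are not shown here. The sketch below reconstructs a plausible set from the `steps` entries of the YAML output further down; the setup is repeated so that the snippet runs on its own.

```r
library(pointblank)

# Setup repeated from the preceding steps so this snippet is self-contained
al <- action_levels(warn_at = 0.10, stop_at = 0.25, notify_at = 0.35)

agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A simple example with the `small_table`.",
    actions = al
  )

# Validation steps inferred from the `steps` entries of the YAML output
agent <-
  agent %>%
  col_exists(columns = c(date, date_time)) %>%
  col_vals_regex(
    columns = b,
    regex = "[0-9]-[a-z]{3}-[0-9]{3}"
  ) %>%
  rows_distinct() %>%
  col_vals_gt(columns = d, value = 100) %>%
  col_vals_lte(columns = c, value = 5)
```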
The agent can be written to a pointblank-readable YAML file with the yaml_write() function. Here, we'll use the filename "agent-small_table.yml" and, after writing, the YAML file will be in the working directory:
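The call is not shown here. It would look roughly like the sketch below, in which the agent is rebuilt in a single pipeline (with steps inferred from the YAML output) so that the snippet runs on its own.

```r
library(pointblank)

# Agent rebuilt in one pipeline; steps are inferred from the YAML output
agent <-
  create_agent(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A simple example with the `small_table`.",
    actions = action_levels(warn_at = 0.10, stop_at = 0.25, notify_at = 0.35)
  ) %>%
  col_exists(columns = c(date, date_time)) %>%
  col_vals_regex(columns = b, regex = "[0-9]-[a-z]{3}-[0-9]{3}") %>%
  rows_distinct() %>%
  col_vals_gt(columns = d, value = 100) %>%
  col_vals_lte(columns = c, value = 5)

# Write the agent (and its validation plan) to the working directory
yaml_write(agent, filename = "agent-small_table.yml")
```

Inspecting the written file should then show YAML along these lines: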
type: agent
tbl: ~small_table
tbl_name: small_table
label: A simple example with the `small_table`.
lang: en
locale: en
actions:
  warn_fraction: 0.1
  stop_fraction: 0.25
  notify_fraction: 0.35
steps:
- col_exists:
    columns: c(date, date_time)
- col_vals_regex:
    columns: c(b)
    regex: '[0-9]-[a-z]{3}-[0-9]{3}'
- rows_distinct:
    columns: ~
- col_vals_gt:
    columns: c(d)
    value: 100.0
- col_vals_lte:
    columns: c(c)
    value: 5.0
Incidentally, we can also use yaml_agent_string() to print YAML in the
console when supplying an agent as the input. This can be useful for
previewing YAML output just before writing it to disk with yaml_write().
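A minimal sketch of such a preview follows. To keep it short, it uses a deliberately small, one-step agent (a hypothetical stand-in for the full agent developed above).

```r
library(pointblank)

# A deliberately small, one-step agent (illustrative only)
agent <-
  create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  col_vals_gt(columns = d, value = 100)

# Print the YAML representation to the console; nothing is written to disk
yaml_agent_string(agent = agent)
```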
Reading an agent object from a YAML file
There's a YAML file available in the pointblank package that's also called "agent-small_table.yml". The path for it can be accessed through system.file():
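The lookup is not shown here. The sketch below assumes the file sits in a `yaml` directory of the installed package; if that assumption is wrong, system.file() returns an empty string.

```r
# Locate the packaged YAML file; the "yaml" subdirectory is an assumption
yml_file <-
  system.file(
    "yaml", "agent-small_table.yml",
    package = "pointblank"
  )

# An empty string here would mean the file was not found at that location
yml_file
```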
This particular agent is using ~ tbl_source("small_table", "tbl_store.yml") to source the table-prep from a YAML file that holds a table store (this can be seen using yaml_agent_string(agent = agent)). Let's put that file in the working directory (the pointblank package has the corresponding YAML file):
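A sketch of this step follows: copy the table store file, then read the agent with yaml_read_agent() and print it to get the validation report. As before, the `yaml` directory inside the package is an assumption.

```r
library(pointblank)

# Copy the packaged table store into the working directory
# (the "yaml" subdirectory is an assumption)
file.copy(
  from = system.file("yaml", "tbl_store.yml", package = "pointblank"),
  to = "tbl_store.yml"
)

# Read the agent and its validation plan back from the packaged YAML file
yml_file <- system.file("yaml", "agent-small_table.yml", package = "pointblank")
agent <- yaml_read_agent(filename = yml_file)

# Printing the agent displays its validation report
agent
```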
As can be seen from the validation report, no interrogation has been performed yet. Saving an agent to YAML will remove any traces of interrogation data and serve as a plan for a new interrogation on the same target table. We can either follow this up with interrogate() and get an agent with intel, or we can interrogate directly from the YAML file with yaml_agent_interrogate():
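A minimal sketch of the direct route, under the same assumption about where the packaged files live. The table store is copied only if it is not already in the working directory.

```r
library(pointblank)

# Make sure the table store the agent refers to is in the working directory
if (!file.exists("tbl_store.yml")) {
  file.copy(
    from = system.file("yaml", "tbl_store.yml", package = "pointblank"),
    to = "tbl_store.yml"
  )
}

yml_file <- system.file("yaml", "agent-small_table.yml", package = "pointblank")

# Read the validation plan and interrogate the target table in one call
agent <- yaml_agent_interrogate(filename = yml_file)
```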
Writing an informant object to a YAML file
Let's walk through how we can generate some useful information for a really small table. We can create an informant object with create_informant(), and we'll again use the small_table dataset.
informant <-
  create_informant(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A simple example with the `small_table`."
  )
Then, as with any informant object, we can add info text to it by using as many info_*() functions as we want.
informant <-
  informant %>%
  info_columns(
    columns = a,
    info = "In the range of 1 to 10. (SIMPLE)"
  ) %>%
  info_columns(
    columns = starts_with("date"),
    info = "Time-based values (e.g., `Sys.time()`)."
  ) %>%
  info_columns(
    columns = date,
    info = "The date part of `date_time`. (CALC)"
  )
The informant can be written to a pointblank-readable YAML file with the yaml_write() function. Here, we'll use the filename "informant-small_table.yml" and, after writing, the YAML file will be in the working directory:
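The call is not shown here. A sketch of it follows, with the informant rebuilt from the preceding steps so that the snippet runs on its own.

```r
library(pointblank)

# Informant rebuilt from the preceding steps so this snippet is self-contained
informant <-
  create_informant(
    tbl = ~ small_table,
    tbl_name = "small_table",
    label = "A simple example with the `small_table`."
  ) %>%
  info_columns(
    columns = a,
    info = "In the range of 1 to 10. (SIMPLE)"
  ) %>%
  info_columns(
    columns = starts_with("date"),
    info = "Time-based values (e.g., `Sys.time()`)."
  ) %>%
  info_columns(
    columns = date,
    info = "The date part of `date_time`. (CALC)"
  )

# Write the informant to the working directory
yaml_write(informant, filename = "informant-small_table.yml")
```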
We can inspect the YAML file in the working directory and expect to see the
following:
type: informant
tbl: ~small_table
tbl_name: small_table
info_label: A simple example with the `small_table`.
lang: en
locale: en
table:
  name: small_table
  _columns: 8
  _rows: 13.0
  _type: tbl_df
columns:
  date_time:
    _type: POSIXct, POSIXt
    info: Time-based values (e.g., `Sys.time()`).
  date:
    _type: Date
    info: Time-based values (e.g., `Sys.time()`). The date part of `date_time`. (CALC)
  a:
    _type: integer
    info: In the range of 1 to 10. (SIMPLE)
  b:
    _type: character
  c:
    _type: numeric
  d:
    _type: numeric
  e:
    _type: logical
  f:
    _type: character
Reading an informant object from a YAML file
There's a YAML file available in the pointblank package that's also called "informant-small_table.yml". The path for it can be accessed through system.file():
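A sketch of locating the file and reading it with yaml_read_informant() follows. It assumes the file sits in a `yaml` directory of the installed package. The copy of "tbl_store.yml" is a precaution in case the packaged informant sources its table from the table store; it is harmless otherwise.

```r
library(pointblank)

# Precaution (assumption): have the table store available in the working directory
if (!file.exists("tbl_store.yml")) {
  file.copy(
    from = system.file("yaml", "tbl_store.yml", package = "pointblank"),
    to = "tbl_store.yml"
  )
}

# Locate the packaged YAML file; the "yaml" subdirectory is an assumption
yml_file <-
  system.file(
    "yaml", "informant-small_table.yml",
    package = "pointblank"
  )

# Read the informant; printing it displays the information report
informant <- yaml_read_informant(filename = yml_file)
informant
```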
As can be seen from the information report, the available table metadata was
restored and reported. If you expect metadata to change with time, it might
be beneficial to use incorporate() to query the target table. Or, we can
perform this querying directly from the YAML file with
yaml_informant_incorporate():
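A minimal sketch of that call, under the same assumptions about the packaged file locations:

```r
library(pointblank)

# Precaution (assumption): have the table store available in the working directory
if (!file.exists("tbl_store.yml")) {
  file.copy(
    from = system.file("yaml", "tbl_store.yml", package = "pointblank"),
    to = "tbl_store.yml"
  )
}

yml_file <- system.file("yaml", "informant-small_table.yml", package = "pointblank")

# Read the informant and refresh its table metadata in one call
informant <- yaml_informant_incorporate(filename = yml_file)
```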
There will be no apparent difference in this particular case since small_table is a static table with no alterations over time. However, using yaml_informant_incorporate() is good practice since this refreshing of data will be important with real-world datasets.
Function ID
11-1
See Also
Other pointblank YAML: yaml_agent_interrogate(), yaml_agent_show_exprs(), yaml_agent_string(), yaml_exec(), yaml_informant_incorporate(), yaml_read_agent(), yaml_read_informant()