Various helpers to simulate data and to manipulate data types between compact and long forms.
collapse_data can be used to convert long form data to compact form data.
expand_data can be used to convert compact form data (one row per data type) to long form data (one row per observation).
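The following base-R sketch illustrates the two conversions. It is not the package's implementation: collapse_sketch and expand_sketch are hypothetical helpers. The sketch assumes three things:

- an event label joins each observed node name with its value (e.g. "X0Y1") and skips nodes that are NA;
- a strategy label lists the observed nodes (e.g. "XY");
- no node name is a prefix of another.

```r
# Hypothetical helpers (not part of CausalQueries) sketching the conversion
# between long form (one row per observation) and compact form (one row
# per data type). Assumed labels: event "X0Y1", strategy "XY"; NA nodes are
# left out of both labels.

collapse_sketch <- function(df) {
  # Event label per row: observed node names pasted with their values
  event <- apply(df, 1, function(row) {
    keep <- !is.na(row)
    paste0(names(row)[keep], row[keep], collapse = "")
  })
  # Strategy label per row: names of the observed nodes
  strategy <- apply(df, 1, function(row) {
    paste0(names(row)[!is.na(row)], collapse = "")
  })
  # One row per distinct event, in order of first appearance, with counts
  compact <- unique(data.frame(event, strategy, stringsAsFactors = FALSE))
  compact$count <- as.vector(table(factor(event, levels = compact$event)))
  rownames(compact) <- NULL
  compact
}

expand_sketch <- function(compact, nodes) {
  # Repeat each event label 'count' times: one entry per observation
  labels <- rep(compact$event, compact$count)
  long <- as.data.frame(matrix(NA_real_, nrow = length(labels),
                               ncol = length(nodes),
                               dimnames = list(NULL, nodes)))
  for (node in nodes) {
    # Position of "<node>0" or "<node>1" in each label (-1 if unobserved)
    pos <- regexpr(paste0(node, "[01]"), labels)
    hit <- pos > 0
    digit_at <- pos[hit] + nchar(node)
    long[[node]][hit] <- as.numeric(substr(labels[hit], digit_at, digit_at))
  }
  long
}

df <- data.frame(X = c(0, 1, NA), Y = c(0, 0, 1))
compact <- collapse_sketch(df)
long <- expand_sketch(compact, nodes = c("X", "Y"))
```

Under these assumptions, collapse_sketch(df) yields the events X0Y0, X1Y0 and Y1, each with count 1. expand_sketch then recovers the original three rows.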
make_data generates a dataset with one row per observation.
make_events generates a dataset with one row for each data type. It draws complete data only; to generate various types of incomplete data, use make_data.
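make_events draws complete data only. Its counts can therefore be read as n independent draws over the complete-data events, which amounts to a single multinomial draw.

The base-R sketch below follows that reading. It is not the package's code. draw_events_sketch is a hypothetical helper, and the probability vector w is made up; it stands in for the event probabilities a model would imply.

```r
# Hypothetical sketch (not CausalQueries code): draw n complete-data
# observations and return them in compact form, one row per data type.
draw_events_sketch <- function(n, w, nodes) {
  # n i.i.d. draws over events, summarised as multinomial counts
  counts <- rmultinom(1, size = n, prob = w)[, 1]
  data.frame(
    event = names(w),
    # With complete data every event shares the same strategy
    strategy = paste(nodes, collapse = ""),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Illustrative (made-up) event probabilities for complete data on X -> Y
w <- c(X0Y0 = 0.3, X1Y0 = 0.2, X0Y1 = 0.2, X1Y1 = 0.3)
set.seed(1)
draw_events_sketch(n = 5, w = w, nodes = c("X", "Y"))
```

The counts always sum to n. With n = 0, every count is zero.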
Usage
collapse_data(data, model, drop_NA = TRUE, drop_family = FALSE, summary = FALSE)
expand_data(data_events = NULL, model)
make_data(model, n = NULL, parameters = NULL, param_type = NULL, nodes = NULL, n_steps = NULL, probs = NULL, subsets = TRUE, complete_data = NULL, given = NULL, verbose = FALSE, ...)
make_events(model, n = 1, w = NULL, P = NULL, A = NULL, parameters = NULL, param_type = NULL, include_strategy = FALSE, ...)
Arguments
data: A data.frame. Data on nodes that can take three values: 0, 1, and NA. In long form, as generated by make_data.
model: A causal_model. A model object generated by make_model.
drop_NA: Logical. Whether to exclude strategy families that contain no observed data. Exceptionally, if no data are provided, minimal data on the first node are returned. Defaults to TRUE.
drop_family: Logical. Whether to remove the strategy column from the output. Defaults to FALSE.
summary: Logical. Whether to return a summary of the data (see Returns). Defaults to FALSE.
data_events: A 'compact' data.frame with one row per data type. Must be compatible with nodes in model. The default columns are event, strategy and count.
n: An integer. Number of observations.
parameters: A vector of real numbers in [0,1]. Values of parameters to specify (optional). By default, parameters are taken from the parameters data frame; see inspect(model, "parameters_df").
param_type: A character. String specifying the type of parameters to make: 'flat', 'prior_mean', 'posterior_mean', 'prior_draw', 'posterior_draw', or 'define'. With param_type set to 'define', the arguments in ... are passed to make_priors. 'flat' sets equal probabilities on each nodal type in each parameter set. 'prior_mean', 'prior_draw', 'posterior_mean' and 'posterior_draw' take parameters as the means of, or as draws from, the prior or posterior.
nodes: A list. Which nodes are to be observed at each step. If NULL, all nodes are observed.
n_steps: A list. Number of units to be observed at each step.
probs: A list. Observation probabilities at each step.
subsets: A list. Strata within which units are to be observed at each step. TRUE for all units; otherwise a string containing an expression that evaluates to a logical condition, e.g. "X==1 & Y==0".
complete_data: A data.frame. Dataset with complete observations. Optional.
given: A string specifying known values on nodes, e.g. "X==1 & Y==1".
verbose: Logical. If TRUE prints step schedule.
...: Arguments to be passed to make_priors if param_type == "define".
w: A numeric matrix. A one-column matrix of event probabilities with named rows (one row per event).
P: A data.frame. Parameter matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "parameter_matrix").
A: A data.frame. Ambiguities matrix. Not required, but may be provided to avoid repeated computation in simulations. See inspect(model, "ambiguities_matrix").
include_strategy: Logical. Whether to include a 'strategy' vector. Defaults to FALSE. The strategy vector does not vary with full data but is expected by some functions.
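The 'flat' option of param_type can be pictured as follows. Within each parameter set, every nodal type receives probability 1/k, where k is the number of nodal types in that set.

The toy table below is hypothetical. Its column names (param_set, param_names) and nodal-type labels are assumptions meant to resemble inspect(model, "parameters_df"). flat_parameters is not a package function.

```r
# Hypothetical parameters table for X -> Y: two nodal types for X and
# four for Y (names are illustrative only).
params <- data.frame(
  param_set = c("X", "X", "Y", "Y", "Y", "Y"),
  param_names = c("X.0", "X.1", "Y.00", "Y.10", "Y.01", "Y.11"),
  stringsAsFactors = FALSE
)

# 'flat' sketch: equal probability for each nodal type within its set
flat_parameters <- function(param_set) {
  ave(rep(1, length(param_set)), param_set, FUN = function(x) x / sum(x))
}

params$param_value <- flat_parameters(params$param_set)
params
```

Each parameter set sums to one: 0.5 for each X type and 0.25 for each Y type.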
Returns
collapse_data: A compact data.frame of data events.
If summary = TRUE, collapse_data returns a list containing the following components:
- data_events: A compact data.frame of event types and strategies.
- observed_events: A vector of character strings specifying the events observed in the data.
- unobserved_events: A vector of character strings specifying the events not observed in the data.
expand_data: A data.frame with one row per observation.
make_data: A data.frame with simulated data.
make_events: A data.frame of events, with one row per data type.
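The sketch below illustrates the summary = TRUE components by splitting a compact data.frame into observed and unobserved events. It rests on one assumption: an event counts as observed when its count is positive. summary_sketch and the example counts are hypothetical.

```r
# Hypothetical sketch (not package code) of the summary = TRUE structure.
# Assumption: an event is "observed" if its count is greater than zero.
summary_sketch <- function(data_events) {
  list(
    data_events = data_events,
    observed_events = data_events$event[data_events$count > 0],
    unobserved_events = data_events$event[data_events$count == 0]
  )
}

data_events <- data.frame(
  event = c("X0Y0", "X1Y0", "X0Y1", "X1Y1"),
  strategy = "XY",
  count = c(2L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)
str(summary_sketch(data_events))
```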
Details
Note that, by default, make_data does not take into account whether a node has already been observed when deciding whether to select a unit for observation. One can, however, explicitly request observation of nodes that have not previously been observed, for example with a subsets condition such as "is.na(M)".
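A minimal base-R sketch of a single observation step shows why this default matters. observe_step is a hypothetical helper, not make_data's implementation. It reveals node for up to k units, drawn at random from those meeting subset (a string, as in the subsets argument).

With subset "TRUE", units whose M is already observed can be drawn again. Using "is.na(M)" restricts the draw to new units.

```r
# Hypothetical helper (not CausalQueries code): reveal 'node' for up to k
# randomly chosen units among those satisfying the 'subset' condition.
observe_step <- function(observed, complete, node, k, subset) {
  # Evaluate the condition within the data; recycle a scalar TRUE to all rows
  ok <- rep_len(eval(parse(text = subset), envir = observed), nrow(observed))
  eligible <- which(ok)  # NA conditions count as not eligible
  chosen <- eligible[sample.int(length(eligible), min(k, length(eligible)))]
  observed[chosen, node] <- complete[chosen, node]
  observed
}

complete <- data.frame(
  X = c(0, 1, 0, 1, 0, 1, 0, 1),
  M = c(0, 1, 1, 0, 0, 1, 1, 0),
  Y = c(0, 1, 1, 0, 1, 1, 0, 0)
)
observed <- complete
observed$M <- NA  # X and Y observed for everyone, M for no one yet

set.seed(42)
step1 <- observe_step(observed, complete, "M", k = 3, subset = "TRUE")
# Default-style second step: may pick units already observed in step 1
again <- observe_step(step1, complete, "M", k = 3, subset = "TRUE")
# Explicit request for units not yet observed: always adds 3 new units
fresh <- observe_step(step1, complete, "M", k = 3, subset = "is.na(M)")
c(step1 = sum(!is.na(step1$M)), again = sum(!is.na(again$M)),
  fresh = sum(!is.na(fresh$M)))
```

After the second step, fresh always has M observed for 6 units. again has M observed for anywhere between 3 and 6 units, depending on the draw.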
Examples
model <- make_model('X -> Y')
df <- data.frame(X = c(0,1,NA), Y = c(0,0,1))

df |> collapse_data(model)

# Illustrating options
df |> collapse_data(model, drop_NA = FALSE)
df |> collapse_data(model, drop_family = TRUE)
df |> collapse_data(model, summary = TRUE)

# Appropriate behavior given restricted models
model <- make_model('X -> Y') |>
  set_restrictions('X[]==1')
df <- make_data(model, n = 10)
df[1, 1] <- ''
df |> collapse_data(model)

df <- data.frame(X = 0:1)
df |> collapse_data(model)

model <- make_model('X->M->Y')
make_events(model, n = 5) |>
  expand_data(model)
make_events(model, n = 0) |>
  expand_data(model)

# Simple draws
model <- make_model("X -> M -> Y")
make_data(model)
make_data(model, n = 3, nodes = c("X","Y"))
make_data(model, n = 3, param_type = "prior_draw")
make_data(model, n = 10, param_type = "define", parameters = 0:9)

# Data Strategies
# A strategy in which X, Y are observed for sure and M is observed
# with 50% probability for X=1, Y=0 cases
model <- make_model("X -> M -> Y")
make_data(
  model,
  n = 8,
  nodes = list(c("X","Y"), "M"),
  probs = list(1, .5),
  subsets = list(TRUE, "X==1 & Y==0"))

# n not provided but inferred from largest n_step (not from sum of n_steps)
make_data(
  model,
  nodes = list(c("X","Y"), "M"),
  n_steps = list(5, 2))

# Wide then deep
make_data(
  model,
  n = 8,
  nodes = list(c("X","Y"), "M"),
  subsets = list(TRUE, "!is.na(X) & !is.na(Y)"),
  n_steps = list(6, 2))

make_data(
  model,
  n = 8,
  nodes = list(c("X","Y"), c("X","M")),
  subsets = list(TRUE, "is.na(X)"),
  n_steps = list(3, 2))

# Example with probabilities at each step
make_data(
  model,
  n = 8,
  nodes = list(c("X","Y"), c("X","M")),
  subsets = list(TRUE, "is.na(X)"),
  probs = list(.5, .2))

# Example with given data
make_data(model, given = "X==1 & Y==1", n = 5)

model <- make_model('X -> Y')
make_events(model = model)
make_events(model = model, param_type = 'prior_draw')
make_events(model = model, include_strategy = TRUE)
See Also
Other data_generation: get_all_data_types(), make_data_single(), observe_data()