data_helpers function

Data helpers

Helpers to simulate data and to convert data between compact and long forms.

collapse_data can be used to convert long form data (one row per observation) to compact form data (one row per data type).

expand_data can be used to convert compact form data (one row per data type) to long form data (one row per observation).
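To make the two forms concrete, here is a minimal base-R sketch of what the conversion amounts to for binary nodes. It is not the package's implementation: collapse_sketch() and expand_sketch() are hypothetical helpers, and the labels (an event is each observed node name followed by its value, with unobserved nodes left out; a strategy is the names of the observed nodes) are assumptions chosen to resemble CausalQueries output.

```r
# Hypothetical helpers illustrating the two data forms (not CausalQueries code).

# Long -> compact: one row per (event, strategy) pair with a count.
collapse_sketch <- function(df) {
  # Event label per row, e.g. "X1Y0"; unobserved (NA) nodes are left out
  event <- apply(df, 1, function(row) {
    obs <- !is.na(row)
    paste0(names(row)[obs], row[obs], collapse = "")
  })
  # Strategy label per row: the names of the observed nodes, e.g. "XY" or "Y"
  strategy <- apply(df, 1, function(row) {
    paste0(names(row)[!is.na(row)], collapse = "")
  })
  # Count rows sharing the same event and strategy
  aggregate(list(count = rep(1L, nrow(df))),
            by = list(event = event, strategy = strategy),
            FUN = sum)
}

# Compact -> long: repeat each data type 'count' times and parse its label.
# Assumes two or more binary nodes whose names contain no digits.
expand_sketch <- function(compact, nodes) {
  parse_event <- function(ev) {
    vals <- setNames(rep(NA_real_, length(nodes)), nodes)
    toks <- regmatches(ev, gregexpr("[A-Za-z_.]+[01]", ev))[[1]]
    for (tok in toks) {
      vals[sub("[01]$", "", tok)] <- as.numeric(substring(tok, nchar(tok)))
    }
    vals
  }
  idx <- rep(seq_len(nrow(compact)), times = compact$count)
  long <- vapply(compact$event[idx], parse_event, numeric(length(nodes)),
                 USE.NAMES = FALSE)
  as.data.frame(t(long))
}

df <- data.frame(X = c(0, 1, NA), Y = c(0, 0, 1))
compact <- collapse_sketch(df)    # three data types, each with count 1
long <- expand_sketch(compact, nodes = c("X", "Y"))
```

Expanding the collapsed data recovers the original rows (up to row order), which is the round trip that collapse_data and expand_data provide for real models.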

make_data generates a dataset with one row per observation.

make_events generates a dataset with one row for each data type. It draws complete data only; to generate incomplete data under various observation strategies, use make_data.
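Because make_events draws complete data only, a single draw amounts to allocating n units across the complete data types. The sketch below assumes the event probabilities w are already known (in the package they are derived from the model's parameters); make_events_sketch() and the probability values are hypothetical.

```r
# Hypothetical sketch of a full-data draw in compact form (not CausalQueries code).
# 'w' holds assumed probabilities of the complete data types.
make_events_sketch <- function(w, n = 1, strategy = "XY") {
  # One multinomial draw allocates the n units across the data types
  counts <- rmultinom(1, size = n, prob = w)[, 1]
  data.frame(event = names(w), strategy = strategy,
             count = as.integer(counts), stringsAsFactors = FALSE)
}

set.seed(1)
w <- c(X0Y0 = 0.4, X1Y0 = 0.1, X0Y1 = 0.1, X1Y1 = 0.4)  # illustrative values only
events <- make_events_sketch(w, n = 5)
```

Every data type appears as a row, even with a count of zero, and the counts always sum to n.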

collapse_data(
  data,
  model,
  drop_NA = TRUE,
  drop_family = FALSE,
  summary = FALSE
)

expand_data(data_events = NULL, model)

make_data(
  model,
  n = NULL,
  parameters = NULL,
  param_type = NULL,
  nodes = NULL,
  n_steps = NULL,
  probs = NULL,
  subsets = TRUE,
  complete_data = NULL,
  given = NULL,
  verbose = FALSE,
  ...
)

make_events(
  model,
  n = 1,
  w = NULL,
  P = NULL,
  A = NULL,
  parameters = NULL,
  param_type = NULL,
  include_strategy = FALSE,
  ...
)

Arguments

  • data: A data.frame. Data on nodes that can take three values: 0, 1, and NA. In long form, as generated by make_data.
  • model: A causal_model. A model object generated by make_model.
  • drop_NA: Logical. Whether to exclude strategy families that contain no observed data. Exceptionally, if no data is provided, minimal data on the first node is returned. Defaults to TRUE.
  • drop_family: Logical. Whether to remove the strategy column from the output. Defaults to FALSE.
  • summary: Logical. Whether to return a summary of the data. See details. Defaults to FALSE.
  • data_events: A 'compact' data.frame with one row per data type. Must be compatible with nodes in model. The default columns are event, strategy and count.
  • n: An integer. Number of observations.
  • parameters: A vector of real numbers in [0,1]. Values of parameters to specify (optional). By default, parameters are drawn from the parameters data frame. See inspect(model, "parameters_df").
  • param_type: A character. String specifying the type of parameters to make: 'flat', 'prior_mean', 'posterior_mean', 'prior_draw', 'posterior_draw', or 'define'. With param_type set to 'define', use arguments to be passed to make_priors; otherwise 'flat' sets equal probabilities on each nodal type in each parameter set, while 'prior_mean', 'prior_draw', 'posterior_mean', and 'posterior_draw' take parameters as the means of, or as draws from, the prior or posterior.
  • nodes: A list. Which nodes are to be observed at each step. If NULL, all nodes are observed.
  • n_steps: A list. Number of units to be observed at each step.
  • probs: A list. Observation probabilities at each step.
  • subsets: A list. Strata within which observations are to be observed at each step. TRUE for all, otherwise an expression that evaluates to a logical condition.
  • complete_data: A data.frame. Dataset with complete observations. Optional.
  • given: A string specifying known values on nodes, e.g. "X==1 & Y==1".
  • verbose: Logical. If TRUE, prints the step schedule.
  • ...: Arguments to be passed to make_priors if param_type == 'define'.
  • w: A numeric matrix. An n_parameters x 1 matrix of event probabilities with named rows.
  • P: A data.frame. Parameter matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "parameter_matrix").
  • A: A data.frame. Ambiguities matrix. Not required but may be provided to avoid repeated computation for simulations. See inspect(model, "ambiguities_matrix").
  • include_strategy: Logical. Whether to include a 'strategy' vector. Defaults to FALSE. The strategy vector does not vary with full data, but it is expected by some functions.
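The nodes, probs, and subsets arguments together describe a multi-step observation strategy. A minimal base-R sketch of that logic, mirroring the first Data Strategies example in the Examples: observe_steps_sketch() is hypothetical, handles probs but not n_steps, and evaluates each subset condition on the data observed so far, which is an assumption about the semantics rather than a description of the package internals.

```r
# Hypothetical sketch of step-wise observation (not CausalQueries code).
observe_steps_sketch <- function(complete, nodes, probs, subsets) {
  observed <- complete
  observed[] <- NA_real_            # start with nothing observed
  for (k in seq_along(nodes)) {
    # Eligible units: TRUE means everyone; otherwise evaluate the condition
    # on the data observed so far (an NA result counts as not eligible)
    eligible <- if (isTRUE(subsets[[k]])) {
      rep(TRUE, nrow(complete))
    } else {
      cond <- eval(parse(text = subsets[[k]]), envir = observed)
      !is.na(cond) & cond
    }
    # Each eligible unit is selected independently with probability probs[[k]]
    selected <- eligible & (runif(nrow(complete)) < probs[[k]])
    # Reveal this step's nodes for the selected units
    if (any(selected)) {
      observed[selected, nodes[[k]]] <- complete[selected, nodes[[k]]]
    }
  }
  observed
}

set.seed(42)
complete <- data.frame(X = rbinom(8, 1, 0.5), M = rbinom(8, 1, 0.5),
                       Y = rbinom(8, 1, 0.5))
obs <- observe_steps_sketch(
  complete,
  nodes = list(c("X", "Y"), "M"),
  probs = list(1, 0.5),
  subsets = list(TRUE, "X==1 & Y==0"))
```

X and Y end up observed for every unit, while M can only be revealed for units with X = 1 and Y = 0.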

Returns

collapse_data: A vector of data events. If summary = TRUE, collapse_data instead returns a list containing the following components:

  • data_events: A compact data.frame of event types and strategies.
  • observed_events: A vector of character strings specifying the events observed in the data.
  • unobserved_events: A vector of character strings specifying the events not observed in the data.

expand_data: A data.frame with one row per observation.

make_data: A data.frame with simulated data.

make_events: A data.frame of events.
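The observed/unobserved split in the summary can be pictured as a set difference over the possible events. A minimal sketch, assuming an event counts as observed when its count is positive; summarise_events_sketch() and the event labels are hypothetical.

```r
# Hypothetical sketch of the summary split (not CausalQueries code).
summarise_events_sketch <- function(compact, all_events) {
  observed <- compact$event[compact$count > 0]
  list(
    data_events = compact,
    observed_events = intersect(all_events, observed),
    unobserved_events = setdiff(all_events, observed)
  )
}

all_events <- c("X0Y0", "X1Y0", "X0Y1", "X1Y1")
compact <- data.frame(event = all_events, strategy = "XY",
                      count = c(2L, 0L, 1L, 0L), stringsAsFactors = FALSE)
s <- summarise_events_sketch(compact, all_events)
```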

Details

Note that, by default, make_data does not take into account whether a node has already been observed when deciding whether to select a unit for observation at a given step. One can, however, specifically request observation of nodes that have not previously been observed, for example by passing a subset such as "is.na(X)".
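The difference matters when steps overlap, as with n_steps = list(3, 2). A minimal base-R sketch of the two selection rules (the index bookkeeping is illustrative, not the package's): by default the second step samples from all units, so it can revisit units already seen at the first step; restricting it to units whose X is still unobserved, as the "is.na(X)" subset does in the Examples, guarantees new units.

```r
# Illustrative sketch of overlapping steps (not CausalQueries code).
set.seed(7)
n <- 8

# Step 1: observe X and Y for 3 randomly chosen units
step1 <- sample.int(n, 3)
observed_X <- rep(FALSE, n)
observed_X[step1] <- TRUE

# Step 2, default rule: draw 2 units from everyone (may repeat step-1 units)
step2_default <- sample.int(n, 2)

# Step 2 restricted to units with X not yet observed (like subsets = "is.na(X)")
eligible <- which(!observed_X)
step2_new <- eligible[sample.int(length(eligible), 2)]
```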

Examples

model <- make_model('X -> Y')
df <- data.frame(X = c(0,1,NA), Y = c(0,0,1))

df |> collapse_data(model)

# Illustrating options
df |> collapse_data(model, drop_NA = FALSE)
df |> collapse_data(model, drop_family = TRUE)
df |> collapse_data(model, summary = TRUE)

# Appropriate behavior given restricted models
model <- make_model('X -> Y') |>
  set_restrictions('X[]==1')
df <- make_data(model, n = 10)
df[1,1] <- ''
df |> collapse_data(model)

df <- data.frame(X = 0:1)
df |> collapse_data(model)

model <- make_model('X->M->Y')
make_events(model, n = 5) |>
  expand_data(model)
make_events(model, n = 0) |>
  expand_data(model)

# Simple draws
model <- make_model("X -> M -> Y")
make_data(model)
make_data(model, n = 3, nodes = c("X","Y"))
make_data(model, n = 3, param_type = "prior_draw")
make_data(model, n = 10, param_type = "define", parameters = 0:9)

# Data Strategies
# A strategy in which X, Y are observed for sure and M is observed
# with 50% probability for X=1, Y=0 cases
model <- make_model("X -> M -> Y")
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  probs = list(1, .5),
  subsets = list(TRUE, "X==1 & Y==0"))

# n not provided but inferred from largest n_step (not from sum of n_steps)
make_data(
  model,
  nodes = list(c("X", "Y"), "M"),
  n_steps = list(5, 2))

# Wide then deep
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), "M"),
  subsets = list(TRUE, "!is.na(X) & !is.na(Y)"),
  n_steps = list(6, 2))

make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  n_steps = list(3, 2))

# Example with probabilities at each step
make_data(
  model,
  n = 8,
  nodes = list(c("X", "Y"), c("X", "M")),
  subsets = list(TRUE, "is.na(X)"),
  probs = list(.5, .2))

# Example with given data
make_data(model, given = "X==1 & Y==1", n = 5)

model <- make_model('X -> Y')
make_events(model = model)
make_events(model = model, param_type = 'prior_draw')
make_events(model = model, include_strategy = TRUE)

See Also

Other data_generation: get_all_data_types(), make_data_single(), observe_data()