Calculates the distribution of a query from a prior or posterior distribution of parameters.
query_distribution(
  model,
  queries = NULL,
  given = NULL,
  using = "parameters",
  parameters = NULL,
  n_draws = 4000,
  join_by = "|",
  case_level = FALSE,
  query = NULL
)
Arguments
model: A causal_model. A model object generated by make_model.
queries: A vector of strings or list of strings specifying queries on potential outcomes such as "Y[X=1] - Y[X=0]". Queries can also indicate conditioning sets by placing a second query after the ':|:' marker: "Y[X=1] - Y[X=0] :|: X == 1 & Y == 1". Note that ':|:' is used rather than the traditional conditioning marker '|' to avoid confusion with logical operators.
given: A character vector specifying given conditions for each query. A 'given' is a quoted expression that evaluates to a logical statement. given allows the query to be conditioned on either observed or counterfactual distributions. A value of TRUE is interpreted as no conditioning. A given statement can alternatively be provided after the ':|:' marker in the query statement.
using: A character. Whether to use "priors", "posteriors" or "parameters".
parameters: A vector or list of vectors of real numbers in [0,1]. A true parameter vector to be used instead of the parameters attached to the model when using is "parameters".
n_draws: An integer. Number of draws.
join_by: A character. The logical operator joining expanded types when the query contains a wildcard (.). Can take values "&" (logical AND) or "|" (logical OR). When the query contains a wildcard (.) and join_by is not specified, it defaults to "|", otherwise it defaults to NULL.
case_level: Logical. If TRUE, estimates the probability of the query for a case.
query: Alias for queries.
Returns
A data frame where columns contain draws from the distribution of the potential outcomes specified in query
Examples
model <- make_model("X -> Y") |>
  set_parameters(c(.5, .5, .1, .2, .3, .4))

# simple queries
query_distribution(model, query = "(Y[X=1] > Y[X=0])", using = "priors") |>
  head()

# multiple queries
query_distribution(model,
  query = list(PE = "(Y[X=1] > Y[X=0])", NE = "(Y[X=1] < Y[X=0])"),
  using = "priors") |>
  head()

# multiple queries and givens, with ':' to identify conditioning distributions
query_distribution(model,
  query = list(
    POC = "(Y[X=1] > Y[X=0]) :|: X == 1 & Y == 1",
    Q = "(Y[X=1] < Y[X=0]) :|: (Y[X=1] <= Y[X=0])"),
  using = "priors") |>
  head()

# multiple queries and givens, using 'given' argument
query_distribution(model,
  query = list("(Y[X=1] > Y[X=0])", "(Y[X=1] < Y[X=0])"),
  given = list("Y==1", "(Y[X=1] <= Y[X=0])"),
  using = "priors") |>
  head()

# linear queries
query_distribution(model, query = "(Y[X=1] - Y[X=0])")

# Linear query conditional on potential outcomes
query_distribution(model, query = "(Y[X=1] - Y[X=0]) :|: Y[X=1]==0")

# Use join_by to amend query interpretation
query_distribution(model, query = "(Y[X=.] == 1)", join_by = "&")

# Probability of causation query
query_distribution(model,
  query = "(Y[X=1] > Y[X=0])",
  given = "X==1 & Y==1",
  using = "priors") |>
  head()

# Case level probability of causation query
query_distribution(model,
  query = "(Y[X=1] > Y[X=0])",
  given = "X==1 & Y==1",
  case_level = TRUE,
  using = "priors")

# Query posterior
update_model(model, make_data(model, n = 3)) |>
  query_distribution(query = "(Y[X=1] - Y[X=0])", using = "posteriors") |>
  head()

# Case level queries provide the inference for a case, which is a scalar
# The case level query *updates* on the given information
# For instance, here we have a model for which we are quite sure that X
# causes Y but we do not know whether it works through two positive effects
# or two negative effects. Thus we do not know if M=0 would suggest an
# effect or no effect

set.seed(1)
model <- make_model("X -> M -> Y") |>
  update_model(data.frame(X = rep(0:1, 8), Y = rep(0:1, 8)), iter = 10000)

Q <- "Y[X=1] > Y[X=0]"
G <- "X==1 & Y==1 & M==1"
QG <- "(Y[X=1] > Y[X=0]) & (X==1 & Y==1 & M==1)"

# In this case these are very different:
query_distribution(model, Q, given = G, using = "posteriors")[[1]] |>
  mean()
query_distribution(model, Q, given = G, using = "posteriors",
  case_level = TRUE)

# These are equivalent:
# 1. Case level query via function
query_distribution(model, Q, given = G, using = "posteriors",
  case_level = TRUE)

# 2. Case level query by hand using Bayes' rule
query_distribution(model, list(QG = QG, G = G), using = "posteriors") |>
  dplyr::summarize(mean(QG) / mean(G))
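
The final example computes the case level query by hand as mean(QG)/mean(G). As a rough sketch of why this differs from averaging the ordinary conditional query, the two quantities can be written out as below. The notation is assumed for illustration and does not come from the package: \theta^{(i)} stands for the i-th of n parameter draws, and the right-hand expression reflects a reading of the non-case-level conditional query as a per-draw ratio.

```latex
% Assumed notation (not from the source): \theta^{(i)} is the i-th of n draws
% from the prior or posterior; Q is the query and G is the given.
\[
\underbrace{
  \frac{\frac{1}{n}\sum_{i=1}^{n} \Pr\!\left(Q \wedge G \mid \theta^{(i)}\right)}
       {\frac{1}{n}\sum_{i=1}^{n} \Pr\!\left(G \mid \theta^{(i)}\right)}
}_{\text{case level: mean(QG)/mean(G)}}
\qquad \text{versus} \qquad
\underbrace{
  \frac{1}{n}\sum_{i=1}^{n}
  \frac{\Pr\!\left(Q \wedge G \mid \theta^{(i)}\right)}
       {\Pr\!\left(G \mid \theta^{(i)}\right)}
}_{\text{mean of the per-draw conditional query}}
\]
```

The left-hand ratio weights each draw by how likely it makes the given G, which is what updating on the case information amounts to. The right-hand average weights every draw equally, so the two can diverge when the probability of G varies a lot across draws.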