is.submodel() R function from [cna]

Identify correctness-preserving submodel relations

The function is.submodel checks for each element of a vector of cna solution formulas whether it is a submodel of a specified target model y. If y is the true model in an inverse search (i.e. the ground truth), is.submodel identifies correct models in the cna output.


Usage

is.submodel(x, y, strict = FALSE)
identical.model(x, y)

Arguments

x: Character vector of atomic and/or complex solution formulas (asf/csf). Must be of length 1 in identical.model.
y: Character string of length 1 specifying the target asf or csf.
strict: Logical; if TRUE, the elements of x only count as submodels of y if they are proper parts of y (i.e. not identical to y).

Details

To benchmark the reliability of a method of causal learning, one must test to what degree the method recovers the true data-generating structure $\Delta$, or proper substructures of $\Delta$, from data of varying quality. Reliability benchmarking is done in so-called inverse searches, which reverse the order of causal discovery as normally conducted in scientific practice. An inverse search comprises three steps: (1) a causal structure $\Delta$ is drawn or presupposed as ground truth, (2) artificial data $\delta$ is simulated from $\Delta$, possibly featuring various deficiencies (e.g. noise or fragmentation), and (3) $\delta$ is processed by the benchmarked method in order to check whether its output meets the tested reliability benchmark (e.g. whether the output is true of or identical to $\Delta$).
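The three steps can be sketched in R as follows. This is a minimal illustration, assuming the cna package is installed; it fixes the ground truth by hand instead of drawing it randomly, and it uses ideal (noise-free) data, so it is not a full benchmarking setup. The variable names are chosen here for illustration only.

```r
library(cna)

# Step (1): presuppose a ground truth (same structure as in the Examples).
target <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"

# Step (2): simulate ideal data, i.e. all configurations of the
# factors that are compatible with the target.
dat <- selectCases(target)

# Step (3): analyze the data with cna and test every returned csf
# against the ground truth.
solutions <- csf(cna(dat))$condition
recovered <- is.submodel(solutions, target)
recovered
```

To study less ideal discovery contexts, step (2) would additionally introduce deficiencies such as noise or fragmentation before the data is passed to cna.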

The main purpose of is.submodel is to execute step (3) of an inverse search that is tailor-made to test the reliability of cna [with randomConds and selectCases designed for steps (1) and (2), respectively]. A solution formula x being a submodel of a target formula y means that all the causal claims entailed by x are true of y, which is the case if a causal interpretation of x entails conjunctive and disjunctive causal relevance relations that are all likewise entailed by a causal interpretation of y. More specifically, x is a submodel of y if, and only if, the following conditions are satisfied: (i) all factor values causally relevant according to x are also causally relevant according to y, (ii) all factor values contained in two different disjuncts in x are also contained in two different disjuncts in y, (iii) all factor values contained in the same conjunct in x are also contained in the same conjunct in y, and (iv) if x is a csf with more than one asf, (i) to (iii) are satisfied for all asfs in x. For more details see Baumgartner and Thiem (2020).
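Conditions (i) to (iii) can be illustrated for a single asf with a small base-R sketch. The helpers parse_asf, embed_disjuncts and asf_submodel_sketch are hypothetical names introduced here; they are not part of cna, and the sketch is not the package's implementation. It assumes that condition (ii) amounts to fitting the disjuncts of x into pairwise distinct disjuncts of y, and it does not handle csfs or check that the formulas are well formed.

```r
# Hypothetical helper: split one asf string into its outcome and a list
# of disjuncts, each disjunct being a character vector of factor values.
parse_asf <- function(f) {
  sides <- strsplit(gsub("\\s+", "", f), "<->", fixed = TRUE)[[1]]
  stopifnot(length(sides) == 2)
  disjuncts <- strsplit(sides[1], "+", fixed = TRUE)[[1]]
  list(outcome = sides[2],
       disjuncts = lapply(disjuncts,
                          function(d) strsplit(d, "*", fixed = TRUE)[[1]]))
}

# Hypothetical helper: can every disjunct in xs be placed inside a
# distinct, not yet used disjunct of ys? Simple backtracking search.
embed_disjuncts <- function(xs, ys, used = rep(FALSE, length(ys))) {
  if (length(xs) == 0) return(TRUE)
  for (j in seq_along(ys)) {
    if (!used[j] && all(xs[[1]] %in% ys[[j]])) {
      used_next <- used
      used_next[j] <- TRUE
      if (embed_disjuncts(xs[-1], ys, used_next)) return(TRUE)
    }
  }
  FALSE
}

# Simplified submodel test for two asfs with the same outcome.
asf_submodel_sketch <- function(x, y) {
  px <- parse_asf(x)
  py <- parse_asf(y)
  px$outcome == py$outcome && embed_disjuncts(px$disjuncts, py$disjuncts)
}

r1 <- asf_submodel_sketch("A + B <-> C", "A*b + a*B <-> C")      # TRUE
r2 <- asf_submodel_sketch("A*B + a*b <-> C", "A*b + a*B <-> C")  # FALSE
r3 <- asf_submodel_sketch("A + B <-> C", "A*B <-> C")            # FALSE
```

In the second call, A and B share a conjunct in x but not in y, so condition (iii) fails; in the third, A and B sit in different disjuncts of x but in the same disjunct of y, so condition (ii) fails. For actual analyses, is.submodel itself should be used.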

If the target formula y is a csf, all solutions that is.submodel identifies as submodels of y make only causal claims that are true of y, but there may be further correctness-preserving solutions that is.submodel does not identify as such. See Baumgartner and Falk (2024) for details; see also the function causal_submodel in the frscore package.

is.submodel requires two inputs: x and y. x is a character vector of cna solution formulas (asf or csf), and y is one asf or csf (i.e. a character string of length 1), viz. the target structure or ground truth. The function returns TRUE for elements of x that are submodels of y according to conditions (i) to (iv) given above. If strict = TRUE, x counts as a submodel of y only if x is a proper part of y (i.e. x is not identical to y).

The function identical.model returns TRUE only if x (which must be of length 1) and y are identical. It can be used to test whether y is completely recovered in an inverse search.

Returns

Logical vector of the same length as x.

References

Baumgartner, Michael and Alrik Thiem. 2020. Often Trusted But Never (Properly) Tested: Evaluating Qualitative Comparative Analysis. Sociological Methods & Research 49:279-311.

Baumgartner, Michael and Christoph Falk. 2024. Quantifying the Quality of Configurational Causal Models. Journal of Causal Inference 12(1):20230032. doi: 10.1515/jci-2023-0032.

Examples


library(cna)

# Binary expressions
# ------------------
trueModel.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"
candidates.1 <- c("(A + B <-> C)*(C + c*D <-> E)", "A + B <-> C", 
                 "(A <-> C)*(C <-> E)", "C <-> E")
candidates.2 <- c("(A*B + a*b <-> C)*(C*d + c*D <-> E)", "A*b*D + a*B <-> C", 
                 "(A*b + a*B <-> C)*(C*A*D <-> E)", "D <-> C", 
                 "(A*b + a*B + E <-> C)*(C*d + c*D <-> E)")

is.submodel(candidates.1, trueModel.1)
is.submodel(candidates.2, trueModel.1)
is.submodel(c(candidates.1, candidates.2), trueModel.1)

is.submodel("C + b*A <-> D", "A*b + C <-> D")
is.submodel("C + b*A <-> D", "A*b + C <-> D", strict = TRUE)
identical.model("C + b*A <-> D", "A*b + C <-> D")

target.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"
testformula.1 <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)*(A + B <-> C)"
is.submodel(testformula.1, target.1)

# Multi-value expressions
# -----------------------
trueModel.2 <- "(A=1*B=2 + B=3*A=2 <-> C=3)*(C=1 + D=3 <-> E=2)"
is.submodel("(A=1*B=2 + B=3 <-> C=3)*(D=3 <-> E=2)", trueModel.2)
is.submodel("(A=1*B=1 + B=3 <-> C=3)*(D=3 <-> E=2)", trueModel.2)
is.submodel(trueModel.2, trueModel.2)
is.submodel(trueModel.2, trueModel.2, strict = TRUE)

target.2 <- "C=2*D=1*B=3 + A=1 <-> E=5"
testformula.2 <- c("C=2 + D=1 <-> E=5", "C=2 + D=1*B=3 <-> E=5",
                   "A=1 + B=3*D=1*C=2 <-> E=5",
                   "C=2 + D=1*B=3 + A=1 <-> E=5",
                   "C=2*B=3 + D=1 + B=3 + A=1 <-> E=5")
is.submodel(testformula.2, target.2)
identical.model(testformula.2[3], target.2)
identical.model(testformula.2[1], target.2)

cna package

Maintainer: Mathias Ambuehl
License: GPL (>= 2)
Last published: 2025-04-04
https://CRAN.R-project.org/package=cna
