The function is.submodel checks for each element of a vector of cna solution formulas whether it is a submodel of a specified target model y. If y is the true model in an inverse search (i.e. the ground truth), is.submodel identifies correct models in the cna output.
is.submodel(x, y, strict = FALSE)
identical.model(x, y)
Arguments
x: Character vector of atomic and/or complex solution formulas (asf/csf). Must be of length 1 in identical.model.
y: Character string of length 1 specifying the target asf or csf.
strict: Logical; if TRUE, the elements of x only count as submodels of y if they are proper parts of y (i.e. not identical to y).
Details
To benchmark the reliability of a method of causal learning it must be tested to what degree the method recovers the true data generating structure Δ or proper substructures of Δ from data of varying quality. Reliability benchmarking is done in so-called inverse searches, which reverse the order of causal discovery as normally conducted in scientific practice. An inverse search comprises three steps: (1) a causal structure Δ is drawn/presupposed (as ground truth), (2) artificial data δ is simulated from Δ, possibly featuring various deficiencies (e.g. noise, fragmentation, etc.), and (3) δ is processed by the benchmarked method in order to check whether its output meets the tested reliability benchmark (e.g. whether the output is true of or identical to Δ).
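The three steps can be sketched in R as follows. This is a minimal illustration, not a full benchmarking setup: the target structure is a made-up example, the data are ideal (no noise or fragmentation), and the sketch assumes a recent cna version in which selectCases can be called with the formula alone and csf returns a data frame with a condition column.

```r
library(cna)

# Step 1: presuppose a ground truth (hypothetical example structure).
target <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"

# Step 2: simulate ideal data, i.e. all configurations compatible with the target.
dat <- selectCases(target)

# Step 3: run the benchmarked method and compare each output model with the target.
ana <- cna(dat)
models <- csf(ana)$condition
recovered <- is.submodel(models, target)
```

Here recovered holds one logical value per model in the output. Because no causal ordering is supplied, the output may well contain models that are not submodels of the target.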
The main purpose of is.submodel is to execute step (3) of an inverse search that is tailor-made to test the reliability of cna [with randomConds and selectCases designed for steps (1) and (2), respectively]. A solution formula x being a submodel of a target formula y means that all the causal claims entailed by x are true of y, which is the case if a causal interpretation of x entails conjunctive and disjunctive causal relevance relations that are all likewise entailed by a causal interpretation of y. More specifically, x is a submodel of y if, and only if, the following conditions are satisfied: (i) all factor values causally relevant according to x are also causally relevant according to y, (ii) all factor values contained in two different disjuncts in x are also contained in two different disjuncts in y, (iii) all factor values contained in the same conjunct in x are also contained in the same conjunct in y, and (iv) if x is a csf with more than one asf, (i) to (iii) are satisfied for all asfs in x. For more details see Baumgartner and Thiem (2020).
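Conditions (i) to (iv) can be made concrete with a few hand-picked formulas. The target and the candidate formulas below are illustrative assumptions chosen so that each candidate either satisfies all conditions or fails exactly one of them; the comments state which condition is at issue according to the definition above.

```r
library(cna)

# Hypothetical ground truth with two asfs (outcomes C and E).
y <- "(A*b + a*B <-> C)*(C*d + c*D <-> E)"

# Satisfies (i) to (iii): A and B are relevant for C and sit in different disjuncts of y.
sub_ok <- is.submodel("A + B <-> C", y)

# Fails (i): D is relevant for E in y, but not for C.
viol_i <- is.submodel("D <-> C", y)

# Fails (ii): a and B are in different disjuncts here, but share their only disjunct in y.
viol_ii <- is.submodel("a + B <-> C", y)

# Fails (iii): A and B share a conjunct here, but never in y.
viol_iii <- is.submodel("A*B <-> C", y)

# Condition (iv): a csf counts as a submodel when each of its asfs passes (i) to (iii).
csf_ok <- is.submodel("(A <-> C)*(C <-> E)", y)
```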
If the target formula y is a csf, all solutions that is.submodel identifies as submodels of y make only causal claims that are true of y, but there may be more of these correctness-preserving solutions, which are not identified as such by is.submodel. See Baumgartner and Falk (2024) for details; see also the function causal_submodel in the frscore package.
is.submodel requires two inputs: x and y. x is a character vector of cna solution formulas (asf or csf), and y is one asf or csf (i.e. a character string of length 1), viz. the target structure or ground truth. The function returns TRUE for elements of x that are submodels of y according to the definition given above. If strict = TRUE, x counts as a submodel of y only if x is a proper part of y (i.e. x is not identical to y).
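A short sketch of a vectorized call and of the effect of strict, using an assumed asf target and three made-up candidates. The last candidate is the target itself with its disjuncts and conjuncts reordered.

```r
library(cna)

# Hypothetical target asf and candidate formulas.
target <- "A*b + C <-> D"
candidates <- c("A + C <-> D", "C <-> D", "C + b*A <-> D")

# One logical value per element of candidates.
res_default <- is.submodel(candidates, target)

# With strict = TRUE, a candidate identical to the target no longer counts.
res_strict <- is.submodel(candidates, target, strict = TRUE)
```

Under the definition above, all three candidates should count as submodels by default, while only the first two should remain under strict = TRUE.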
The function identical.model returns TRUE only if x (which must be of length 1) and y are identical. It can be used to test whether y is completely recovered in an inverse search.
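As a small illustration with an assumed target asf, identical.model disregards the order in which disjuncts and conjuncts are written, but it rejects a proper submodel:

```r
library(cna)

# Hypothetical target asf.
target <- "A*b + C <-> D"

# Same model as the target, written in a different order.
same <- identical.model("C + b*A <-> D", target)

# A proper submodel of the target: correct, but not a complete recovery.
proper <- identical.model("C <-> D", target)
```

In an inverse search, the first kind of result indicates that the ground truth was recovered completely, whereas the second indicates a correct but incomplete model.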
Returns
Logical vector of the same length as x.
References
Baumgartner, Michael and Alrik Thiem. 2020. Often Trusted But Never (Properly) Tested: Evaluating Qualitative Comparative Analysis. Sociological Methods & Research 49:279-311.
Baumgartner, Michael and Christoph Falk. 2024. Quantifying the Quality of Configurational Causal Models. Journal of Causal Inference 12(1):20230032. doi: 10.1515/jci-2023-0032.