Evaluate msc, asf, and csf on the level of cases/configurations in the data
The condition function provides assistance to inspect the properties of msc, asf, and csf (as returned by cna) in a data frame or configTable, but also of any other Boolean expression. The function evaluates which configurations and cases instantiate a given msc, asf, or csf and lists the scores on selected evaluation measures (e.g. consistency and coverage).
As of version 4.0 of the cna package, the function condition has been renamed condList, such that the name of the function is now identical with the class of the resulting object. Since condition remains available as an alias of condList, backward compatibility of existing code is guaranteed.
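A minimal sketch of the renaming, assuming cna version 4.0 or later is installed and using the d.irrigate data that ships with the package. Both names are called on the same expression, and the resulting classes are compared.

```r
library(cna)

# The same atomic expression evaluated under the old and the new function name.
via_old_name <- condition("A*r + L*C -> W", d.irrigate)
via_new_name <- condList("A*r + L*C -> W", d.irrigate)

# Both calls are expected to return an object of class "condList".
class(via_old_name)
class(via_new_name)
```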
Usage
condList(x, ct = full.ct(x), ..., verbose = TRUE)
condition(x, ct = full.ct(x), ..., verbose = TRUE)

## S3 method for class 'character'
condList(x, ct = full.ct(x),
         measures = c("standard consistency", "standard coverage"),
         type, add.data = FALSE, force.bool = FALSE, rm.parentheses = FALSE,
         ..., verbose = TRUE)

## S3 method for class 'condTbl'
condList(x, ct = full.ct(x), measures = attr(x, "measures"), ..., verbose = TRUE)

## S3 method for class 'condList'
print(x, n = 3, printMeasures = TRUE, ...)

## S3 method for class 'cond'
print(x, digits = 3, print.table = TRUE, show.cases = NULL, add.data = NULL, ...)
Arguments
x: Character vector specifying a Boolean expression such as "A + B*C -> D", where "A", "B", "C", "D" are factor values appearing in ct, or an object of class condTbl (cf. condTbl).
ct: Data frame or configTable.
measures: Character vector of length 2. measures[1] specifies the measure to be used for sufficiency evaluation, measures[2] the measure to be used for necessity evaluation. Any measure from showConCovMeasures() can be chosen. The default measures are standard consistency and coverage.
verbose: Logical; if TRUE and the argument ct is not provided in a call to condList() or condition(), a message is printed to the console stating that a complete configuration table created by full.ct() is used.
type: Character vector specifying the type of ct: "auto" (automatic detection; default), "cs" (crisp-set), "mv" (multi-value), or "fs" (fuzzy-set).
add.data: Logical; if TRUE, ct is attached to the output. Alternatively, ct can be requested by the add.data argument in print.cond.
force.bool: Logical; if TRUE, x is interpreted as a mere Boolean function, not as a causal model.
rm.parentheses: Logical; if TRUE, parentheses around x are removed prior to evaluation.
n: Positive integer determining the maximal number of evaluations to be printed.
printMeasures: Logical; if TRUE, the output indicates which measures for sufficiency and necessity evaluation were used.
digits: Number of digits to print in the scores on the chosen evaluation measures.
print.table: Logical; if TRUE, the table assigning configurations and cases to conditions is printed.
show.cases: Logical; if TRUE, the attribute cases of the configTable is printed; same default behavior as in print.configTable.
...: Arguments passed to methods.
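The interplay of the ct and verbose arguments can be sketched as follows. The expression "A*b + a*B <-> C" is a hypothetical example that does not come from any data set. When ct is omitted, the default full.ct(x) is used, as shown in the usage section.

```r
library(cna)

expr <- "A*b + a*B <-> C"   # hypothetical expression for illustration

# ct omitted: a message announces that full.ct() is used.
res_default <- condList(expr)

# Same evaluation with the message suppressed.
res_quiet <- condList(expr, verbose = FALSE)

# Passing the complete configuration table explicitly.
res_explicit <- condList(expr, ct = full.ct(expr))
```

Supplying ct explicitly makes it visible in the code which table the scores refer to.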
Details
Depending on the processed data, the solutions output by cna are often ambiguous; that is, many solution formulas may fit the data equally well. If that happens, the data alone are insufficient to single out one solution. While cna simply lists all data-fitting solutions, the condition (aka condList) function provides assistance in comparing different minimally sufficient conditions (msc), atomic solution formulas (asf), and complex solution formulas (csf) in order to have a better basis for selecting among them.
Most importantly, the output of condition shows in which configurations and cases in the data an msc, asf, or csf is instantiated and in which it is not. Thus, if the user has prior causal knowledge about particular configurations or cases, the information received from condition may help identify the solutions that are consistent with that knowledge. Moreover, condition indicates which configurations and cases are covered by the different cna solutions and which are not, and the function returns the scores on selected evaluation measures for each solution.
The condition function is independent of cna. That is, any msc, asf, or csf, irrespective of whether it is output by cna, can be given as input to condition. Even Boolean expressions that do not have the syntax of CNA solution formulas can be passed to condition.
The first input x is either an object of class condTbl as produced by condTbl and the functions in cna-solutions or a character vector consisting of Boolean formulas composed of factor values that appear in data ct. The second input ct can be a configTable or a data frame; if it is omitted, the complete configuration table created by full.ct(x) is used. If ct is a data frame and the type argument has its default value "auto", condition first determines the data type and then converts the data frame into a configTable. The data type can also be manually specified by giving the type argument one of the values "cs", "mv", or "fs".
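The following sketch passes the same data in the accepted forms. It assumes the crisp-set data d.irrigate from the cna package and uses configTable() for the explicit conversion.

```r
library(cna)

expr <- "A*r + L*C -> W"

# Data frame input: the data type is detected automatically (type = "auto").
from_df <- condition(expr, d.irrigate)

# Data frame input with the data type stated explicitly as crisp-set.
from_df_cs <- condition(expr, d.irrigate, type = "cs")

# configTable input: the conversion has been done beforehand.
from_ct <- condition(expr, configTable(d.irrigate))
```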
The measures argument is the same as in cna. Its purpose is to select the measures for evaluating whether the evidence in the data ct warrants an inference to sufficiency and necessity. It expects a character vector of length 2. The first element, measures[1], specifies the measure to be used for sufficiency evaluation, and measures[2] specifies the measure to be used for necessity evaluation. The available evaluation measures can be printed to the console through showConCovMeasures. The default measures are standard consistency and coverage. For more, see the cna package vignette (vignette("cna")), section 3.2.
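To make the default measures concrete, the sketch below computes standard consistency and standard coverage by hand in base R, following the textbook definitions (cf. Ragin 2008). The data frame toy and its columns X and Y are hypothetical. The code illustrates the definitions and is not the package's own implementation.

```r
# Hypothetical crisp-set toy data: X is the condition, Y the outcome.
toy <- data.frame(
  X = c(1, 1, 1, 0, 0, 0),
  Y = c(1, 1, 0, 1, 1, 0)
)

# Cases in which condition and outcome are jointly instantiated.
overlap <- sum(pmin(toy$X, toy$Y))

# Standard consistency of X -> Y: share of X-cases that are also Y-cases.
std_consistency <- overlap / sum(toy$X)   # 2 / 3

# Standard coverage of X -> Y: share of Y-cases that are also X-cases.
std_coverage <- overlap / sum(toy$Y)      # 2 / 4

c(consistency = std_consistency, coverage = std_coverage)
```

Using pmin() rather than a logical "and" lets the same lines work for fuzzy-set membership scores.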
Conjunction can be expressed by ‘*’ or ‘&’, disjunction by ‘+’ or ‘|’, implication by ‘->’, and equivalence by ‘<->’. Negation can be expressed by ‘-’ or ‘!’ or, in case of crisp-set or fuzzy-set data, by changing upper case into lower case letters and vice versa. Examples are "A*B + C", "-(A*B + -(C+d))", "A*B + C -> D", and "(A*B + a*b <-> C)*(C*d + c*D <-> E)". Three types of expressions are distinguished: boolean, atomic, and complex.
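The alternative notations can be tried side by side. The sketch below writes one Boolean expression over d.irrigate in four ways and checks whether the evaluated columns coincide. It assumes that each boolean result is a one-column data frame, as described for the type boolean.

```r
library(cna)

same_condition <- c(
  "A*r + L*C",    # '*' and '+', lower case for negation
  "A&r | L&C",    # '&' and '|'
  "A*!R + L*C",   # '!' for negation
  "A*-R + L*C"    # '-' for negation
)
res <- condition(same_condition, d.irrigate)

# Compare the evaluated column of every variant with that of the first one.
first_column <- res[[1]][[1]]
sapply(res, function(e) isTRUE(all.equal(e[[1]], first_column)))
```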
The type boolean comprises Boolean expressions that do not have the syntactic form of CNA solution formulas, meaning the character strings in x do not have an ‘->’ or ‘<->’ as main operator. Examples: "A*B + C" or "-(A*B + -(C+d))". The expression is evaluated and written into a data frame with one column. Frequency is attached to this data frame as an attribute.
The type atomic comprises expressions that have the syntactic form of atomic solution formulas (asf), meaning the corresponding character strings in the argument x have an ‘->’ or ‘<->’ as main operator. Examples: "A*B + C -> D" or "A*B + C <-> D". The expressions on both sides of ‘->’ and ‘<->’ are evaluated and written into a data frame with two columns. Scores on the selected evaluation measures are attached to these data frames as attributes.
The type complex represents complex solution formulas (csf). Example: "(A*B + a*b <-> C)*(C*d + c*D <-> E)". Each component must be a solution formula of type atomic. These components are evaluated separately and the results stored in a list. Scores on the selected evaluation measures are attached to this list.
The types of the character strings in the input x are automatically discerned and thus do not need to be specified by the user.
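A short sketch of the automatic type detection, using three expressions over d.irrigate, one per type. The class labels booleanCond, atomicCond, and complexCond are those named in the Returns section.

```r
library(cna)

exprs <- c(
  "A*r + L*C",                      # no '->' or '<->' as main operator
  "A*r + L*C -> W",                 # '->' as main operator
  "(A*R + C*l <-> F)*(W*a -> F)"    # conjunction of atomic formulas
)
res <- condition(exprs, d.irrigate)

# Report which of the three type labels each element carries.
type_labels <- c("booleanCond", "atomicCond", "complexCond")
for (i in seq_along(res)) {
  cat(exprs[i], ":", intersect(class(res[[i]]), type_labels), "\n")
}
```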
If force.bool = TRUE, expressions with ‘->’ or ‘<->’ are treated as type boolean, i.e. only their frequencies are calculated. Enclosing a character string representing a causal solution formula in parentheses has the same effect as specifying force.bool = TRUE. rm.parentheses = TRUE removes parentheses around the expression prior to evaluation and thus has the reverse effect of setting force.bool = TRUE.
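The effect of force.bool, enclosing parentheses, and rm.parentheses can be sketched with a single formula over d.irrigate. The comments state the type expected from the description above.

```r
library(cna)

asf_string <- "A*r + L*C -> W"
wrapped    <- paste0("(", asf_string, ")")

as_model  <- condition(asf_string, d.irrigate)                     # atomic
as_bool   <- condition(asf_string, d.irrigate, force.bool = TRUE)  # boolean
in_parens <- condition(wrapped, d.irrigate)                        # boolean
unwrapped <- condList(wrapped, d.irrigate, rm.parentheses = TRUE)  # atomic

# Inspect the resulting class labels.
lapply(list(as_model, as_bool, in_parens, unwrapped),
       function(r) class(r[[1]]))
```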
If add.data = TRUE, ct is appended to the output such as to facilitate the analysis and evaluation of a model on the case level.
The digits argument of the print method determines how many digits of the scores on the evaluation measures are printed. If print.table = FALSE, the table assigning conditions to configurations and cases is omitted, i.e. only frequencies or evaluation scores are returned. row.names = TRUE also lists the row names in ct. If rows in a ct are instantiated by many cases, those cases are not printed by default. They can be recovered by show.cases = TRUE.
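The printing options described above can be combined as in this sketch, which assumes the d.irrigate data. Each call prints the same evaluation with one option changed.

```r
library(cna)

res <- condition(c("A*r + L*C -> W", "A*L*R -> W"), d.irrigate)

# Round the evaluation scores to 2 digits.
print(res, digits = 2)

# Omit the table of configurations and cases; only the scores are shown.
print(res, print.table = FALSE)

# Also list the cases instantiating each configuration.
print(res, show.cases = TRUE)
```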
print method
print.condList essentially executes print.cond (the method printing a single condition) successively for the first n list elements. All arguments in print.condList are thereby passed to print.cond, i.e. digits, print.table, show.cases, add.data can also be specified when printing the complete list of conditions.
The option spaces controls how the conditions are rendered in certain contexts. The current setting is queried by typing getOption("spaces"). The option specifies characters that will be printed with a space before and after them. The default is c("<->","->","+"). A more compact output is obtained with options(spaces = NULL).
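One way to change the option only temporarily is to store the value returned by options() and restore it afterwards. This is a sketch of that base R idiom, with the variable name old_opts chosen for illustration.

```r
library(cna)

res <- condition("A*r + L*C -> W", d.irrigate)

# options() returns the previous setting, which is kept for later.
old_opts <- options(spaces = NULL)
print(res)            # compact rendering without spaces

# Restore whatever setting was active before.
options(old_opts)
print(res)
```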
Returns
condition (aka condList) returns a nested list of objects, each of them corresponding to one element of the input vector x. The list has the class attribute condList. The list elements (i.e., the individual conditions) are of class cond and have a more specific class label booleanCond, atomicCond, or complexCond, reflecting the type of condition. The components of class booleanCond or atomicCond are data frames; those of class complexCond are lists of amended data frames.
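The returned structure can be explored with ordinary list tools. The sketch below assumes the d.irrigate data and only inspects the object; attribute names are listed rather than presupposed.

```r
library(cna)

res <- condition(c("A*r + L*C", "A*r + L*C -> W",
                   "(A*R + C*l <-> F)*(W*a -> F)"), d.irrigate)

class(res)     # class of the container
length(res)    # one element per input expression

# Boolean and atomic elements are data frames; complex ones are lists.
is.data.frame(res[[1]])
is.data.frame(res[[2]])
is.list(res[[3]])

# Frequencies or evaluation scores are stored as attributes of the elements.
names(attributes(res[[2]]))
```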
References
Emmenegger, Patrick. 2011. Job Security Regulations in Western Democracies: A Fuzzy Set Analysis. European Journal of Political Research 50(3):336-64.
Lam, Wai Fung, and Elinor Ostrom. 2010. Analyzing the Dynamic Complexity of Development Interventions: Lessons from an Irrigation Experiment in Nepal. Policy Sciences 43 (2):1-25.
Ragin, Charles. 2008. Redesigning Social Inquiry: Fuzzy Sets and Beyond. Chicago, IL: University of Chicago Press.
See Also
condList-methods describes methods and functions processing the output of condition; see, in particular, the related summary and as.data.frame methods.
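A brief sketch of the summary and as.data.frame methods mentioned above, assuming the d.irrigate data. Collecting the scores in a data frame is convenient when several candidate formulas are to be compared.

```r
library(cna)

res <- condition(c("A*r + L*C -> W", "A*L*R -> W"), d.irrigate)

# Condensed overview of the evaluated conditions.
summary(res)

# The same information as a data frame, e.g. for sorting or filtering.
scores <- as.data.frame(res)
scores
```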
Examples
library(cna)

# Crisp-set data from Lam and Ostrom (2010) on the impact of development interventions
# ------------------------------------------------------------------------------------
# Any Boolean functions involving values of the factors "A", "R", "F", "L", "C", "W" in
# d.irrigate can be tested by condition().
condition("A*r + L*C", d.irrigate)
condition(c("A*r + !(L*C)", "A*-(L | -F)", "C -> A*R + C*l"), d.irrigate)
condList(c("A*r & !(L + C)", "A*-(L & -F)", "C -> !(A|R & C|l)"), d.irrigate)
condition(c("A*r + L*C -> W", "(A*R + C*l <-> F)*(W*a -> F)"), d.irrigate)

# The same with non-default evaluation measures.
condition(c("A*r + L*C -> W", "(A*R + C*l <-> F)*(W*a -> F)"), d.irrigate,
          measures = c("PAcon", "PACcov"))

# Group expressions with "<->" by outcome with group.by.outcome() from condList-methods.
irrigate.con <- condition(c("A*r + L*C <-> W", "A*L*R <-> W", "A*R + C*l <-> F", "W*a <-> F"),
                          d.irrigate)
group.by.outcome(irrigate.con)

# Pass minimally sufficient conditions inferred by cna() to condition()
# in an object of class "condTbl".
irrigate.cna1 <- cna(d.irrigate, ordering = "A, R, L < F, C < W", con = .9)
condition(msc(irrigate.cna1), d.irrigate)

# Pass atomic solution formulas inferred by cna() to condition().
irrigate.cna1 <- cna(d.irrigate, ordering = "A, R, L < F, C < W", con = .9)
condition(asf(irrigate.cna1), d.irrigate)

# Print more than 3 evaluations to the console.
condition(msc(irrigate.cna1), d.irrigate) |> print(n = 10)

# An analogous analysis with different evaluation measures.
irrigate.cna1 <- cna(d.irrigate, ordering = "A, R, L < F, C < W", con = .8,
                     measures = c("AACcon", "AAcov"))
condition(asf(irrigate.cna1), d.irrigate)

# Add data and use different evaluation measures.
irrigate.cna2 <- cna(d.irrigate, con = .9)
(irrigate.cna2b.asf <- condition(asf(irrigate.cna2)$condition, d.irrigate,
                                 measures = c("PAcon", "PACcov"), add.data = TRUE))

# Print more conditions.
print(irrigate.cna2b.asf, n = 6)

# No spaces before and after "+".
options(spaces = c("<->", "->"))
irrigate.cna2b.asf

# No spaces at all.
options(spaces = NULL)
irrigate.cna2b.asf

# Restore the default spacing.
options(spaces = c("<->", "->", "+"))

# Print only the evaluation scores.
print(irrigate.cna2b.asf, print.table = FALSE)
summary(irrigate.cna2b.asf)

# Print only 2 digits of the evaluation scores.
print(irrigate.cna2b.asf, digits = 2)

# Instead of a configuration table, it is also possible to provide a data frame
# as second input.
condition("A*r + L*C", d.irrigate)
condition(c("A*r + L*C", "A*L -> F", "C -> A*R + C*l"), d.irrigate)
condition(c("A*r + L*C -> W", "A*L*R -> W", "A*R + C*l -> F", "W*a -> F"), d.irrigate)

# Fuzzy-set data from Emmenegger (2011) on the causes of high job security regulations
# ------------------------------------------------------------------------------------
# Compare the CNA solution for outcome JSR to the solution presented by Emmenegger
# S*R*v + S*L*R*P + S*C*R*P + C*L*P*v -> JSR (p. 349), which was generated by fsQCA as
# implemented in the fs/QCA software, version 2.5.
jobsecurity.cna <- cna(d.jobsecurity, outcome = "JSR", con = .97, cov = .77,
                       maxstep = c(4, 4, 15))
solEmmenegger <- "S*R*v + S*L*R*P + S*C*R*P + C*L*P*v -> JSR"
compare.sol <- condition(c(asf(jobsecurity.cna)$condition, solEmmenegger), d.jobsecurity)
summary(compare.sol)
print(compare.sol, add.data = d.jobsecurity)
group.by.outcome(compare.sol)

# There exist even more high quality solutions for JSR.
jobsecurity.cna2 <- cna(d.jobsecurity, outcome = "JSR", con = .95, cov = .8,
                        maxstep = c(4, 4, 15))
compare.sol2 <- condList(c(asf(jobsecurity.cna2)$condition, solEmmenegger), d.jobsecurity)
summary(compare.sol2)
group.by.outcome(compare.sol2)

# Simulate multi-value data
# -------------------------
library(dplyr)  # provides setdiff() for data frames

# Define the data generating structure.
groundTruth <- "(A=2*B=1 + A=3*B=3 <-> C=1)*(C=1*D=2 + C=2*D=3 <-> E=3)"

# Generate ideal data on groundTruth.
fullData <- allCombs(c(3, 3, 2, 3, 3))
idealData <- ct2df(selectCases(groundTruth, fullData))

# Randomly add 15% inconsistent cases.
inconsistentCases <- setdiff(fullData, idealData)
realData <- rbind(idealData,
                  inconsistentCases[sample(1:nrow(inconsistentCases),
                                           nrow(idealData) * 0.15), ])

# Determine model fit of groundTruth and its submodels.
condition(groundTruth, realData)
condition("A=2*B=1 + A=3*B=3 <-> C=1", realData)
condition("A=2*B=1 + A=3*B=3 <-> C=1", realData, measures = c("ccon", "ccov"))
condition("A=2*B=1 + A=3*B=3 <-> C=1", realData, measures = c("AACcon", "AAcov"))
condition("A=2*B=1 + A=3*B=3 <-> C=1", realData, force.bool = TRUE)
condition("(C=1*D=2 + C=2*D=3 <-> E=3)", realData)
condList("(C=1*D=2 + C=2*D=3 <-> E=3)", realData, rm.parentheses = TRUE)
condition("(C=1*D=2 +!(C=2*D=3 + A=1*B=1) <-> E=3)", realData)

# Manually calculate unique standard coverages, i.e. the ratio of an outcome's instances
# covered by individual msc alone (for details on unique coverage cf.
# Ragin 2008:63-68).
summary(condition("A=2*B=1 * -(A=3*B=3) <-> C=1", realData))  # unique coverage of A=2*B=1
summary(condition("-(A=2*B=1) * A=3*B=3 <-> C=1", realData))  # unique coverage of A=3*B=3

# Note that expressions must feature factor VALUES contained in the data, they may not
# contain factor NAMES. The following calls produce errors; they are wrapped in try()
# so that the remaining examples still run.
try(condition("C*D <-> E", realData))
try(condition("A=2*B=1 + C=23", realData))

# In case of mv expressions, negations of factor values must be written with brackets.
condition("!(A=2)", realData)

# The following produces an error.
try(condition("!A=2", realData))