udag2apag function

Last step of RFCI algorithm: Transform partially oriented graph into RFCI-PAG

Last step of RFCI algorithm: Transform partially oriented graph into RFCI-PAG

This function performs the last step of the RFCI algorithm: It transforms a partially oriented graph in which the v-structures have been oriented into an RFCI Partial Ancestral Graph (PAG) (see Colombo et al (2012)).

While orienting the edges, this function performs some additional conditional independence tests in orientation rule 4 to ensure correctness of the ancestral relationships. As a result of these additional tests, some additional edges can be deleted. The result is the final adjacency matrix indicating also the edge marks and the updated sepsets.

udag2apag(apag, suffStat, indepTest, alpha, sepset, rules = rep(TRUE, 10), unfVect = NULL, verbose = FALSE)

Arguments

  • apag: Adjacency matrix of type amat.pag

  • suffStat: Sufficient statistics: A list containing all necessary elements for the conditional independence decisions in the function indepTest.

  • indepTest: Pre-defined function for testing conditional independence. The function is internally called as indepTest(x,y,S,suffStat), and tests conditional independence of x and y given S. Here, x and y

    are variables, and S is a (possibly empty) set of variables (all variables are coded by their column numbers in the adjacency matrix). suffStat is a list containing all relevant elements for the conditional independence decisions. The return value of indepTest is the p-value of the test for conditional independence.

  • alpha: Significance level for the individual conditional independence tests.

  • sepset: List of length p; each element of the list contains another list of length p. The element sepset[[x]][[y]] contains the separation set that made the edge between x and y drop out. Each separation set is a vector with (integer) positions of variables in the adjacency matrix. This object is thought to be obtained from a pcAlgo-object.

  • rules: Logical vector of length 10 with TRUE or FALSE for each rule, where TRUE in position i means that rule i (Ri) will be applied. By default, all rules are active.

  • unfVect: Vector containing numbers that encode the ambiguous triples (as returned by pc.cons.intern). This is needed in the conservative and in the majority rule versions of RFCI.

  • verbose: Logical indicating if detailed output is to be given.

Details

The partially oriented graph in which the v-structures have been oriented is transformed into an RFCI-PAG using adapted rules of Zhang (2008). This function is similar to udag2pag used to orient the skeleton into a PAG in the FCI algorithm. However, it is slightly more complicated because we perform additional conditional independence tests when applying rule 4, to ensure correctness of the ancestral relationships. As a result, some additional edges can be deleted, see Colombo et al. (2012). Because of these addiitonal tests, we need to give suffStat, indepTest, and alpha as inputs. Since edges can be deleted, the input adjacency matrix apag and the input separating sets sepset can change in this algorithm.

If unfVect = NULL (no ambiguous triples), the orientation rules are applied to each eligible structure until no more edges can be oriented. On the other hand, hand, if one uses conservative or majority rule FCI and ambiguous triples have been found in pc.cons.intern, unfVect contains the numbers of all ambiguous triples in the graph. In this case, the orientation rules take this information into account. For example, if a -> b o- c and <a,b,c> is an unambigous unshielded triple and not a v-structure, then we obtain b -* c (otherwise we would create an additional v-structure). On the other hand, if a -> b o- c but <a,b,c> is an ambiguous unshielded triple, then the circle mark at b is not oriented.

Note that the algorithm works with columns' position of the adjacency matrix and not with the names of the variables.

Note that this function does not resolve possible order-dependence in the application of the orientation rules, see Colombo and Maathuis (2014).

Returns

  • apag: Final adjacency matrix of type amat.pag

  • sepset: Updated list of separating sets

References

D. Colombo and M.H. Maathuis (2014).Order-independent constraint-based causal structure learning. Journal of Machine Learning Research

15 3741-3782.

D. Colombo, M. H. Maathuis, M. Kalisch, T. S. Richardson (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40 , 294--321.

J. Zhang (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172 , 1873--1896.

Author(s)

Diego Colombo and Markus Kalisch (kalisch@stat.math.ethz.ch )

See Also

rfci, udag2pag, dag2pag, udag2pdag, udag2pdagSpecial, udag2pdagRelaxed

Examples

##################################################% -------- ----------- ## Example with hidden variables ## Zhang (2008), Fig. 6, p.1882 ################################################## ## create the DAG : amat <- t(matrix(c(0,1,0,0,1, 0,0,1,0,0, 0,0,0,1,0, 0,0,0,0,0, 0,0,0,1,0),5,5)) V <- LETTERS[1:5] colnames(amat) <- rownames(amat) <- V edL <- setNames(vector("list",length=5), V) edL[[1]] <- list(edges= c(2,4),weights=c(1,1)) edL[[2]] <- list(edges= 3, weights=c(1)) edL[[3]] <- list(edges= 5, weights=c(1)) edL[[4]] <- list(edges= 5, weights=c(1)) ## and leave edL[[ 5 ]] empty g <- new("graphNEL", nodes=V, edgeL=edL, edgemode="directed") if (require(Rgraphviz)) plot(g) ## define the latent variable L <- 1 ## compute the true covariance matrix of g cov.mat <- trueCov(g) ## delete rows and columns belonging to latent variable L true.cov <- cov.mat[-L,-L] ## transform covariance matrix into a correlation matrix true.corr <- cov2cor(true.cov) n <- 100000 alpha <- 0.01 p <- ncol(true.corr) if (require("MASS")) { ## generate 100000 samples of DAG using standard normal error distribution set.seed(289) d.mat <- mvrnorm(n, mu = rep(0, p), Sigma = true.cov) ## estimate the skeleton of given data suffStat <- list(C = cor(d.mat), n = n) indepTest <- gaussCItest resD <- skeleton(suffStat, indepTest, alpha = alpha, labels=colnames(true.corr)) ## estimate all ordered unshielded triples amat.resD <- as(resD@graph, "matrix") print(u.t <- find.unsh.triple(amat.resD)) # four of them ## check and orient v-structures vstrucs <- rfci.vStruc(suffStat, indepTest, alpha=alpha, sepset = resD@sepset, g.amat = amat.resD, unshTripl= u.t$unshTripl, unshVect = u.t$unshVect, verbose = TRUE) ## Estimate the final skeleton and extend it into a PAG ## (using all 10 rules, as per default): resP <- udag2apag(vstrucs$amat, suffStat, indepTest=indepTest, alpha=alpha, sepset=vstrucs$sepset, verbose = TRUE) print(Amat <- resP$graph) } # only if "MASS" is there