rmvDAG function

Generate Multivariate Data according to a DAG

Generate multivariate data with dependency structure specified by a (given) DAG (Directed Acyclic Graph) with nodes corresponding to random variables. The DAG has to be topologically ordered.
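To make "topologically ordered" concrete: every edge must point from a node with a lower index to a node with a higher index. The base-R sketch below checks this on a plain adjacency matrix. The helper name `is_topo_ordered` and the convention that `A[i, j] != 0` encodes an edge i -> j are assumptions made for illustration; neither is part of pcalg.

```r
# Hypothetical helper (not part of pcalg): checks whether the node order of an
# adjacency matrix is a topological order.
# Assumed convention: A[i, j] != 0 means there is an edge i -> j.
is_topo_ordered <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A))
  # Ordered iff no edge goes from a node to itself or to an earlier node,
  # i.e. the lower triangle including the diagonal is all zero.
  all(A[lower.tri(A, diag = TRUE)] == 0)
}

# Edges 1 -> 2, 1 -> 3, 2 -> 3: already ordered
A_ok <- matrix(c(0, 1, 1,
                 0, 0, 1,
                 0, 0, 0), nrow = 3, byrow = TRUE)

# Same graph with the nodes listed in reverse order: not ordered
A_bad <- A_ok[3:1, 3:1]

ok_result  <- is_topo_ordered(A_ok)
bad_result <- is_topo_ordered(A_bad)
```

For graph objects, the Arguments section points to tsort from the RBGL package for the actual sorting; the check above only illustrates the property being required.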

rmvDAG(n, dag, errDist = c("normal", "cauchy", "t4", "mix", "mixt3", "mixN100"), mix = 0.1, errMat = NULL, back.compatible = FALSE, use.node.names = !back.compatible)

Arguments

  • n: number of samples that should be drawn. (integer)

  • dag: a graph object describing the DAG; must contain weights for all the edges. The nodes must be topologically sorted. (For topological sorting use tsort from the RBGL package.)

  • errDist: string specifying the error distribution of each node. Currently, the options "normal", "t4", "cauchy", "mix", "mixt3" and "mixN100" are supported. The first three generate standard normal, t(df=4) and Cauchy random numbers, respectively. The options containing the word "mix" create standard normal random variables with a mix of outliers. The outliers for the options "mix", "mixt3" and "mixN100" are drawn from a standard Cauchy, t(df=3) and N(0,100) distribution, respectively. The fraction of outliers is determined by the mix argument.

  • mix: for the "mix*" error distributions, mix specifies the fraction of outlier samples (i.e., Cauchy, t_3 or N(0,100)).

  • errMat: numeric n * p matrix specifying the error vectors e_i (see Details), as an alternative to specifying errDist (and possibly mix).

  • back.compatible: logical indicating if the data generated should be the same as with pcalg version 1.0-6 and earlier (where wgtMatrix() differed).

  • use.node.names: logical indicating if the column names of the result matrix should equal nodes(dag). This is the sensible choice, but it is newer behavior, hence the default of !back.compatible.
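As an illustration of what the "mix" options describe, the base-R sketch below builds an n * p error matrix of standard normal values in which a fraction mix of the rows is replaced by Cauchy outliers. This is not the code used by rmvDAG: in particular, replacing whole rows (samples) rather than individual entries is an assumption, and the variable names `n_out` and `is_outlier` are hypothetical.

```r
# Hypothetical sketch (not pcalg's code): n x p standard normal errors in which
# a fraction `mix` of the rows is replaced by standard Cauchy outliers.
set.seed(1)
n <- 1000
p <- 5
mix <- 0.1

n_out <- round(mix * n)            # number of outlier rows
is_outlier <- sample(n) <= n_out   # random permutation, so exactly n_out TRUEs

errMat <- matrix(rnorm(n * p), nrow = n, ncol = p)
errMat[is_outlier, ] <- rcauchy(n_out * p)  # overwrite the outlier rows
```

A matrix built this way has the n * p shape that the errMat argument expects, so a custom contamination scheme can be supplied in place of errDist and mix.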

Returns

An n * p matrix with the generated data. The p columns correspond to the nodes (i.e., random variables) and each of the n rows corresponds to a sample.

Details

Each node is visited in the topological order. For each node i we generate an n-dimensional value X_i (one entry per sample) in the following way: Let X_1, \ldots, X_k denote the values of all neighbors of i with lower order. Let w_1, \ldots, w_k be the weights of the corresponding edges. Furthermore, generate a random vector E_i according to the specified error distribution. Then, the value of X_i is computed as

X_i = w_1*X_1 + \ldots + w_k*X_k + E_i.

If node i has no neighbors with lower order, X_i = E_i is set.
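The recursion above can be sketched in a few lines of base R. This is a minimal illustration, not the implementation used by rmvDAG; the weight-matrix convention (W[i, j] holds the weight of the edge j -> i) and the variable names W, E and X are assumptions chosen for this example.

```r
# Minimal base-R sketch of the recursion in Details (not pcalg's code).
# Assumed convention: W[i, j] is the weight of the edge j -> i, so for a
# topologically ordered DAG W is strictly lower triangular.
set.seed(42)
n <- 200
p <- 3
W <- matrix(0, p, p)
W[2, 1] <- 0.8    # edge 1 -> 2
W[3, 1] <- -0.5   # edge 1 -> 3
W[3, 2] <- 0.3    # edge 2 -> 3

E <- matrix(rnorm(n * p), nrow = n, ncol = p)  # column i holds E_i
X <- matrix(0, nrow = n, ncol = p)

for (i in seq_len(p)) {
  lower <- seq_len(i - 1)  # nodes with lower order (empty for i = 1)
  # X_i = weighted sum of lower-order nodes + E_i; zero weights drop out,
  # and an empty `lower` contributes zero, leaving X_i = E_i.
  X[, i] <- X[, lower, drop = FALSE] %*% W[i, lower] + E[, i]
}
```

Looping over all lower-order nodes and relying on zero weights for absent edges keeps the sketch short; a sparse representation would instead iterate only over actual parents.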

See Also

randomDAG for generating a random DAG; skeleton and pc for estimating the skeleton and the CPDAG of a DAG that corresponds to the data.

Author(s)

Markus Kalisch (kalisch@stat.math.ethz.ch) and Martin Maechler.

Examples

## generate random DAG
p <- 20
rDAG <- randomDAG(p, prob = 0.2, lB = 0.1, uB = 1)

if (require(Rgraphviz)) {
  ## plot the DAG
  plot(rDAG, main = "randomDAG(20, prob = 0.2, ..)")
}

## generate 1000 samples of DAG using standard normal error distribution
n <- 1000
d.normMat <- rmvDAG(n, rDAG, errDist = "normal")

## generate 1000 samples of DAG using standard t(df=4) error distribution
d.t4Mat <- rmvDAG(n, rDAG, errDist = "t4")

## generate 1000 samples of DAG using standard normal with a cauchy
## mixture of 30 percent
d.mixMat <- rmvDAG(n, rDAG, errDist = "mix", mix = 0.3)

require(MASS) ## for mvrnorm()
Sigma <- toeplitz(ARMAacf(0.2, lag.max = p - 1))
dim(Sigma) # p x p

## *Correlated* normal error matrix "e_i" (against model assumption)
eMat <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
d.CnormMat <- rmvDAG(n, rDAG, errMat = eMat)