rmvDAG() R function from [pcalg]

Generate Multivariate Data according to a DAG

Generate multivariate data with dependency structure specified by a (given) DAG (Directed Acyclic Graph) with nodes corresponding to random variables. The DAG has to be topologically ordered.


rmvDAG(n, dag,
       errDist = c("normal", "cauchy", "t4", "mix", "mixt3", "mixN100"),
       mix = 0.1, errMat = NULL, back.compatible = FALSE,
       use.node.names = !back.compatible)

Arguments

n: number of samples that should be drawn. (integer)
dag: a graph object describing the DAG; must contain weights for all the edges. The nodes must be topologically sorted. (For topological sorting use tsort from the RBGL package.)
errDist: string specifying the error distribution of each node. Currently, the options "normal", "t4", "cauchy", "mix", "mixt3" and "mixN100" are supported. The first three generate standard normal, t(df=4) and standard Cauchy random numbers, respectively. The options containing the word "mix" create standard normal random variables contaminated with a fraction of outliers. The outliers for the options "mix", "mixt3" and "mixN100" are drawn from a standard Cauchy, t(df=3) and N(0,100) distribution, respectively. The fraction of outliers is determined by the mix argument.
mix: for the "mix*" error distributions, mix specifies the fraction of outlier samples (i.e., Cauchy, $t_3$ or $N(0,100)$).
errMat: numeric $n \times p$ matrix specifying the error vectors $e_i$ (see Details), instead of specifying errDist (and maybe mix).
back.compatible: logical indicating if the data generated should be the same as with pcalg version 1.0-6 and earlier (where wgtMatrix() differed).
use.node.names: logical indicating if the column names of the result matrix should equal nodes(dag). This is the sensible choice, but it is newer behaviour, hence the default is TRUE only when back.compatible is FALSE.
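
The "mix*" options described above can be illustrated with a small base R sketch. This is not pcalg's implementation: the helper name `r_mix_errors` is hypothetical, and reading $N(0,100)$ as variance 100 (so `sd = 10`) is an assumption. It draws standard normal errors, replaces a fraction `mix` of the entries with outliers, and returns an $n \times p$ matrix of the shape expected for errMat.

```r
# Hypothetical helper (not part of pcalg): contaminated standard normal errors.
r_mix_errors <- function(n, p, mix = 0.1,
                         outlier = c("cauchy", "t3", "N100")) {
  outlier <- match.arg(outlier)
  n_out <- round(mix * n * p)              # number of outlier entries
  out <- switch(outlier,
                cauchy = rcauchy(n_out),
                t3     = rt(n_out, df = 3),
                N100   = rnorm(n_out, sd = 10))  # assumes 100 is the variance
  vals <- c(rnorm(n * p - n_out), out)
  # shuffle so that the outliers land at random positions in the matrix
  matrix(vals[sample.int(length(vals))], nrow = n, ncol = p)
}

set.seed(1)
eMix <- r_mix_errors(n = 200, p = 5, mix = 0.3, outlier = "cauchy")
dim(eMix)
```

A matrix built this way could be supplied through the errMat argument in place of errDist and mix, provided its number of columns equals the number of nodes of the DAG.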

Returns

An $n \times p$ matrix with the generated data. The $p$ columns correspond to the nodes (i.e., random variables) and each of the $n$ rows corresponds to a sample.

Details

Each node is visited in the topological order. For each node $i$ we generate an $n$-dimensional vector $X_i$ (one entry per sample) in the following way: Let $X_1,\ldots,X_k$ denote the values of all neighbours of $i$ with lower order. Let $w_1,\ldots,w_k$ be the weights of the corresponding edges. Furthermore, generate a random vector $E_i$ according to the specified error distribution. Then, the value of $X_i$ is computed as

$$X_i = w_1 X_1 + \ldots + w_k X_k + E_i.$$

If node $i$ has no neighbours with lower order, we set $X_i = E_i$.
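
The recursion above can be sketched in base R without pcalg. This is an illustration under stated assumptions, not the package's code: the function name `sim_linear_dag` is hypothetical, and the convention that `W[j, i]` stores the weight of the edge from node `i` to node `j` is assumed here. With topologically ordered nodes, `W` is then strictly lower triangular. The last lines compare the loop with the equivalent closed form $X = E (I - W^\top)^{-1}$, which follows from writing the recursion as $X = E + X W^\top$.

```r
# Hypothetical stand-in for the recursion described above (not pcalg's code).
# Assumed convention: W[j, i] is the weight of edge i -> j, so a topologically
# ordered DAG gives a strictly lower triangular W.
sim_linear_dag <- function(W, E) {
  p <- ncol(E)
  stopifnot(nrow(W) == p, ncol(W) == p,
            all(W[upper.tri(W, diag = TRUE)] == 0))
  X <- E                                   # first node: X_1 = E_1
  for (j in seq_len(p)[-1]) {
    lower <- seq_len(j - 1)                # all nodes with lower order
    # X_j = w_1 X_1 + ... + w_k X_k + E_j (non-neighbours have weight 0)
    X[, j] <- E[, j] + X[, lower, drop = FALSE] %*% W[j, lower]
  }
  X
}

set.seed(42)
n <- 500; p <- 3
W <- matrix(0, p, p)
W[2, 1] <- 0.8    # edge 1 -> 2
W[3, 1] <- -0.5   # edge 1 -> 3
W[3, 2] <- 0.3    # edge 2 -> 3
E <- matrix(rnorm(n * p), nrow = n)       # one error column per node
X <- sim_linear_dag(W, E)

# Closed form of the same recursion, used as a consistency check
X_closed <- E %*% solve(diag(p) - t(W))
all.equal(X, X_closed)
```

Looping over columns keeps the sketch close to the node-by-node description, while the closed form makes explicit that each $X_i$ is a linear combination of the error vectors of node $i$ and its ancestors.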

Author(s)

Markus Kalisch (kalisch@stat.math.ethz.ch) and Martin Maechler.

Examples


## generate random DAG
p <- 20
rDAG <- randomDAG(p, prob = 0.2, lB=0.1, uB=1)

if (require(Rgraphviz)) {
## plot the DAG
plot(rDAG, main = "randomDAG(20, prob = 0.2, ..)")
}

## generate 1000 samples of DAG using standard normal error distribution
n <- 1000
d.normMat <- rmvDAG(n, rDAG, errDist="normal")

## generate 1000 samples of DAG using standard t(df=4) error distribution
d.t4Mat <- rmvDAG(n, rDAG, errDist="t4")

## generate 1000 samples of DAG using standard normal with a cauchy
## mixture of 30 percent
d.mixMat <- rmvDAG(n, rDAG, errDist = "mix", mix = 0.3)

require(MASS) ## for mvrnorm()
Sigma <- toeplitz(ARMAacf(0.2, lag.max = p - 1))
dim(Sigma)# p x p
## *Correlated* normal error matrix "e_i" (against model assumption)
eMat <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
d.CnormMat <- rmvDAG(n, rDAG, errMat = eMat)

pcalg package

Maintainer: Markus Kalisch
License: GPL (>= 2)
Last published: 2024-09-12
https://pcalg.r-forge.r-project.org/
