Generate multivariate data with dependency structure specified by a (given) DAG (Directed Acyclic Graph) with nodes corresponding to random variables. The DAG has to be topologically ordered.
n: number of samples that should be drawn. (integer)
dag: a graph object describing the DAG; must contain weights for all the edges. The nodes must be topologically sorted. (For topological sorting use tsort from the RBGL package.)
errDist: string specifying the distribution of each node. Currently, the options "normal", "t4", "cauchy", "mix", "mixt3" and "mixN100" are supported. The options "normal", "t4" and "cauchy" generate standard normal, t(df=4) and standard Cauchy random numbers, respectively. The options containing the word "mix" create standard normal random variables with a mix of outliers. The outliers for the options "mix", "mixt3" and "mixN100" are drawn from a standard Cauchy, t(df=3) and N(0,100) distribution, respectively. The fraction of outliers is determined by the mix argument.
mix: for the "mix*" error distributions, mix specifies the fraction of outlier samples (i.e., Cauchy, t3 or N(0,100)).
errMat: numeric n * p matrix specifying the error vectors e_i (see Details); supply this instead of specifying errDist (and possibly mix).
back.compatible: logical indicating if the data generated should be the same as with pcalg version 1.0-6 and earlier (where wgtMatrix() differed).
use.node.names: logical indicating if the column names of the result matrix should equal nodes(dag). This is the sensible choice, but it is newer behaviour, which is reflected in the default.
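To illustrate what the "mix*" options of errDist and the mix argument describe, here is a base-R sketch of a contaminated standard normal error matrix. It is not pcalg's internal code: the function name mixed_errors, the per-entry Bernoulli choice of outliers, and reading N(0,100) as variance 100 (sd = 10) are all assumptions made for this example.

```r
## Hypothetical helper (not part of pcalg): standard normal errors in which
## each entry is independently replaced by an outlier with probability `mix`.
mixed_errors <- function(n, p, mix = 0.1,
                         outlier = c("cauchy", "t3", "N100")) {
  outlier <- match.arg(outlier)
  N <- n * p
  is_out <- rbinom(N, size = 1, prob = mix) == 1  # TRUE marks an outlier entry
  e <- rnorm(N)                                   # standard normal base errors
  n_out <- sum(is_out)
  e[is_out] <- switch(outlier,
                      cauchy = rcauchy(n_out),          # standard Cauchy
                      t3     = rt(n_out, df = 3),       # t with 3 df
                      N100   = rnorm(n_out, sd = 10))   # assumed: variance 100
  list(errMat = matrix(e, n, p), is_outlier = matrix(is_out, n, p))
}

set.seed(42)
res <- mixed_errors(n = 1000, p = 20, mix = 0.3)
dim(res$errMat)       # 1000 x 20
mean(res$is_outlier)  # empirical outlier fraction, should be near mix
```

A matrix built this way has the n * p shape that the errMat argument expects, so it could be supplied there in place of errDist and mix.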
Returns
An n * p matrix with the generated data. The p columns correspond to the nodes (i.e., random variables) and each of the n rows corresponds to a sample.
Details
Each node is visited in the topological order. For each node i we generate an n-dimensional value X_i (one entry per sample) in the following way: Let X_1, ..., X_k denote the values of all neighbours of i with lower order, and let w_1, ..., w_k be the weights of the corresponding edges. Furthermore, generate a random vector E_i according to the specified error distribution. Then, the value of X_i is computed as
X_i = w_1 * X_1 + ... + w_k * X_k + E_i.
If node i has no neighbours with lower order, X_i = E_i is set.
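The recursion described above can be sketched in base R without pcalg. This is an illustration only, not the package's implementation; the helper name simulate_dag and the convention that W[j, i] holds the weight of the edge j -> i (zero if there is no edge) are assumptions made for this example.

```r
## Hypothetical helper (not part of pcalg): simulate from a weighted DAG
## whose nodes 1..p are already topologically ordered.
## Assumed convention: W[j, i] is the weight of edge j -> i, 0 if absent,
## so W must be strictly upper triangular.
simulate_dag <- function(W, E) {
  p <- ncol(W)
  stopifnot(nrow(W) == p, ncol(E) == p,
            all(W[lower.tri(W, diag = TRUE)] == 0))
  X <- matrix(0, nrow(E), p)
  for (i in seq_len(p)) {
    parents <- which(W[, i] != 0)  # neighbours of i with lower order
    X[, i] <- E[, i]               # X_i = E_i if there are no such neighbours
    if (length(parents) > 0) {
      ## X_i = w_1 * X_1 + ... + w_k * X_k + E_i
      X[, i] <- X[, i] + X[, parents, drop = FALSE] %*% W[parents, i]
    }
  }
  X
}

set.seed(1)
n <- 5; p <- 3
W <- matrix(0, p, p)
W[1, 2] <- 0.5  # edge 1 -> 2
W[1, 3] <- -1   # edge 1 -> 3
W[2, 3] <- 2    # edge 2 -> 3
E <- matrix(rnorm(n * p), n, p)  # standard normal errors, one column per node
X <- simulate_dag(W, E)
dim(X)  # n x p
```

Under this convention the recursion is the matrix equation X = X W + E, so the result should agree with E %*% solve(diag(p) - W) up to floating-point error, which makes a convenient check.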
See Also
randomDAG for generating a random DAG; skeleton and pc for estimating the skeleton and the CPDAG of a DAG that corresponds to the data.
Examples
library(pcalg)  ## provides randomDAG() and rmvDAG()

## generate random DAG
p <- 20
rDAG <- randomDAG(p, prob = 0.2, lB = 0.1, uB = 1)

if (require(Rgraphviz)) {
  ## plot the DAG
  plot(rDAG, main = "randomDAG(20, prob = 0.2, ..)")
}

## generate 1000 samples of DAG using standard normal error distribution
n <- 1000
d.normMat <- rmvDAG(n, rDAG, errDist = "normal")

## generate 1000 samples of DAG using standard t(df=4) error distribution
d.t4Mat <- rmvDAG(n, rDAG, errDist = "t4")

## generate 1000 samples of DAG using standard normal with a cauchy
## mixture of 30 percent
d.mixMat <- rmvDAG(n, rDAG, errDist = "mix", mix = 0.3)

require(MASS)  ## for mvrnorm()
Sigma <- toeplitz(ARMAacf(0.2, lag.max = p - 1))
dim(Sigma)  # p x p

## *Correlated* normal error matrix "e_i" (against model assumption)
eMat <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
d.CnormMat <- rmvDAG(n, rDAG, errMat = eMat)