This function simulates data from a Poisson mixture model, as described by Rau et al. (2011). Data are simulated with varying expression levels ($w_i$) for 4 clusters. Clusters may be simulated with high or low separation, and three options are available for the library size setting: ‘equal’, ‘A’, and ‘B’, as described by Rau et al. (2011).
PoisMixSim(n = 2000, libsize, separation)
Arguments
n: Number of observations
libsize: The type of library size difference to be simulated (‘equal’, ‘A’, or ‘B’, as described by Rau et al. (2011))
separation: Cluster separation (‘high’ or ‘low’, as described by Rau et al. (2011))
Returns
y: (n x q) matrix of simulated counts for n observations and q variables
labels: Vector of length n defining the true cluster labels of the simulated data
pi: Vector of length 4 (the number of clusters) containing the true value of π
lambda: (d x 4) matrix of λ values for d conditions (3 in the case of libsize = ‘equal’ or ‘A’, and 2 otherwise) in 4 clusters (see note below)
w: Row sums of y (the estimate $\hat{w}$)
conditions: Vector of length q defining the condition (treatment group) for each variable (column) in y
References
Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.
Author(s)
Andrea Rau
Note
If one or more observations are simulated such that all variables have a value of 0, those rows are removed from the data matrix; as a result, the simulated data y may in some cases have fewer than n rows.
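The row filtering described above can be sketched in base R on a toy matrix. This is only an illustration, not the internal code of PoisMixSim: the names y_raw, labels_raw, keep, and n_dropped are hypothetical, and filtering the labels alongside the rows is an assumption that the text above does not state.

```r
# Toy stand-in for raw simulated counts: 6 observations (rows), 4 variables (columns).
# These values are made up for illustration; they are not PoisMixSim output.
y_raw <- matrix(c(3, 0, 1, 2,
                  0, 0, 0, 0,
                  5, 1, 0, 0,
                  0, 0, 0, 0,
                  0, 2, 0, 1,
                  4, 4, 3, 0),
                ncol = 4, byrow = TRUE)
labels_raw <- c(1, 2, 2, 3, 4, 1)   # hypothetical true cluster labels

# Keep only the observations with at least one nonzero count.
keep <- rowSums(y_raw) > 0
y <- y_raw[keep, , drop = FALSE]

# Assumption: labels are filtered the same way so they stay aligned with y.
labels <- labels_raw[keep]

# Number of observations lost to the all-zero filter.
n_dropped <- sum(!keep)
```

In practice, comparing nrow(y) with the requested n after a simulation is a quick way to see whether any observations were dropped.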
The PMM-I model includes the parameter constraint $\sum_k \lambda_{jk} r_j = 1$, where $r_j$ is the number of replicates in condition (treatment group) $j$. Similarly, the parameter constraint in the PMM-II model is $\sum_j \sum_l \lambda_{jk} s_{jl} = 1$, where $s_{jl}$ is the library size for replicate $l$ of condition $j$. The value of lambda corresponds to that used to generate the simulated data, where the library sizes were set as described in Table 2 of Rau et al. (2011). However, due to variability in the simulation process, the actual library sizes of the data y are not exactly equal to these values; this means that the value of lambda may not be directly comparable to an estimate $\hat{\lambda}$ obtained from the PoisMixClus function.
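The gap between nominal and realized library sizes can be illustrated with a small base R sketch. It does not reproduce the settings of Table 2 in Rau et al. (2011): the nominal sizes s_nominal and expression levels w below are hypothetical, and measuring a library size as each column's share of the total count is an assumption made for this illustration.

```r
set.seed(1)

# Hypothetical nominal library sizes for 4 variables (they sum to 1).
s_nominal <- c(0.1, 0.2, 0.3, 0.4)

# Hypothetical overall expression levels for 50 observations.
w <- rep(1000, 50)

# Poisson means: observation i, variable j has mean w[i] * s_nominal[j].
mu <- outer(w, s_nominal)

# Draw counts; rpois() and matrix() both work in column-major order,
# so each count lines up with its own mean.
y <- matrix(rpois(length(mu), mu), nrow = nrow(mu))

# Assumption: realized library size = column total divided by grand total.
s_empirical <- colSums(y) / sum(y)

# Side-by-side view: close to the nominal values, but not identical.
comparison <- rbind(nominal = s_nominal, empirical = s_empirical)
print(round(comparison, 4))
```

Because the realized sizes differ slightly from the nominal ones, a parameter normalized against one set does not satisfy the same constraint against the other, which is why the comparison with an estimated value needs care.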
Examples
set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions