Calculate the conditional probability of belonging to each cluster in a Poisson mixture model
This function computes the conditional probabilities t_ik that an observation i arises from the kth component for the current value of the mixture parameters.
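As a reading aid, the quantity described above can be written as the usual mixture-model posterior probability. The sketch below assumes the parameterization used in the cited Rau et al. papers, in which the Poisson mean of y_ij in cluster k is w_i s_j λ_{c(j)k}, with w_i the row total of observation i; the notation c(j) for the condition of variable j is introduced here for illustration and does not appear in this help page.

```latex
t_{ik} \;=\; \frac{\pi_k \, f_k(\mathbf{y}_i)}{\sum_{m=1}^{g} \pi_m \, f_m(\mathbf{y}_i)},
\qquad
f_k(\mathbf{y}_i) \;=\; \prod_{j=1}^{q} \frac{\mu_{ijk}^{\,y_{ij}} \, e^{-\mu_{ijk}}}{y_{ij}!},
\qquad
\mu_{ijk} \;=\; w_i \, s_j \, \lambda_{c(j)k}
```

Here i = 1, ..., n indexes observations, j = 1, ..., q indexes variables, and k = 1, ..., g indexes clusters, so each row of the returned matrix t sums to one before any smoothing is applied.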
probaPost(y, g, conds, pi, s, lambda)
Arguments
y: (n x q) matrix of observed counts for n observations and q variables
g: Number of clusters
conds: Vector of length q defining the condition (treatment group) for each variable (column) in y
pi: Vector of length g containing the current estimate of π
s: Vector of length q containing the estimates for the normalized library size parameters for each of the q variables in y
lambda: (d x g) matrix containing the current estimate λ, where d is the number of conditions (treatment groups)
Returns
t: (n x g) matrix made up of the conditional probability of each observation belonging to each of the g clusters
References
Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux, G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.
Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.
Author(s)
Andrea Rau
Note
If all values of t_ik for an observation are 0 (or nearly zero), the observation is assigned with probability one to the cluster with the closest mean (in terms of the Euclidean distance from the observation). To avoid calculation difficulties, extreme values of t_ik are smoothed: values smaller than 1e-10 are set equal to 1e-10, and values larger than 1-1e-10 are set equal to 1-1e-10.
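The smoothing step described in this note can be illustrated with a minimal base R sketch. The helper name smooth_proba and its eps argument are hypothetical and are not part of the package; the sketch only clamps extreme values. It does not reproduce the closest-mean fallback, and it does not renormalize rows, since this page does not say whether the package does so.

```r
## Hypothetical helper (not a package function): clamp extreme probabilities
## so that no entry is exactly 0 or 1, as described in the note above.
smooth_proba <- function(tik, eps = 1e-10) {
  tik[tik < eps] <- eps          # values below eps are raised to eps
  tik[tik > 1 - eps] <- 1 - eps  # values above 1 - eps are lowered to 1 - eps
  tik
}

## Toy (2 x 2) matrix of conditional probabilities: the first row is degenerate
tik <- matrix(c(1.0, 0.0,
                0.3, 0.7), nrow = 2, byrow = TRUE)
tik_smooth <- smooth_proba(tik)
```

Keeping the entries strictly inside (0, 1) means that later operations such as log(tik_smooth) stay finite.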
See Also
PoisMixClus for Poisson mixture model estimation and model selection; PoisMixMean to calculate the conditional per-cluster mean of each observation
Examples
set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
s <- colSums(y) / sum(y)   ## TC estimate of lib size

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm
run <- PoisMixClus(y, g = 3, norm = "TC", conds = conds)
pi.est <- run$pi
lambda.est <- run$lambda

## Calculate the conditional probability of belonging to each cluster
proba <- probaPost(y, g = 3, conds = conds, pi = pi.est, s = s, lambda = lambda.est)

## head(round(proba,2))