where xi=[xi1,…,xiK] is the random vector formed by K taxa (features) counts (RAD vector), Ni=∑j=1Kxij is the total number of reads (sequence depth), {πj} are the mean of taxa-proportions (RAD-probability mean), and θ is the overdispersion parameter.
Note: Though the test statistic supports an unequal number of reads across samples, the performance has not yet been fully tested.
References
Mosimann, J. E. (1962). On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika 49, 65-82.
Tvedebrink, T. (2010). Overdispersion in allelic counts and theta-correction in forensic genetics. Theor Popul Biol 78, 200-210.
Examples
data(saliva)### Generate a the number of reads per sample### The first number is the number of reads and the second is the number of subjects nrs <- rep(15000,20)### Get gamma from the dirichlet-multinomial parameters shape <- dirmult(saliva)$gamma
dmData <- Dirichlet.multinomial(nrs, shape) dmData[1:5,1:5]