It is possible to sample from a categorical distribution parametrized by a vector of unnormalized log-probabilities α[1],...,α[m]
without leaving the log space by employing the Gumbel-max trick (Maddison, Tarlow and Minka, 2014). If g[1],...,g[m] are independent samples from the Gumbel distribution with cumulative distribution function F(g)=exp(−exp(−g)), then k=argmax(g[i]+α[i]), with the maximum taken over i=1,...,m,
is a draw from the categorical distribution parametrized by the vector of probabilities p[1],...,p[m], such that p[i]=exp(α[i])/sum(exp(α)). This is implemented in the rcatlp function, which is parametrized by a vector of log-probabilities log_prob.
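The trick described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation of rcatlp: the helper name rcat_gumbel is hypothetical, it accepts a single vector of log-probabilities, and it returns 1-based category indices.

```r
# Hypothetical helper illustrating the Gumbel-max trick; not the rcatlp source.
rcat_gumbel <- function(n, log_prob) {
  m <- length(log_prob)
  # Standard Gumbel noise by inverting F(g) = exp(-exp(-g)):
  # g = -log(-log(u)) with u ~ Uniform(0, 1)
  g <- matrix(-log(-log(runif(n * m))), nrow = n, ncol = m)
  # Add alpha[i] to column i, then take the row-wise argmax
  scores <- sweep(g, 2, log_prob, `+`)
  max.col(scores, ties.method = "first")
}

set.seed(1)
alpha <- log(c(2, 4, 3, 1))  # unnormalized: exp(alpha) sums to 10, not 1
x <- rcat_gumbel(1e5, alpha)
round(prop.table(table(x)), 2)
```

The empirical frequencies should be close to 0.2, 0.4, 0.3 and 0.1, even though exp(alpha) was never normalized: adding a constant to every α[i] shifts all scores equally and leaves the argmax unchanged.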
Examples
# Generating 10 random draws from categorical distribution
# with k=3 categories occurring with equal probabilities
# parametrized using a vector

rcat(10, c(1/3,1/3,1/3))

# or with k=5 categories parametrized using a matrix of probabilities
# (generated from Dirichlet distribution)

p <- rdirichlet(10, c(1,1,1,1,1))
rcat(10, p)

x <- rcat(1e5, c(0.2,0.4,0.3,0.1))
plot(prop.table(table(x)), type = "h")
lines(0:5, dcat(0:5, c(0.2,0.4,0.3,0.1)), col = "red")

p <- rdirichlet(1, rep(1,20))
x <- rcat(1e5, matrix(rep(p,2), nrow = 2, byrow = TRUE))
xx <- 0:21
plot(prop.table(table(x)))
lines(xx, dcat(xx, p), col = "red")

xx <- seq(0,21, by = 0.01)
plot(ecdf(x))
lines(xx, pcat(xx, p), col = "red", lwd = 2)

pp <- seq(0,1, by = 0.001)
plot(ecdf(x))
lines(qcat(pp, p), pp, col = "red", lwd = 2)
References
Maddison, C. J., Tarlow, D., & Minka, T. (2014). A* sampling. In Advances in Neural Information Processing Systems (pp. 3086-3094). https://arxiv.org/abs/1411.0030