sample_size: Number of full rankings to be sampled. Defaults to 1.
n_items: Number of items.
n_clust: Number of mixture components. Defaults to 1.
rho: Integer $G \times n$ matrix with the component-specific consensus rankings in each row. Defaults to NULL, meaning that the consensus rankings are randomly generated according to the sampling scheme indicated by the uniform argument. See Details.
theta: Numeric vector of G non-negative component-specific precision parameters. Defaults to NULL, meaning that the concentrations are uniformly generated from an interval containing typical values for the precisions. See Details.
weights: Numeric vector of G positive mixture weights (normalization is not necessary). Defaults to NULL, meaning that the mixture weights are randomly generated according to the sampling scheme indicated by the uniform argument. See Details.
uniform: Logical: whether rho or weights have to be sampled uniformly on their support. When uniform = FALSE they are sampled, respectively, to ensure separation among mixture components and populated weights. Used when G>1 and either rho or weights are NULL (see Details). Defaults to FALSE.
mh: Logical: whether the samples must be drawn with the Metropolis-Hastings (MH) scheme implemented in the BayesMallows package, rather than by direct sampling from the Mallows probability distribution. For n_items > 10, the MH scheme is always applied to speed up the sampling procedure. Defaults to TRUE.
Returns
A list of the following named components:
samples: Integer $N \times n$ matrix with the sample_size simulated full rankings in each row.
rho: Integer $G \times n$ matrix with the component-specific consensus rankings used for the simulation in each row.
theta: Numeric vector of the G component-specific precision parameters used for the simulation.
weights: Numeric vector of the G mixture weights used for the simulation.
classification: Integer vector of the sample_size component membership labels.
Details
When n_items > 10 or mh = TRUE, the random samples are obtained by using the Metropolis-Hastings algorithm, described in Vitelli et al. (2018) and implemented in the sample_mallows function of the BayesMallows package.
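To make the idea of Metropolis-Hastings sampling from a Mallows model concrete, here is a minimal base-R sketch. It is not the sample_mallows implementation: it assumes the Spearman distance, a target proportional to exp(-theta * d(r, rho)), and a simple swap proposal instead of the proposal scheme used in BayesMallows. The names spearman_dist and mh_mallows are hypothetical helpers introduced only for illustration.

```r
# Hypothetical helpers for illustration; not part of BayesMallows.
# Spearman distance: sum of squared rank differences.
spearman_dist <- function(r1, r2) sum((r1 - r2)^2)

# Assumed target: P(r) proportional to exp(-theta * d(r, rho)).
mh_mallows <- function(n_samples, rho, theta, burnin = 1000) {
  n <- length(rho)
  current <- sample(n)                       # random starting ranking
  out <- matrix(NA_integer_, nrow = n_samples, ncol = n)
  for (iter in seq_len(burnin + n_samples)) {
    # Symmetric proposal: swap the ranks of two randomly chosen items.
    proposal <- current
    idx <- sample(n, 2)
    proposal[idx] <- current[rev(idx)]
    # Log acceptance ratio; the normalizing constant cancels.
    log_ratio <- -theta * (spearman_dist(proposal, rho) -
                           spearman_dist(current, rho))
    if (log(runif(1)) < log_ratio) current <- proposal
    if (iter > burnin) out[iter - burnin, ] <- current
  }
  out
}

set.seed(12345)
draws <- mh_mallows(n_samples = 200, rho = 1:5, theta = 0.5)
```

Consecutive rows of draws come from the same chain and are therefore correlated; a practical sampler would usually add thinning on top of the burn-in.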
When theta = NULL, the concentration parameters are randomly generated from a uniform distribution on the interval $(1/n^{2}, 3/n^{1.5})$, where $n$ is the number of items, containing typical values for the precisions.
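As a rough illustration of this default, the sketch below draws precisions in base R, reading the interval as (1/n^2, 3/n^1.5) with n = n_items. The helper draw_theta is hypothetical and only mirrors the description above, not the package internals.

```r
# Hypothetical helper for illustration; not part of the package.
# Draws n_clust precisions uniformly on (1/n^2, 3/n^1.5).
draw_theta <- function(n_items, n_clust) {
  lower <- 1 / n_items^2
  upper <- 3 / n_items^1.5
  runif(n_clust, min = lower, max = upper)
}

set.seed(12345)
theta <- draw_theta(n_items = 25, n_clust = 5)
```

For n_items = 25 the interval is (0.0016, 0.024), so the generated precisions shrink quickly as the number of items grows.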
When uniform = FALSE, the mixing weights are sampled from a symmetric Dirichlet distribution with shape parameters all equal to $2G$, to favor populated and balanced clusters, and the consensus parameters are sampled to favor well-separated clusters, i.e., at a Spearman distance of at least $\frac{2}{G}\binom{n+1}{3}$ from each other.
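The following base-R sketch shows one way such a scheme could be realized. It rests on two assumptions: the separation threshold is read as (2/G) * choose(n + 1, 3), which is 1/G of the maximum Spearman distance n(n^2 - 1)/3, and the consensus rankings are obtained by simple rejection of random permutations, which may differ from what the package does internally. All function names are hypothetical.

```r
# Hypothetical helpers for illustration; not the package's internal code.
spearman_dist <- function(r1, r2) sum((r1 - r2)^2)

# Symmetric Dirichlet(2G, ..., 2G) draw via normalized gamma variates.
draw_weights <- function(n_clust) {
  g <- rgamma(n_clust, shape = 2 * n_clust, rate = 1)
  g / sum(g)
}

# Rejection scheme (an assumption): keep a random permutation only if it is
# at least min_dist away from every consensus ranking accepted so far.
draw_separated_rho <- function(n_items, n_clust, max_tries = 10000) {
  min_dist <- (2 / n_clust) * choose(n_items + 1, 3)
  rho <- matrix(NA_integer_, nrow = n_clust, ncol = n_items)
  rho[1, ] <- sample(n_items)
  accepted <- 1
  tries <- 0
  while (accepted < n_clust) {
    tries <- tries + 1
    if (tries > max_tries) stop("could not find separated consensus rankings")
    candidate <- sample(n_items)
    d <- apply(rho[seq_len(accepted), , drop = FALSE], 1,
               spearman_dist, r2 = candidate)
    if (all(d >= min_dist)) {
      accepted <- accepted + 1
      rho[accepted, ] <- candidate
    }
  }
  rho
}

set.seed(12345)
w <- draw_weights(n_clust = 3)
rho <- draw_separated_rho(n_items = 9, n_clust = 3)
```

With n_items = 9 and n_clust = 3 the threshold is 80, below the average Spearman distance of 120 between two random permutations, so the rejection loop typically terminates after a few candidates.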
Examples
## Example 1. Drawing from a mixture with randomly generated parameters of separated clusters.
set.seed(12345)
rMSmix(sample_size = 50, n_items = 25, n_clust = 5)

## Example 2. Drawing from a mixture with uniformly generated parameters.
set.seed(12345)
rMSmix(sample_size = 100, n_items = 9, n_clust = 3, uniform = TRUE)

## Example 3. Drawing from a mixture with customized parameters.
r_par <- rbind(1:5, c(4, 5, 2, 1, 3))
t_par <- c(0.01, 0.02)
w_par <- c(0.4, 0.6)
set.seed(12345)
rMSmix(sample_size = 50, n_items = 5, n_clust = 2, theta = t_par, rho = r_par, weights = w_par)
References
Vitelli V, Sørensen Ø, Crispino M, Frigessi A and Arjas E (2018). Probabilistic Preference Learning with the Mallows Rank Model. Journal of Machine Learning Research, 18 (158), pages 1--49, ISSN: 1532-4435, https://jmlr.org/papers/v18/15-481.html.
Sørensen Ø, Crispino M, Liu Q and Vitelli V (2020). BayesMallows: An R Package for the Bayesian Mallows Model. The R Journal, 12 (1), pages 324--342, DOI: 10.32614/RJ-2020-026.
Zhong C (2021). Mallows permutation model with L1 and L2 distances I: hit and run algorithms and mixing times. arXiv: 2112.13456.