Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings
The MSmix package provides functions to fit and analyze finite mixtures of Mallows models with Spearman distance (a.k.a. θ-model) for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood (ML) framework via EM algorithms. Estimation uncertainty is quantified through several versions of bootstrap and asymptotic confidence intervals.
The Mallows model is one of the most popular and frequently applied parametric distributions for analyzing rankings of a finite set of items. However, inference for this model is challenging because its normalizing constant, also referred to as the partition function, is intractable. The present package performs ML estimation (MLE) of the Mallows model with Spearman distance from full and partial rankings with arbitrary censoring patterns. Thanks to the novel approximation of the model normalizing constant introduced by Crispino, Mollica, Astuti and Tardella (2023), as well as the existence of a closed-form expression for the MLE of the consensus ranking, MSmix can address inference even for a large number of items. The package also accounts for unobserved sample heterogeneity through MLE of finite mixtures of Mallows models with Spearman distance via EM algorithms, which yields a model-based clustering of partial rankings into groups with similar preferences.
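As a rough illustration of why the normalizing constant is the bottleneck, the base R sketch below (not MSmix code) computes it by brute-force enumeration of all n! rankings, which is feasible only for very small n. It assumes the parametrization P(r | rho, theta) proportional to exp(-theta * d(r, rho)), with d the sum of squared rank differences. It also shows the well-known closed form of the consensus MLE for full rankings, obtained by ranking the column means of the data. The helper names `all_perms`, `partition_brute` and `consensus_mle` are hypothetical and are not functions of the package.

```r
# All n! permutations of 1..n as rows of a matrix (small n only).
all_perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  prev <- all_perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(first) {
    # Put `first` in column 1 and shift the remaining values around it.
    rest <- prev + (prev >= first)
    unname(cbind(first, rest))
  }))
}

# Brute-force partition function: sum of exp(-theta * d) over all rankings.
# The value does not depend on the consensus, so the identity is used.
partition_brute <- function(theta, n) {
  perms <- all_perms(n)
  d <- rowSums(sweep(perms, 2, seq_len(n))^2)  # Spearman distances
  sum(exp(-theta * d))
}

# Consensus MLE for full rankings: rank the mean rank of each item.
# Assumption: ties in the means are broken by item index.
consensus_mle <- function(rankings) {
  rank(colMeans(rankings), ties.method = "first")
}

# Hypothetical toy sample of N = 3 full rankings of n = 4 items.
toy <- rbind(c(1, 2, 3, 4), c(2, 1, 3, 4), c(1, 3, 2, 4))
rho_hat <- consensus_mle(toy)

# Cross-check: total Spearman distance from the sample to each candidate.
perms <- all_perms(4)
total <- apply(perms, 1, function(rho) sum(sweep(toy, 2, rho)^2))
best <- perms[which.min(total), ]
```

In this toy sample `best` and `rho_hat` coincide, as expected from the closed form. The enumeration has factorial cost, which is the reason an approximation of the normalizing constant is needed for larger n.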
Computational efficiency is achieved through a hybrid implementation combining R and C++ code, and through the possibility of parallel computation.
In addition to inferential techniques, the package provides various functions for data manipulation, simulation, descriptive summary and model selection.
Specific S3 classes and methods are also supplied to enhance the usability and foster exchange with other packages.
The suite of functions available in the MSmix package is composed of:

Ranking data manipulation
data_conversion: From rankings to orderings and vice versa.
data_censoring: Censoring of full rankings.
data_completion: Deterministic completion of partial rankings with full reference rankings.
data_augmentation: Generate all full rankings compatible with partial rankings.

Ranking data simulation
rMSmix: Random samples from finite mixtures of Mallows models with Spearman distance.

Ranking data description
data_description: Descriptive summaries for partial rankings.

Model estimation
fitMSmix: MLE of mixtures of Mallows models with Spearman distance via EM algorithms.
likMSmix: Likelihood evaluation for mixtures of Mallows models with Spearman distance.

Model selection
bicMSmix: BIC value for the fitted mixture of Mallows models with Spearman distance.
aicMSmix: AIC value for the fitted mixture of Mallows models with Spearman distance.

Estimation uncertainty
bootstrapMSmix: Bootstrap confidence intervals for mixtures of Mallows models with Spearman distance.
confintMSmix: Asymptotic confidence intervals for mixtures of Mallows models with Spearman distance.

Spearman distance utilities
spear_dist: Spearman distance computation for full rankings.
spear_dist_distr: Spearman distance distribution under the uniform (null) model.
partition_fun_spear: Partition function of the Mallows model with Spearman distance.
expected_spear_dist: Expected Spearman distance under the Mallows model with Spearman distance.
var_spear_dist: Variance of the Spearman distance under the Mallows model with Spearman distance.

S3 class methods
print.bootMSmix: Print the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
print.data_descr: Print the descriptive statistics for partial rankings.
print.emMSmix: Print the MLEs of mixtures of Mallows models with Spearman distance.
print.summary.emMSmix: Print the summary of the MLEs of mixtures of Mallows models with Spearman distance.
plot.bootMSmix: Plot the bootstrap confidence intervals of mixtures of Mallows models with Spearman distance.
plot.data_descr: Plot the descriptive statistics for partial rankings.
plot.dist: Plot the Spearman distance matrix for full rankings.
plot.emMSmix: Plot the MLEs of mixtures of Mallows models with Spearman distance.
summary.emMSmix: Summary of the MLEs of mixtures of Mallows models with Spearman distance.

Datasets
ranks_antifragility: Antifragility features of innovative startups (full rankings with covariates).
ranks_horror: Arkham Horror data (full rankings).
ranks_beers: Beers data (partial rankings with different censoring patterns and a covariate).
ranks_read_genres: Reading preference data (partial top-5 rankings with covariates).
ranks_sports: Sport preferences and habits (full rankings with covariates).

Some quantities frequently recalled in the manual are the following:
Data must be supplied as an integer N × n matrix with one partial ranking per row and missing positions denoted as NA (rank = 1 indicates the most-liked item). Partial sequences with a single missing entry are automatically filled in, as they correspond to full rankings. In the present setting, ties are not allowed.
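The input format described above can be sketched in base R as follows. This is only an illustration on hypothetical toy data, not the package's internal code: the helpers `is_valid_partial` and `fill_single_na` are made-up names, and the conversion from a full ranking to an ordering via `order()` is an assumption about what such a conversion amounts to for full rankings.

```r
# Hypothetical toy data: N = 4 rankings of n = 5 items, NA = missing position.
rankings <- rbind(
  c(1L, 2L, 3L, 4L, 5L),   # full ranking
  c(2L, NA, 1L, 4L, 5L),   # single NA: the missing rank can only be 3
  c(1L, NA, NA, 2L, NA),   # top-2 partial ranking
  c(NA, 1L, NA, NA, 2L)    # top-2 partial ranking
)

# Check the format: observed ranks lie in 1..n and are not tied within a row.
is_valid_partial <- function(x) {
  n <- ncol(x)
  all(apply(x, 1, function(r) {
    obs <- r[!is.na(r)]
    all(obs %in% seq_len(n)) && !anyDuplicated(obs)
  }))
}

# Rows with exactly one NA are full rankings in disguise: fill the gap.
fill_single_na <- function(x) {
  n <- ncol(x)
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    if (sum(miss) == 1) x[i, miss] <- setdiff(seq_len(n), x[i, ])
  }
  x
}

filled <- fill_single_na(rankings)

# Ranking -> ordering for a full ranking: items sorted from most to least liked.
ordering_2 <- order(filled[2, ])

# Spearman distance between two full rankings: sum of squared rank differences.
d_12 <- sum((filled[1, ] - filled[2, ])^2)
```

Here the second row becomes the full ranking (2, 3, 1, 4, 5), whose ordering lists item 3 first, while the two top-2 rows keep their NA entries.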
Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33 (98), DOI: 10.1007/s11222-023-10266-8.
Crispino M, Mollica C, Modugno L, Casadio Tarabusi E, and Tardella L (2024+). MSmix: An R Package for clustering partial rankings via mixtures of Mallows models with Spearman distance. (submitted).
Cristina Mollica, Marta Crispino, Lucia Modugno and Luca Tardella
Maintainer: Cristina Mollica <cristina.mollica@uniroma1.it>