BIC and AIC for mixtures of Mallows models with Spearman distance
bicMSmix and aicMSmix compute, respectively, the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) for a mixture of Mallows models with Spearman distance fitted on partial rankings.
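As a reminder of what the two criteria measure, the sketch below computes them from a maximized log-likelihood l, the number k of free parameters and the sample size N, using the usual "smaller is better" definitions BIC = -2l + k log(N) (Schwarz 1978) and AIC = -2l + 2k (Sakamoto et al. 1986). The function name info_criteria is hypothetical. The parameter count is left to the caller because this page does not state how MSmix counts the parameters of a Mallows mixture, and it is an assumption that MSmix uses this exact sign convention.

```r
# Hypothetical helper (not part of MSmix): BIC and AIC from a maximized
# log-likelihood. ASSUMPTION: `n_par` is supplied by the caller.
info_criteria <- function(loglik, n_par, n_obs) {
  stopifnot(is.numeric(loglik), n_par >= 0, n_obs >= 1)
  c(BIC = -2 * loglik + n_par * log(n_obs),  # penalty grows with log(N)
    AIC = -2 * loglik + 2 * n_par)           # fixed penalty of 2 per parameter
}

# Illustrative (made-up) inputs.
info_criteria(loglik = -812.4, n_par = 5, n_obs = 50)
```

Since log(N) > 2 as soon as N >= 8, the BIC penalizes extra parameters more heavily than the AIC for all but tiny samples, so it tends to favour mixtures with fewer components.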
Arguments
rho: Integer G×n matrix with the component-specific consensus rankings in each row, where G is the number of mixture components and n the number of items.
theta: Numeric vector of G non-negative component-specific precision parameters.
weights: Numeric vector of G positive mixture weights (normalization is not necessary).
rankings: Integer N×n matrix or data frame with partial rankings in each row. Missing positions must be coded as NA.
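To make the expected shapes concrete, the sketch below builds a small set of valid inputs and checks them against the constraints listed above (each row of rho a full ranking of 1..n, theta non-negative, weights positive, rankings with NA for missing positions). The helper check_msmix_inputs is hypothetical and not part of MSmix; it only mirrors the argument descriptions on this page.

```r
# Hypothetical helper, NOT part of MSmix: checks that the four arguments
# have the shapes and ranges described in the argument list.
check_msmix_inputs <- function(rho, theta, weights, rankings) {
  rho <- as.matrix(rho)
  rankings <- as.matrix(rankings)
  G <- nrow(rho)   # number of mixture components
  n <- ncol(rho)   # number of items
  # A length-n vector containing every value in 1..n is a permutation.
  is_perm <- function(r) length(r) == n && setequal(r, seq_len(n))
  is_partial <- function(r) {
    o <- r[!is.na(r)]                       # observed ranks only
    all(o %in% seq_len(n)) && !anyDuplicated(o)
  }
  stopifnot(
    all(apply(rho, 1, is_perm)),            # each consensus is a full ranking
    length(theta) == G, all(theta >= 0),    # non-negative precisions
    length(weights) == G, all(weights > 0), # positive weights
    ncol(rankings) == n,
    all(apply(rankings, 1, is_partial))     # NA marks missing positions
  )
  invisible(list(G = G, n = n, N = nrow(rankings),
                 weights = weights / sum(weights)))  # normalized weights
}

rho <- rbind(1:5, 5:1)                      # G = 2 consensus rankings of n = 5 items
theta <- c(0.1, 0.3)
weights <- c(2, 1)                          # need not sum to 1
rankings <- rbind(c(1, 2, NA, NA, 3),       # top-3 partial ranking
                  c(5, 4, 3, 2, 1))         # full ranking
info <- check_msmix_inputs(rho, theta, weights, rankings)
str(info)
```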
Returns
The BIC or AIC value.
Details
The (log-)likelihood is evaluated by augmenting each partial ranking with the set of all compatible full rankings (see data_augmentation) and then computing the marginal likelihood.
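The augmentation step can be sketched for a single partial ranking: every assignment of the unused ranks to the NA positions yields one compatible full ranking, so a ranking with m missing positions has m! completions, and its marginal likelihood is the sum of the full-ranking probabilities over them. The functions all_perms and augment_partial below are hypothetical illustrations, not the implementation of data_augmentation in MSmix.

```r
# All permutations of a vector, one per row (recursive; fine for small inputs).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_perms(v[-i]))))
}

# Hypothetical sketch: all full rankings compatible with one partial ranking r
# (NA = missing position). Returns one full ranking per row.
augment_partial <- function(r) {
  n <- length(r)
  miss <- which(is.na(r))
  if (length(miss) == 0) return(matrix(r, nrow = 1))  # already complete
  free_ranks <- setdiff(seq_len(n), r[!is.na(r)])     # ranks not yet used
  fills <- all_perms(free_ranks)                      # every way to place them
  out <- matrix(r, nrow = nrow(fills), ncol = n, byrow = TRUE)
  out[, miss] <- fills
  out
}

aug <- augment_partial(c(1, NA, 2, NA))  # 2 missing positions -> 2! completions
print(aug)
```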
When n ≤ 20, the (log-)likelihood is computed exactly; otherwise, it is approximated with the method introduced by Crispino et al. (2023). If n > 170, the approximation is also restricted to a fixed grid of values of the Spearman distance to limit the computational burden.
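For very small n, the exact log-likelihood can be illustrated by brute force: enumerate all n! full rankings, compute the Spearman distance d(π, ρ) = Σ_i (π_i - ρ_i)^2, normalize exp(-θ d(π, ρ)) by Z(θ) = Σ_π exp(-θ d(π, ρ)), mix the G components with the normalized weights, and sum the resulting probabilities over the full rankings compatible with each partial ranking. This enumeration is a stand-in chosen for clarity; it is neither the exact method used by MSmix nor the Crispino et al. (2023) approximation, it is only practical for n up to about 8, and the function names are hypothetical.

```r
# All permutations of a vector, one per row (recursive; small inputs only).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], all_perms(v[-i]))))
}

# Spearman distance between each row of `full` and the ranking `rho`.
spearman_dist <- function(full, rho) rowSums(sweep(full, 2, rho)^2)

# Hypothetical sketch: exact mixture log-likelihood by enumerating all n! rankings.
mix_loglik_bruteforce <- function(rho, theta, weights, rankings) {
  rho <- as.matrix(rho)
  rankings <- as.matrix(rankings)
  n <- ncol(rho)
  w <- weights / sum(weights)             # weights need not be normalized
  S <- all_perms(seq_len(n))              # all n! full rankings
  # n! x G matrix of component probabilities; the column sum of
  # exp(-theta * d) over all n! rankings is the normalizing constant Z(theta).
  dens <- sapply(seq_len(nrow(rho)), function(g) {
    d <- spearman_dist(S, rho[g, ])
    exp(-theta[g] * d) / sum(exp(-theta[g] * d))
  })
  p_full <- as.vector(dens %*% w)         # mixture probability of each full ranking
  # Marginal likelihood of each (partial) ranking: sum over compatible full rankings.
  ll <- apply(rankings, 1, function(r) {
    obs <- !is.na(r)
    compatible <- apply(S[, obs, drop = FALSE], 1, function(s) all(s == r[obs]))
    log(sum(p_full[compatible]))
  })
  sum(ll)
}

rk <- rbind(c(1, 2, 3, 4, 5),             # full ranking
            c(2, 1, NA, NA, NA))          # top-2 partial ranking
ll <- mix_loglik_bruteforce(rho = rbind(1:5, 5:1), theta = c(0.2, 0.1),
                            weights = c(1, 1), rankings = rk)
```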
Examples
```r
library(MSmix)  # assumes the MSmix package is installed

## Example 1. Simulate rankings from a 2-component mixture of Mallows models
## with Spearman distance.
set.seed(12345)
rank_sim <- rMSmix(sample_size = 50, n_items = 12, n_clust = 2)
str(rank_sim)
rankings <- rank_sim$samples

# Fit the true model.
set.seed(12345)
fit <- fitMSmix(rankings = rankings, n_clust = 2, n_start = 10)

# Compare the BIC and AIC at the true parameter values and at the MLE.
bicMSmix(rho = rank_sim$rho, theta = rank_sim$theta, weights = rank_sim$weights,
         rankings = rank_sim$samples)
bicMSmix(rho = fit$mod$rho, theta = fit$mod$theta, weights = fit$mod$weights,
         rankings = rank_sim$samples)
aicMSmix(rho = rank_sim$rho, theta = rank_sim$theta, weights = rank_sim$weights,
         rankings = rank_sim$samples)
aicMSmix(rho = fit$mod$rho, theta = fit$mod$theta, weights = fit$mod$weights,
         rankings = rank_sim$samples)

## Example 2. Simulate rankings from a basic Mallows model with Spearman distance.
set.seed(54321)
rank_sim <- rMSmix(sample_size = 50, n_items = 8, n_clust = 1)
str(rank_sim)

# Censor the observations to be top-5 rankings.
rank_sim$samples[rank_sim$samples > 5] <- NA
rankings <- rank_sim$samples

# Fit the true model with the two EM algorithms.
set.seed(54321)
fit_em <- fitMSmix(rankings = rankings, n_clust = 1, n_start = 10)
set.seed(54321)
fit_mcem <- fitMSmix(rankings = rankings, n_clust = 1, n_start = 10, mc_em = TRUE)

# Compare the BIC and AIC at the true parameter values and at the MLEs.
bicMSmix(rho = rank_sim$rho, theta = rank_sim$theta, weights = rank_sim$weights,
         rankings = rank_sim$samples)
bicMSmix(rho = fit_em$mod$rho, theta = fit_em$mod$theta, weights = fit_em$mod$weights,
         rankings = rank_sim$samples)
bicMSmix(rho = fit_mcem$mod$rho, theta = fit_mcem$mod$theta, weights = fit_mcem$mod$weights,
         rankings = rank_sim$samples)
aicMSmix(rho = rank_sim$rho, theta = rank_sim$theta, weights = rank_sim$weights,
         rankings = rank_sim$samples)
aicMSmix(rho = fit_em$mod$rho, theta = fit_em$mod$theta, weights = fit_em$mod$weights,
         rankings = rank_sim$samples)
aicMSmix(rho = fit_mcem$mod$rho, theta = fit_mcem$mod$theta, weights = fit_mcem$mod$weights,
         rankings = rank_sim$samples)
```
References
Crispino M, Mollica C and Modugno L (2025+). MSmix: An R Package for clustering partial rankings via mixtures of Mallows Models with Spearman distance. (submitted)
Crispino M, Mollica C, Astuti V and Tardella L (2023). Efficient and accurate inference for mixtures of Mallows models with Spearman distance. Statistics and Computing, 33 (98), DOI: 10.1007/s11222-023-10266-8.
Schwarz G (1978). Estimating the dimension of a model. The Annals of Statistics, 6 (2), pages 461–464, DOI: 10.1214/aos/1176344136.
Sakamoto Y, Ishiguro M, and Kitagawa G (1986). Akaike Information Criterion Statistics. Dordrecht, The Netherlands: D. Reidel Publishing Company.