selectPLMIX function

Bayesian selection criteria for mixtures of Plackett-Luce models

Bayesian selection criteria for mixtures of Plackett-Luce models

Compute Bayesian comparison criteria for mixtures of Plackett-Luce models with a different number of components.

selectPLMIX(pi_inv, seq_G, MCMCsampleP = vector(mode = "list", length = length(seq_G)), MCMCsampleW = vector(mode = "list", length = length(seq_G)), MAPestP, MAPestW, deviance, post_est = "mean", parallel = FALSE)

Arguments

  • pi_inv: An object of class top_ordering, collecting the numeric N$$x$$K data matrix of partial orderings, or an object that can be coerced with as.top_ordering.
  • seq_G: Numeric vector with the number of components of the Plackett-Luce mixtures to be compared.
  • MCMCsampleP: List of size length(seq_G), whose generic element is a numeric L$$x$$(G*K) matrix with the MCMC samples of the component-specific support parameters. Default is list of NULL elements.
  • MCMCsampleW: List of size length(seq_G), whose generic element is a numeric L$$x$$G matrix with the MCMC samples of the mixture weights. Default is list of NULL elements.
  • MAPestP: List of size length(seq_G), whose generic element is a numeric G$$x$$K matrix with the MAP estimates of the component-specific support parameters.
  • MAPestW: List of size length(seq_G), whose generic element is a numeric vector with the MAP estimates of the GG mixture weights.
  • deviance: List of size length(seq_G), whose generic element is a numeric vector of posterior deviance values.
  • post_est: Character string indicating the point estimates of the Plackett-Luce mixture parameters to be computed from the MCMC sample. This argument is ignored when MAP estimates are supplied in the MAPestP and MAPestW arguments. Default is "mean". Alternatively, one can choose "median" (see 'Details').
  • parallel: Logical: whether parallelization should be used. Default is FALSE.

Returns

A list of named objects:

  • point_estP: List of size length(seq_G), whose generic element is a numeric G$$x$$K matrix with the point estimates of the component-specific support parameters employed for the computation of the criteria.

  • point_estW: List of size length(seq_G), whose generic element is a numeric vector with the GG point estimates of the mixture weights employed for the computation of the criteria.

  • fitting: Numeric length(seq_G)x$$2 matrix with the fitting terms of the comparison measures, given by the posterior expected deviance D_bar and the deviance D_hat evaluated at the point estimate.

  • penalties: Numeric length(seq_G)x$$2 matrix with the penalty terms pD and pV (effective number of parameters).

  • criteria: Numeric length(seq_G)x$$6 matrix of Bayesian model selection criteria: DIC1, DIC2, BPIC1, BPIC2, BICM1 and BICM2 (see 'Details').

Details

The selectPLMIX function privileges the use of the MAP point estimates to compute the Bayesian model comparison criteria, since they are not affected by the label switching issue. By setting both the MAPestP and MAPestW arguments equal to NULL, the user can alternatively compute the selection measures by relying on a different posterior summary ("mean" or "median") specified in the post_est argument. In the latter case, the MCMC samples for each Plackett-Luce mixture must be supplied in the lists MCMCsampleP and MCMCsampleW. The drawback when working with point estimates other than the MAP is that the possible presence of label switching has to be previously removed from the traces to obtain meaningful results. See the label_switchPLMIX function to perfom label switching adjustment of the MCMC samples.

Several model selection criteria are returned. The two versions of DIC correspond to alternative ways of computing the effective number of parameters: DIC1 was proposed by Spiegelhalter et al. (2002) with penalty named pD, whereas DIC2 was proposed by Gelman et al. (2004) with penalty named pV. The latter coincides with the AICM introduced by Raftery et al. (2007), that is, the Bayesian counterpart of AIC. BPIC1 and BPIC2 are obtained from the two DIC by simply doubling the penalty term, as suggested by Ando (2007) to contrast DIC's tendency to overfitting. BICM1 is the Bayesian variant of the BIC, originally presented by Raftery et al. (2007) and entirely based on the MCMC sample. The BICM2, instead, involved the MAP estimate without the need of its approximation from the MCMC sample as for the BICM1.

Examples

data(d_carconf) K <- ncol(d_carconf) ## Fit 1- and 2-component PL mixtures via MAP estimation MAP_1 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=1, n_start=2, n_iter=400*1) MAP_2 <- mapPLMIX_multistart(pi_inv=d_carconf, K=K, G=2, n_start=2, n_iter=400*2) mcmc_iter <- 30 burnin <- 10 ## Fit 1- and 2-component PL mixtures via Gibbs sampling procedure GIBBS_1 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=1, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_1$mod$P_map, z=binary_group_ind(MAP_1$mod$class_map,G=1))) GIBBS_2 <- gibbsPLMIX(pi_inv=d_carconf, K=K, G=2, n_iter=mcmc_iter, n_burn=burnin, init=list(p=MAP_2$mod$P_map, z=binary_group_ind(MAP_2$mod$class_map,G=2))) ## Select the optimal number of components SELECT <- selectPLMIX(pi_inv=d_carconf, seq_G=1:2, MAPestP=list(MAP_1$mod$P_map, MAP_2$mod$P_map), MAPestW=list(MAP_1$mod$W_map, MAP_2$mod$W_map), deviance=list(GIBBS_1$deviance, GIBBS_2$deviance)) SELECT$criteria

References

Mollica, C. and Tardella, L. (2017). Bayesian Plackett-Luce mixture models for partially ranked data. Psychometrika, 82 (2), pages 442--458, ISSN: 0033-3123, DOI: 10.1007/s11336-016-9530-0.

Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika, 94 (2), pages 443--458.

Raftery, A. E, Satagopan, J. M., Newton M. A. and Krivitsky, P. N. (2007). BAYESIAN STATISTICS 8. Proceedings of the eighth Valencia International Meeting 2006, pages 371--416. Oxford University Press.

Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian data analysis. Chapman & Hall/CRC, Second Edition, ISBN: 1-58488-388-X. New York.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64 (4), pages 583--639.

Author(s)

Cristina Mollica and Luca Tardella

  • Maintainer: Cristina Mollica
  • License: GPL (>= 2)
  • Last published: 2019-09-04

Useful links