data_description(): R function from the MSmix package

Descriptive summaries for partial rankings

Compute various descriptive summaries for a partial ranking dataset. Unlike analogous functions supplied by other R packages, data_description supports partial observations with arbitrary patterns of censoring.
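To illustrate what arbitrary censoring patterns look like, the base R sketch below builds a toy matrix whose rows are a top-2 ranking, a bottom-2 ranking, a ranking of middle positions only and a complete ranking, and checks each row with is_valid_partial(). The helper is hypothetical (it is not an MSmix function) and encodes an assumed validity rule: at least one item is ranked, and the observed ranks are distinct integers in 1..n, with NA for the missing positions.

```r
# Toy partial rankings of n = 5 items (rows = judges, NA = missing position).
toy <- rbind(
  c(1, 2, NA, NA, NA),   # top-2 ranking
  c(NA, NA, 5, 4, NA),   # bottom-2 ranking
  c(NA, 3, NA, NA, 2),   # only positions 2 and 3 observed
  c(3, 1, 2, 5, 4)       # complete ranking
)

# Hypothetical helper (assumed rule, not MSmix code): a row is a valid partial
# ranking if it ranks at least one item with distinct integer ranks in 1..n.
is_valid_partial <- function(r) {
  n <- length(r)
  obs <- r[!is.na(r)]
  length(obs) > 0 &&
    all(obs == round(obs)) &&
    all(obs >= 1 & obs <= n) &&
    !anyDuplicated(obs)
}

apply(toy, 1, is_valid_partial)   # one check per row
```

All four rows pass, even though only the first one is a top ranking.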

Print method for class "data_descr".


data_description(
  rankings,
  marg = TRUE,
  borda_ord = FALSE,
  paired_comp = TRUE,
  subset = NULL,
  item_names = NULL
)

## S3 method for class 'data_descr'
print(x, ...)

Arguments

rankings: Integer $N \times n$ matrix or data frame with partial rankings in each row. Missing positions must be coded as NA.
marg: Logical: whether the first-order marginals have to be computed. Defaults to TRUE.
borda_ord: Logical: whether, in the summary statistics, the items must be ordered according to the Borda ranking (i.e., by increasing mean rank). Defaults to FALSE.
paired_comp: Logical: whether the pairwise comparison matrix has to be computed. Defaults to TRUE.
subset: Optional logical or integer vector specifying the subset of observations, i.e., rows of rankings, to be kept. Missing values are taken as FALSE. Defaults to NULL, meaning that all rows are considered.
item_names: Character vector with the names to be used for the items. Defaults to NULL, meaning that colnames(rankings) is used and, if not available, item_names is set equal to "Item1", "Item2", ....
x: An object of class "data_descr" returned by data_description.
...: Further arguments passed to or from other methods (not used).
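The documented defaults for subset and item_names can be mimicked in base R as follows. This is a hypothetical re-implementation for illustration only: normalize_subset() and default_item_names() are not MSmix functions, and the sketch assumes that an integer subset lists row indices while a logical subset has one entry per row.

```r
# Hypothetical helper: turn `subset` into a logical row selector of length N.
normalize_subset <- function(subset, N) {
  if (is.null(subset)) return(rep(TRUE, N))   # NULL: keep every row
  if (is.logical(subset)) {
    keep <- subset                            # assumed: one entry per row
  } else {
    keep <- seq_len(N) %in% subset            # integer vector of row indices
  }
  keep[is.na(keep)] <- FALSE                  # missing values are taken as FALSE
  keep
}

# Hypothetical helper: item names, falling back to colnames, then "Item1", ...
default_item_names <- function(rankings, item_names = NULL) {
  if (!is.null(item_names)) return(item_names)
  if (!is.null(colnames(rankings))) return(colnames(rankings))
  paste0("Item", seq_len(ncol(rankings)))
}

toy <- matrix(c(1, 2, NA,
                NA, 1, 2,
                2, NA, 1), nrow = 3, byrow = TRUE)
keep <- normalize_subset(c(TRUE, NA, TRUE), nrow(toy))
toy[keep, , drop = FALSE]   # rows 1 and 3 only
default_item_names(toy)     # no colnames, so generic names are used
```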

Returns

An object of class "data_descr", which is a list with the following named components:

n_ranked: Integer vector of length $N$ with the number of items ranked in each partial sequence.
n_ranked_distr: Frequency distribution of the n_ranked vector.
n_ranks_by_item: Integer $3 \times n$ matrix with the number of times that each item has been ranked and not ranked. The last row contains the column totals, i.e., the sample size $N$.
mean_rank: Mean rank vector.
borda_ordering: Character vector corresponding to the Borda ordering. This is obtained from the ranking of the mean rank vector.
marginals: Integer $n \times n$ matrix of the first-order marginals in each column: the $(j,i)$-th entry indicates the number of times that item $i$ is ranked in position $j$.
pc: Integer $n \times n$ pairwise comparison matrix: the $(i,i')$-th entry indicates the number of times that item $i$ is preferred to item $i'$.
rankings: When borda_ord = TRUE, an integer $N \times n$ matrix corresponding to rankings with columns rearranged according to the Borda ordering, otherwise the input rankings.
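To make the definitions of the returned components concrete, the base R sketch below computes n_ranked, n_ranked_distr, n_ranks_by_item, mean_rank, marginals and pc for a toy matrix. It is an illustrative re-implementation written from the definitions above, not the MSmix source: the helper describe_partial() and the row labels of n_ranks_by_item are hypothetical, and pc is assumed to count a preference only when both items are ranked in the same sequence (MSmix may treat pairs involving unranked items differently).

```r
# Hypothetical helper: main data_description() summaries, from their definitions.
describe_partial <- function(r) {
  r <- as.matrix(r)
  N <- nrow(r); n <- ncol(r)
  obs <- !is.na(r)

  n_ranked <- as.integer(rowSums(obs))        # items ranked in each sequence
  n_ranked_distr <- table(n_ranked)           # frequency distribution
  ranked <- colSums(obs)
  n_ranks_by_item <- rbind(ranked = ranked,   # illustrative row labels
                           not_ranked = N - ranked,
                           total = N)

  mean_rank <- colMeans(r, na.rm = TRUE)      # mean over ranked positions only
  mean_rank[is.nan(mean_rank)] <- NA          # never-ranked item -> NA

  # marginals[j, i]: number of times item i occupies position j
  marginals <- sapply(seq_len(n), function(i) tabulate(r[obs[, i], i], nbins = n))

  # pc[i, k]: number of sequences with item i preferred to item k
  # (assumption: pairs with an unranked item are not counted)
  pc <- matrix(0L, n, n)
  for (i in seq_len(n)) for (k in seq_len(n)) {
    pc[i, k] <- sum(r[, i] < r[, k], na.rm = TRUE)
  }

  list(n_ranked = n_ranked, n_ranked_distr = n_ranked_distr,
       n_ranks_by_item = n_ranks_by_item, mean_rank = mean_rank,
       marginals = marginals, pc = pc)
}

r <- rbind(c(1, 2, NA, NA),
           c(2, NA, 1, NA),
           c(NA, 1, 3, 2),
           c(1, 3, 2, 4))
out <- describe_partial(r)
out$mean_rank   # 1.333333 2.000000 2.000000 3.000000
```

Each column of marginals sums to the number of times the corresponding item was ranked, i.e., the first row of n_ranks_by_item.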

Details

The implementation of data_description is similar to that of rank_summaries from the PLMIX package. Unlike the latter, data_description works with any kind of partial rankings (not only top rankings) and can summarize subsamples through the additional subset argument.

The Borda ranking, obtained by ranking the mean rank vector, corresponds to the maximum likelihood estimate (MLE) of the consensus ranking of the Mallows model with Spearman distance. If mean_rank contains NAs, the corresponding items occupy the bottom positions of borda_ordering, in the order in which they appear in item_names.
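The ordering rule above can be sketched in base R with order(), whose default sort is stable: items are sorted by increasing mean rank, and na.last = TRUE sends the items with an NA mean rank to the bottom while keeping them in item_names order. The helper borda_from_mean() is hypothetical, and keeping tied mean ranks in item_names order is an assumption of this sketch rather than documented MSmix behaviour.

```r
# Hypothetical helper: Borda ordering from a mean rank vector that may contain NAs.
borda_from_mean <- function(mean_rank, item_names) {
  # order() is stable, so ties and NA items keep their item_names order;
  # na.last = TRUE places items with NA mean rank at the bottom.
  item_names[order(mean_rank, na.last = TRUE)]
}

mean_rank <- c(2.5, NA, 1.2, 3.0, NA)
items <- c("Item1", "Item2", "Item3", "Item4", "Item5")
borda_from_mean(mean_rank, items)
# "Item3" "Item1" "Item4" "Item2" "Item5"
```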

Examples


library(MSmix)

## Example 1. Sample statistics for the Antifragility dataset.
r_antifrag <- ranks_antifragility[, 1:7]
descr <- data_description(rankings = r_antifrag)
descr

## Example 2. Sample statistics for the Sports dataset.
r_sports <- ranks_sports[, 1:8]
descr <- data_description(rankings = r_sports, borda_ord = TRUE)
descr

## Example 3. Sample statistics for the Sports dataset by gender.
r_sports <- ranks_sports[, 1:8]
desc_f <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Female"))
desc_m <- data_description(rankings = r_sports, subset = (ranks_sports$Gender == "Male"))
desc_f
desc_m

References

Mollica C and Tardella L (2020). PLMIX: An R package for modelling and clustering partially ranked data. Journal of Statistical Computation and Simulation, 90(5), pages 925-959. ISSN: 0094-9655. DOI: 10.1080/00949655.2020.1711909.

Marden JI (1995). Analyzing and modeling rank data. Monographs on Statistics and Applied Probability (64). Chapman & Hall, London. ISBN: 0-412-99521-2.
