data_censoring function

Censoring of full rankings

Convert full rankings into either top-k rankings or partial rankings with missing data in arbitrary positions.

data_censoring(rankings, topk = TRUE, nranked = NULL, probs = rep(1, ncol(rankings) - 1))

Arguments

  • rankings: Integer $N \times n$ matrix or data frame with full rankings in each row.
  • topk: Logical: whether the full rankings must be converted into top-k rankings (TRUE) or into partial rankings with missing data in arbitrary positions (FALSE). Defaults to TRUE.
  • nranked: Integer vector of length $N$ with the desired number of positions to be retained in each partial sequence after censoring. If nranked = NULL (default), the number of positions is randomly generated according to the probabilities in the probs argument.
  • probs: Numeric vector of the $(n-1)$ probabilities for the random generation of the number of positions to be retained in each partial sequence after censoring (normalization is not necessary). Used only if nranked = NULL. Defaults to equal probabilities.
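
As a rough illustration of how nranked and probs relate, the base-R sketch below draws one number of retained positions per ranking from unnormalized weights. The use of sample() and the name nranked_draw are assumptions made for illustration only; the package may generate these values differently.

```r
# Illustrative sketch only; not the package's implementation.
n <- 7                 # number of items (columns of `rankings`)
N <- 10                # number of full rankings (rows of `rankings`)
probs <- 1:(n - 1)     # unnormalized weights for retaining 1, ..., n - 1 positions

# sample() accepts non-negative weights that need not sum to one.
set.seed(12345)
nranked_draw <- sample(seq_len(n - 1), size = N, replace = TRUE, prob = probs)

length(nranked_draw) == N               # one value per ranking
all(nranked_draw %in% seq_len(n - 1))   # every value lies in 1, ..., n - 1
```

A vector built this way has the shape expected by the nranked argument: one integer per row of rankings.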

Returns

A list of two named objects:

  • part_rankings: Integer $N \times n$ matrix with partial (censored) rankings in each row. Missing positions are coded as NA.
  • nranked: Integer vector of length $N$ with the actual number of items ranked in each partial sequence after censoring.
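
To make the shape of the returned list concrete, here is a hand-made toy object with the same two components. The values are invented for illustration and toy_out is a hypothetical name; the sketch only shows how the nranked component relates to the NA pattern in part_rankings.

```r
# Hand-made toy output: N = 3 partial rankings of n = 5 items (NA = missing).
part_rankings <- rbind(
  c(1L, NA, 2L, NA, NA),
  c(NA, 1L, NA, NA, NA),
  c(2L, 3L, 1L, NA, NA)
)

# The number of ranked items per row is the count of non-missing entries.
toy_out <- list(
  part_rankings = part_rankings,
  nranked = as.integer(rowSums(!is.na(part_rankings)))
)

toy_out$nranked   # 2 1 3
```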

Details

Both forms of partial rankings can be obtained in two ways: (i) by specifying, in the nranked argument, the number of positions to be retained in each partial ranking; (ii) by setting nranked = NULL (default) and specifying, in the probs argument, the probabilities of retaining respectively $1, 2, \dots, (n-1)$ positions in the partial rankings (recall that a partial sequence with $(n-1)$ observed entries corresponds to a full ranking).
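
The two routes can be sketched for the top-k case in base R. The helper censor_topk below is hypothetical and is not the package's code; it assumes that entry (i, j) of the matrix holds the rank assigned to item j in ranking i, so that top-k censoring keeps the ranks not exceeding k.

```r
# Hypothetical helper: keep the top-nranked[i] ranks of row i, set the rest to NA.
censor_topk <- function(rankings, nranked) {
  rankings <- as.matrix(rankings)
  stopifnot(length(nranked) == nrow(rankings))
  # nranked is recycled down the columns, so row i is compared with nranked[i].
  rankings[rankings > nranked] <- NA
  rankings
}

set.seed(12345)
n <- 5
N <- 4
full <- t(replicate(N, sample(n)))   # N random full rankings, one per row

# (i) assigned number of retained positions
top2 <- censor_topk(full, nranked = rep(2, N))

# (ii) random number of retained positions drawn from (unnormalized) weights
probs <- rep(1, n - 1)
k <- sample(seq_len(n - 1), size = N, replace = TRUE, prob = probs)
topk_random <- censor_topk(full, nranked = k)

rowSums(!is.na(top2))   # 2 2 2 2
```

In route (ii) the row-wise counts of observed entries in topk_random equal the drawn vector k.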

When topk = FALSE, the positions to be retained in each partial sequence after censoring are drawn uniformly at random, regardless of how the nranked argument is specified.
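
A minimal base-R sketch of censoring in arbitrary positions follows. The helper censor_arbitrary is hypothetical and only illustrates the idea of choosing the retained entries uniformly without replacement; it is not taken from the package and does not treat the case of $(n-1)$ retained entries specially.

```r
# Hypothetical helper: retain nranked[i] uniformly chosen entries of row i.
censor_arbitrary <- function(rankings, nranked) {
  rankings <- as.matrix(rankings)
  n <- ncol(rankings)
  stopifnot(length(nranked) == nrow(rankings), all(nranked <= n))
  for (i in seq_len(nrow(rankings))) {
    keep <- sample.int(n, size = nranked[i])    # uniform, without replacement
    rankings[i, setdiff(seq_len(n), keep)] <- NA
  }
  rankings
}

set.seed(12345)
full <- t(replicate(3, sample(6)))   # 3 random full rankings of 6 items
part <- censor_arbitrary(full, nranked = c(2, 4, 3))

rowSums(!is.na(part))   # 2 4 3
```

The retained entries keep their original values; only the dropped entries are replaced by NA.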

Examples

# Load the package that provides data_censoring(), rMSmix() and ranks_antifragility
library(MSmix)

## Example 1. Censoring the Antifragility dataset into partial top rankings

# Top-3 censoring (assigned number of top positions to be retained)
n <- 7
r_antifrag <- ranks_antifragility[, 1:n]
data_censoring(r_antifrag, topk = TRUE, nranked = rep(3, nrow(r_antifrag)))

# Random top-k censoring with assigned probabilities
set.seed(12345)
data_censoring(r_antifrag, topk = TRUE, probs = 1:(n - 1))

## Example 2. Simulate full rankings from a basic Mallows model with Spearman distance
n <- 10
N <- 100
set.seed(12345)
rankings <- rMSmix(sample_size = N, n_items = n)$samples

# Censoring in arbitrary positions with assigned number of ranks to be retained
set.seed(12345)
nranked <- round(runif(N, 0.5, 1) * n)
set.seed(12345)
arbitr_ranks1 <- data_censoring(rankings, topk = FALSE, nranked = nranked)
arbitr_ranks1
identical(arbitr_ranks1$nranked, nranked)

# Censoring in arbitrary positions with random number of ranks to be retained
set.seed(12345)
probs <- runif(n - 1, 0, 0.5)
set.seed(12345)
arbitr_ranks2 <- data_censoring(rankings, topk = FALSE, probs = probs)
arbitr_ranks2
prop.table(table(arbitr_ranks2$nranked))
round(prop.table(probs), 2)

  • Maintainer: Cristina Mollica
  • License: GPL (>= 3)
  • Last published: 2025-03-25
