It is well understood that the ranking representation and the ordering representation of ranking data are easily confused. I therefore use an S4 class to store all the information about the ranking data, which avoids unnecessary confusion.
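As a minimal illustration of the two representations, the sketch below converts a single complete ranking to its ordering and back. It uses only base R; the variable names are illustrative and are not part of the class.

```r
# Ranking representation: position i holds the rank given to object i.
ranking <- c(3, 1, 4, 2)  # object 1 is ranked 3rd, object 2 is ranked 1st, ...

# Ordering representation: position k holds the object placed at rank k.
ordering <- order(ranking)

# For a complete ranking the two representations are inverse permutations,
# so applying order() again recovers the original ranking.
back_to_ranking <- order(ordering)
```

Because both vectors are permutations of 1, ..., nobj, nothing in the data itself reveals which representation is in use, which is why the class fixes one convention.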
Details
It is possible to store both complete and top-q rankings in the same RankData object. Three slots, topq, subobs, and q_ind, are introduced for this purpose. There is generally no need to specify these slots if your data set contains only a single "q" level (for example, all data are top-10 rankings). The "q" level for a complete ranking should be nobj-1. Moreover, if the rankings are organized in chunks of increasing "q" levels (for example, top-2 rankings followed by top-3 rankings, followed by top-5 rankings, and so on), then the slots subobs and q_ind can also be inferred correctly by the initializer. It is therefore highly recommended that you organize the ranking matrix in this way and use the initializer.
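The chunked layout can be sketched in base R as follows. This is an illustration under stated assumptions, not the initializer's actual code: the toy data, the helper variable row_q, and the way subobs and q_ind are derived reflect one plausible reading of the slot descriptions.

```r
nobj <- 5

# Two top-2 rankings: observed ranks are 1 and 2, every other object gets q + 1 = 3.
top2 <- rbind(c(1, 2, 3, 3, 3),
              c(3, 1, 3, 2, 3))

# Two complete rankings, whose "q" level is nobj - 1 = 4.
complete <- rbind(c(2, 1, 4, 5, 3),
                  c(5, 4, 3, 2, 1))

# Stack the chunks in increasing "q" order, one ranking per row.
ranking <- rbind(top2, complete)
count <- c(4, 1, 2, 6)
topq <- c(2, nobj - 1)

# Assumed helper: recover each row's "q" level from its largest rank.
row_q <- pmin(apply(ranking, 1, max) - 1, nobj - 1)

# Number of observations in each chunk.
subobs <- as.numeric(tapply(count, factor(row_q, levels = topq), sum))

# First row of each chunk, followed by ndistinct + 1.
q_ind <- c(match(topq, row_q), nrow(ranking) + 1)
```

With the rows sorted by "q" level, each chunk is a contiguous block of rows, so its boundaries can be read off directly from the matrix.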
Slots
nobj: the number of ranked objects. If not provided, it will be inferred as the maximum ranking in the data set. It must therefore be provided if the data are top-q rankings, where the maximum ranking is q+1 rather than the number of objects.
nobs: the number of observations. It need not be provided during initialization, since it must equal the sum of slot count.
ndistinct: the number of distinct rankings. It need not be provided during initialization, since it must equal the number of rows of slot ranking.
ranking: a matrix that stores the ranking representation of the distinct rankings. Each row contains one ranking. For a top-q ranking, all unobserved objects have ranking q+1.
count: the number of observations of each distinct ranking, with one entry for each row of ranking.
topq: a numeric vector that stores the top-q ranking information. See the Details section for more information.
subobs: a numeric vector that stores the number of observations in each chunk of top-q rankings.
q_ind: a numeric vector that stores the beginning and ending of each chunk of top-q rankings. The last element has to be ndistinct+1.
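The relationships among nobj, nobs, ndistinct, ranking, and count can be checked on a toy data set. The sketch below uses only base R and assumes the raw data arrive as one complete ranking per row; the names raw, key, and first_seen are illustrative and do not belong to the package.

```r
# Five observed complete rankings of three objects, one per row.
raw <- rbind(c(1, 2, 3),
             c(2, 1, 3),
             c(1, 2, 3),
             c(3, 1, 2),
             c(1, 2, 3))

# Collapse each row to a string key so identical rankings can be grouped.
key <- apply(raw, 1, paste, collapse = "-")
first_seen <- !duplicated(key)

# Distinct rankings (one per row) and how often each was observed.
ranking <- raw[first_seen, , drop = FALSE]
count <- as.numeric(table(factor(key, levels = key[first_seen])))

nobs <- sum(count)          # equals the number of raw observations
ndistinct <- nrow(ranking)  # one row per distinct ranking
nobj <- max(ranking)        # valid only because these rankings are complete
```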
Examples
# creating a data set with only complete rankings
rankmat <- replicate(10, sample(1:52, 52), simplify = "array")
countvec <- sample(1:52, 52, replace = TRUE)
rankdat <- new("RankData", ranking = rankmat, count = countvec)

# creating a data set with both complete and top-10 rankings
rankmat_in <- replicate(10, sample(1:52, 52), simplify = "array")
rankmat_in[rankmat_in > 11] <- 11
rankmat_total <- cbind(rankmat_in, rankmat)
countvec_total <- c(countvec, countvec)
rankdat2 <- new("RankData", ranking = rankmat_total, count = countvec_total, nobj = 52, topq = c(10, 51))
References
Qian Z, Yu PLH (2019). "Weighted Distance-Based Models for Ranking Data Using the R Package rankdist." Journal of Statistical Software, 90(5), 1-31. doi: 10.18637/jss.v090.i05