relabel() R function from [mcclust]

Stephens' Relabelling Algorithm for Clusterings

For a sample of clusterings in which corresponding clusters have different labels the algorithm attempts to bring the clusterings to a unique labelling.


relabel(cls, print.loss = TRUE)

Arguments

cls: a matrix in which every row corresponds to a clustering of the ncol(cls) objects.
print.loss: logical, should current value of loss function be printed after each iteration? Defaults to TRUE.

Details

The algorithm minimizes the loss function

\sum_{m=1}^M\sum_{i=1}^n\sum_{j=1}^K-\log\hat{p}_{ij} \cdot I_{\{z_i^{(m)}=j\}}sum_{m=1}^M sum_{i=1}^n sum_{j=1}^K -log(hat{p}_{ij}) * I_{z_i^(m)=j}

over the $M$ clusterings, $n$ observations and $K$ clusters, where $hat{p}_{ij}$ is the estimated probability that observation $i$ belongs to cluster $j$ and $z_i^(m)$ indicates to which cluster observation $i$ belongs in clustering $m$ . $I_{.}$ is an indicator function.

Minimization is achieved by iterating the estimation of $hat{p}_{ij}$ over all clusterings and the minimization of the loss function in each clustering by permuting the cluster labels. The latter is done by linear programming.

Returns

cls: the input cls with unified labelling.
P: an $n*K$ matrix, where entry $[i,j]$ contains the estimated probability that observation $i$ belongs to cluster $j$ .
loss.val: value of the loss function.
cl: vector of cluster memberships that have the highest probabilities $\hat{p}_{ij}$ .

References

Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62 , 795--809.

Author(s)

Arno Fritsch, arno.fritsch@tu-dortmund.de

Note

The implementation is a variant of the algorithm of Stephens which is originally applied to draws of parameters for each observation, not to cluster labels.

Warning

The algorithm assumes that the number of clusters $K$ is fixed. If this is not the case $K$ is taken to be the most common number of clusters. Clusterings with other numbers of clusters are discarded and a warning is issued.

Examples


(cls <- rbind(c(1,1,2,2),c(1,1,2,2),c(1,2,2,2),c(2,2,1,1)))
# group 2 in clustering 4 corresponds to group 1 in clustering 1-3.
cls.relab <- relabel(cls)
cls.relab$cls

mcclust package Read PDF manual

Maintainer: Arno Fritsch
License: GPL (>= 2)
Last published: 2022-05-02

Useful links

relabel function