For a sample of clusterings in which corresponding clusters carry different labels, the algorithm attempts to bring all clusterings to a unified labelling.
relabel(cls, print.loss = TRUE)
Arguments
cls: a matrix in which every row corresponds to a clustering of the ncol(cls) objects.
print.loss: logical, should current value of loss function be printed after each iteration? Defaults to TRUE.
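The algorithm minimizes the loss function
$$\sum_{m=1}^{M} \sum_{i=1}^{n} \sum_{j=1}^{K} -\log(\hat{p}_{ij}) \cdot I_{\{z_i^{(m)} = j\}}$$
over the M clusterings, n observations and K clusters, where $\hat{p}_{ij}$ is the estimated probability that observation i belongs to cluster j and $z_i^{(m)}$ indicates to which cluster observation i belongs in clustering m. $I_{\{\cdot\}}$ is an indicator function.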
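The loss can be evaluated by hand for a small sample of clusterings. The following is a minimal base R sketch, not the package's code: it assumes the probability of observation i belonging to cluster j is estimated by the share of clusterings that assign i to j, and that the loss adds up -log of that estimate for the cluster each clustering actually assigns. The helper names `estimate_p` and `label_loss` are hypothetical.

```r
# Hypothetical helper: P[i, j] = share of clusterings that put
# observation i in cluster j (an assumed estimator).
estimate_p <- function(cls, K = max(cls)) {
  vapply(seq_len(K), function(j) colMeans(cls == j), numeric(ncol(cls)))
}

# Hypothetical helper: sum over clusterings m and observations i of
# -log(P[i, z]) where z is the label of observation i in clustering m.
label_loss <- function(cls, P) {
  obs <- as.vector(col(cls))  # observation index of each entry of cls
  lab <- as.vector(cls)       # cluster label of each entry of cls
  -sum(log(P[cbind(obs, lab)]))
}

# Clustering 4 uses swapped labels compared with clusterings 1-3.
cls <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2), c(2, 2, 1, 1))
loss_before <- label_loss(cls, estimate_p(cls))

# Swapping the labels of clustering 4 by hand lowers the loss.
cls_fixed <- cls
cls_fixed[4, ] <- c(1, 1, 2, 2)
loss_after <- label_loss(cls_fixed, estimate_p(cls_fixed))

loss_before > loss_after
```

Only the labels that actually occur for an observation enter the sum, so zero entries of P never reach the logarithm here.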
Minimization is achieved by alternating between two steps: estimating $\hat{p}_{ij}$ from all clusterings, and minimizing the loss function within each clustering by permuting its cluster labels. The latter step is solved by linear programming.
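The alternation can be illustrated in base R. The sketch below is not the package's implementation: it replaces the linear programming step with an exhaustive search over all K! label permutations, which is only practical for small K, and it assumes the probabilities are estimated as label frequencies across clusterings. The names `all_perms` and `relabel_sketch`, the `max_iter` argument, and the stopping rule are hypothetical.

```r
# All permutations of the vector v, one per row (small K only).
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

relabel_sketch <- function(cls, max_iter = 50) {
  K <- max(cls)
  n <- ncol(cls)
  idx <- seq_len(n)
  perms <- all_perms(seq_len(K))  # first row is the identity permutation
  old_loss <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 1: estimate P[i, j] as the share of clusterings with label j.
    P <- vapply(seq_len(K), function(j) colMeans(cls == j), numeric(n))
    logP <- log(P)
    loss <- -sum(logP[cbind(as.vector(col(cls)), as.vector(cls))])
    if (old_loss - loss < 1e-10) break  # no further improvement
    old_loss <- loss
    # Step 2: for each clustering, keep the permutation with the lowest
    # loss (brute force here, in place of linear programming).
    for (m in seq_len(nrow(cls))) {
      z <- cls[m, ]
      losses <- apply(perms, 1, function(p) -sum(logP[cbind(idx, p[z])]))
      cls[m, ] <- perms[which.min(losses), ][z]
    }
  }
  list(cls = cls, P = P, loss.val = loss,
       cl = max.col(P, ties.method = "first"))
}

cls <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2), c(2, 2, 1, 1))
res <- relabel_sketch(cls)
res$cls
```

Exhaustive search keeps the sketch free of package dependencies; an assignment or transport solver scales far better as K grows, because the number of permutations grows factorially.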
Returns
cls: the input cls with unified labelling.
P: an $n \times K$ matrix, where entry [i,j] contains the estimated probability that observation i belongs to cluster j.
loss.val: value of the loss function.
cl: vector giving, for each observation i, the cluster j with the highest estimated probability $\hat{p}_{ij}$.
References
Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62, 795–809.
The implementation is a variant of the algorithm of Stephens, which was originally applied to draws of parameters for each observation, not to cluster labels.
Warning
The algorithm assumes that the number of clusters K is fixed. If this is not the case, K is taken to be the most common number of clusters. Clusterings with a different number of clusters are discarded and a warning is issued.
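The filtering described in this warning can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the number of clusters in a clustering is assumed to be its number of distinct labels, ties between equally common values of K are broken in favour of the smallest one, and the helper name `keep_common_k` is hypothetical.

```r
# Hypothetical helper: keep only clusterings with the most common
# number of clusters and warn about the ones that are dropped.
keep_common_k <- function(cls) {
  k_per_row <- apply(cls, 1, function(z) length(unique(z)))
  # which.max on the frequency table returns the first (smallest) K
  # when several values are equally common.
  K <- as.integer(names(which.max(table(k_per_row))))
  keep <- k_per_row == K
  if (!all(keep)) {
    warning(sum(!keep), " clustering(s) discarded: number of clusters is not ", K)
  }
  list(cls = cls[keep, , drop = FALSE], K = K)
}

# The second clustering has three clusters, the others have two.
mixed <- rbind(c(1, 1, 2, 2), c(1, 2, 3, 3), c(2, 2, 1, 1))
filtered <- suppressWarnings(keep_common_k(mixed))
filtered$cls
```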
See Also
lp.transport for the linear programming; maxpear, minbinder, medv for other possibilities of processing a sample of clusterings.
Examples
(cls <- rbind(c(1,1,2,2), c(1,1,2,2), c(1,2,2,2), c(2,2,1,1)))
# group 2 in clustering 4 corresponds to group 1 in clusterings 1-3.
cls.relab <- relabel(cls)
cls.relab$cls