Misclassification is a commonly used performance measure in subspace clustering. It allows comparing two partitions with the same number of clusters.
misclassification(group, true_group, M, K)
Arguments
group: A vector, first partition.
true_group: A vector, second (reference) partition.
M: An integer, maximal number of elements in one class.
K: An integer, number of classes.
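The sketch below shows one way the M and K arguments might be derived from a partition whose labels are not already integers. It uses base R only; `raw_labels` and the relabeling step are illustrative assumptions, not part of the package.

```r
# Hypothetical partition whose labels are not integers in 1..K
raw_labels <- c("b", "a", "c", "a", "b", "b")

# Map each distinct label to an integer in 1..K (assumed preprocessing step)
labels <- match(raw_labels, sort(unique(raw_labels)))

# K: number of classes; M: size of the largest class
K <- length(unique(labels))
M <- max(table(labels))
```

With labels prepared this way, `labels`, `M` and `K` have the form the arguments above describe.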
Returns
Misclassification rate.
Details
Computing the exact misclassification rate requires checking all permutations of the class labels, which is intractable even for a modest number of clusters, so a heuristic approach is used instead. It is assumed that there are K classes of at most M elements each. Class labels are additionally required to be in the range [1, K].
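To make the trade-off concrete, here is a minimal base R sketch that contrasts the exact rate, found by enumerating all K! label matchings, with a simple greedy matching. Both helpers (`exact_misclassification`, `greedy_misclassification`) are hypothetical names introduced for illustration. The greedy rule is one possible heuristic and is not claimed to be the one this function implements.

```r
# Exact rate: try every one-to-one matching of labels (feasible only for small K).
# Hypothetical helper, not part of the package.
exact_misclassification <- function(group, true_group, K) {
  # Rows: labels in group; columns: labels in true_group
  counts <- table(factor(group, levels = 1:K), factor(true_group, levels = 1:K))
  best <- 0
  # Recursively assign each row label to a column label not used yet
  search <- function(row, used, matched) {
    if (row > K) {
      best <<- max(best, matched)
      return(invisible(NULL))
    }
    for (col in setdiff(1:K, used)) {
      search(row + 1, c(used, col), matched + counts[row, col])
    }
  }
  search(1, integer(0), 0)
  1 - best / length(group)
}

# Greedy rate: repeatedly match the pair of labels with the largest overlap.
# An assumed heuristic for illustration; it never beats the exact rate.
greedy_misclassification <- function(group, true_group, K) {
  counts <- table(factor(group, levels = 1:K), factor(true_group, levels = 1:K))
  counts <- matrix(as.numeric(counts), nrow = K)
  matched <- 0
  for (step in 1:K) {
    idx <- which(counts == max(counts), arr.ind = TRUE)[1, ]
    matched <- matched + counts[idx[1], idx[2]]
    # Mark the matched row and column as unavailable
    counts[idx[1], ] <- -1
    counts[, idx[2]] <- -1
  }
  1 - matched / length(group)
}

group      <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
true_group <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)
exact_rate  <- exact_misclassification(group, true_group, 3)
greedy_rate <- greedy_misclassification(group, true_group, 3)
```

The exhaustive search visits K! matchings, while the greedy version performs only K passes over a K-by-K table. This is why a heuristic becomes attractive as K grows.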
Examples
sim.data <- data.simulation(n = 100, SNR = 1, K = 5, numb.vars = 30, max.dim = 2)
mlcc.fit <- mlcc.reps(sim.data$X, numb.clusters = 5, numb.runs = 20, max.dim = 2, numb.cores = 1)
misclassification(mlcc.fit$segmentation, sim.data$s, 30, 5)

# one can use this function not only for clusters
partition1 <- sample(10, 300, replace = TRUE)
partition2 <- sample(10, 300, replace = TRUE)
misclassification(partition1, partition1, max(table(partition1)), 10)
misclassification(partition1, partition2, max(table(partition2)), 10)
References
R. Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(2):52-68, 2011.