KMeansPP() R function from [TreeDist]

k-means++ clustering

k-means++ clustering (Arthur & Vassilvitskii 2007) improves the speed and accuracy of standard k-means clustering (Hartigan & Wong 1979) by preferring initial cluster centres that are far from others. A scalable version of the algorithm has been proposed for larger data sets (Bahmani et al. 2012), but is not implemented here.
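The seeding idea can be sketched in base R. This is a minimal illustration of distance-weighted seeding under stated assumptions, not the TreeDist implementation: `SeedCentres` is a hypothetical helper name, and the sketch assumes that `x` contains at least `k` distinct rows.

```r
# Hypothetical helper (not part of TreeDist): pick k rows of x as initial
# centres, sampling each new centre with probability proportional to its
# squared distance from the nearest centre already chosen.
SeedCentres <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  centres <- integer(k)

  # First centre: chosen uniformly at random
  centres[1] <- sample.int(n, 1)

  # Squared distance from each point to its nearest chosen centre
  d2 <- rowSums(sweep(x, 2, x[centres[1], ])^2)

  for (i in seq_len(k - 1) + 1) {
    # Distant points are more likely to become the next centre;
    # points already chosen have zero weight
    centres[i] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[centres[i], ])^2))
  }
  centres
}

set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))

# Seed five centres, then hand them to standard k-means
seeds <- SeedCentres(x, 5)
fit <- kmeans(x, centers = x[seeds, , drop = FALSE])
```

Because `kmeans()` accepts a matrix of starting centres, the seeding step can be kept separate from the clustering step, as above.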


KMeansPP(x, k = 2, nstart = 10, ...)

Arguments

x: Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
k: Integer specifying the number of clusters, k.
nstart: Positive integer specifying how many random sets of initial centres should be chosen.
...: Additional arguments passed to kmeans().
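As a hedged illustration of these arguments, the call below passes a data frame as `x` and forwards `iter.max`, an argument of `stats::kmeans()`, through `...`. It assumes that TreeDist is installed; the data frame `df` and its column names are made up for the example.

```r
set.seed(1)
# Illustrative data: two groups of ten points in two dimensions
df <- data.frame(a = c(rnorm(10, 0), rnorm(10, 5)),
                 b = c(rnorm(10, 0), rnorm(10, 5)))

# Run only if TreeDist is available
if (requireNamespace("TreeDist", quietly = TRUE)) {
  # x is coerced to a matrix; iter.max is passed on to kmeans() via `...`
  fit <- TreeDist::KMeansPP(df, k = 2, nstart = 5, iter.max = 20)
  print(table(fit$cluster))
}
```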

Examples


# Generate random points
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))

# Conventional k-means may perform poorly
klusters <- kmeans(x, centers = 5)
plot(x, col = klusters$cluster, pch = rep(15:19, each = 5))

# Here, k-means++ recovers a better clustering
plusters <- KMeansPP(x, k = 5)
plot(x, col = plusters$cluster, pch = rep(15:19, each = 5))
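To compare the two clusterings numerically rather than by eye, one option is to inspect the total within-cluster sum of squares. The sketch below regenerates the same data so that it runs on its own; it assumes that TreeDist is installed and that `KMeansPP()` returns an object with the same `tot.withinss` element as `kmeans()`.

```r
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))

# Conventional k-means from a single random start
klusters <- kmeans(x, centers = 5)

if (requireNamespace("TreeDist", quietly = TRUE)) {
  plusters <- TreeDist::KMeansPP(x, k = 5)
  # A lower total within-cluster sum of squares indicates tighter clusters
  print(c(kmeans = klusters$tot.withinss,
          KMeansPP = plusters$tot.withinss))
}
```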

References

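Arthur D, Vassilvitskii S (2007). "k-means++: the advantages of careful seeding." Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1027-1035.

Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012). "Scalable k-means++." Proceedings of the VLDB Endowment, 5(7), 622-633.

Hartigan JA, Wong MA (1979). "Algorithm AS 136: a K-means clustering algorithm." Journal of the Royal Statistical Society, Series C (Applied Statistics), 28(1), 100-108.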

Author(s)

Martin R. Smith (martin.smith@durham.ac.uk)

TreeDist package

Maintainer: Martin R. Smith
License: GPL (>= 3)
Last published: 2025-01-11
