k-means++ clustering \insertCite{Arthur2007}{TreeDist} improves the speed and accuracy of standard k-means clustering \insertCite{Hartigan1979}{TreeDist} by preferring initial cluster centres that are far from one another. A scalable version of the algorithm has been proposed for larger data sets \insertCite{Bahmani2012}{TreeDist}, but is not implemented here.
KMeansPP(x, k = 2, nstart = 10, ...)
Arguments
x: Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
k: Integer specifying the number of clusters, k.
nstart: Positive integer specifying how many random sets of initial centres should be chosen.
...: Additional arguments passed to kmeans().
Examples
# Generate random points
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))

# Conventional k-means may perform poorly
klusters <- kmeans(x, centers = 5)
plot(x, col = klusters$cluster, pch = rep(15:19, each = 5))

# Here, k-means++ recovers a better clustering
plusters <- KMeansPP(x, k = 5)
plot(x, col = plusters$cluster, pch = rep(15:19, each = 5))
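
The idea of preferring initial centres that are far from one another can be illustrated with a short base R sketch. This is not the code behind KMeansPP(); it is a minimal illustration that assumes the usual k-means++ seeding rule, in which each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. The names seed_kmeanspp() and kmeanspp_sketch() are hypothetical and exist only for this example. The sketch also assumes that x contains at least k distinct rows.

```r
# Hypothetical helper (not part of any package): pick k seed rows of x
# using squared-distance-weighted sampling.
seed_kmeanspp <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n)
  centres <- integer(k)
  centres[1] <- sample.int(n, 1)  # first centre: uniform at random
  # Squared distance from every point to its nearest chosen centre
  d2 <- rowSums(sweep(x, 2, x[centres[1], ])^2)
  for (i in seq_len(k - 1) + 1L) {
    # Points far from all existing centres are more likely to be drawn;
    # already-chosen points have weight zero and cannot be drawn again
    centres[i] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[centres[i], ])^2))
  }
  centres
}

# Hypothetical wrapper: repeat the seeding nstart times, run kmeans() from
# each set of seeds, and keep the fit with the smallest total
# within-cluster sum of squares.
kmeanspp_sketch <- function(x, k = 2, nstart = 10, ...) {
  x <- as.matrix(x)
  best <- NULL
  for (run in seq_len(nstart)) {
    seeds <- seed_kmeanspp(x, k)
    fit <- kmeans(x, centers = x[seeds, , drop = FALSE], ...)
    if (is.null(best) || fit$tot.withinss < best$tot.withinss) {
      best <- fit
    }
  }
  best
}

# Usage on the same kind of data as the example above
set.seed(1)
x <- cbind(c(rnorm(10, -5), rnorm(5, 1), rnorm(10, 6)),
           c(rnorm(5, 0), rnorm(15, 4), rnorm(5, 0)))
fit <- kmeanspp_sketch(x, k = 5)
table(fit$cluster)
```

Because the seeds are rows of x, the centres matrix passed to kmeans() always has k rows, and extra arguments such as iter.max are forwarded unchanged through the dots.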