optimal_phyloregion() R function from the phyloregion package

Determine optimal number of clusters

This function divides a hierarchical dendrogram into meaningful clusters ("phyloregions"). The number of clusters is chosen at the ‘elbow’ or ‘knee’ of an evaluation graph (explained variance plotted against the number of clusters), that is, the point of optimal curvature.
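To make the elbow idea concrete, here is a minimal base-R sketch on made-up data. It is not the package's implementation: the helpers `ss`, `explained_variance` and `find_knee` are hypothetical, the sums of squares are Euclidean and computed on raw coordinates, and the knee rule (the point farthest from the line joining the ends of the curve) is just one simple heuristic.

```r
# Toy data (illustrative only): three well-separated groups of 20 points
set.seed(1)
pts <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)
hc <- hclust(dist(pts), method = "average")

# Sum of squared deviations of the rows of m from their column means
ss <- function(m) sum(sweep(m, 2, colMeans(m))^2)

# Explained variance for k = 1..max_k: 1 - (within-cluster SS / total SS)
explained_variance <- function(data, hc, max_k) {
  tss <- ss(data)
  vapply(seq_len(max_k), function(k) {
    grp <- cutree(hc, k = k)
    wss <- sum(vapply(unique(grp), function(g) {
      ss(data[grp == g, , drop = FALSE])
    }, numeric(1)))
    1 - wss / tss
  }, numeric(1))
}

# Hypothetical helper: after rescaling both axes to [0, 1], the knee is the
# point farthest from the line joining the first and last points of the curve.
# Assumes ev is non-decreasing in k, which holds for nested hclust partitions.
find_knee <- function(k, ev) {
  kn <- (k - min(k)) / (max(k) - min(k))
  en <- (ev - min(ev)) / (max(ev) - min(ev))
  k[which.max(abs(en - kn))]
}

ks <- 1:10
ev <- explained_variance(pts, hc, max(ks))
knee <- find_knee(ks, ev)
```

Because the toy data contain three clearly separated groups, the explained variance climbs steeply up to k = 3 and flattens afterwards, which is the shape the elbow criterion looks for.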


optimal_phyloregion(x, method = "average", k = 20)

Arguments

x: a numeric matrix, data frame or dist object.
method: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
k: numeric, the upper bound of the number of clusters to compute. Defaults to 20, or to the number of observations if there are fewer than 20.
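The wording of the method argument matches that of stats::hclust(), so the sketch below assumes (without confirmation from this page) that the value is handed on to hclust(). It uses base R only and made-up data; `k_upper` is a hypothetical name that mirrors the documented default for k.

```r
set.seed(42)
m <- matrix(runif(30), nrow = 10)   # 10 observations, 3 variables
d <- dist(m)                        # a "dist" object, one accepted form of x

# hclust() resolves `method` by unambiguous prefix, so "ave" means "average"
hc_full <- hclust(d, method = "average")
hc_abbr <- hclust(d, method = "ave")
same_tree <- identical(hc_full$merge, hc_abbr$merge)

# Upper bound on the number of clusters: 20, or fewer if there are fewer observations
k_upper <- min(20, attr(d, "Size"))
```

Spelling the method out in full is usually safer in scripts, since an abbreviation such as "m" would be ambiguous between "mcquitty" and "median".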

Returns

A list containing the following elements, as returned by the GMD package (Zhao et al. 2011):

k: optimal number of clusters (bioregions)
totbss: total between-cluster sum of squares
tss: total sum of squares of the data
ev: explained variance given k
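The relationship between these quantities can be illustrated for a single partition. The sketch below is an assumption-laden illustration in base R, not the package's code: it uses made-up data and Euclidean sums of squares on raw coordinates, and it takes explained variance to be the ratio totbss / tss. The package works from a dissimilarity object, so its numbers are obtained differently.

```r
# Toy data (illustrative only): two groups of 10 points in 3 dimensions
set.seed(7)
dat <- rbind(
  matrix(rnorm(30, mean = 0), ncol = 3),
  matrix(rnorm(30, mean = 4), ncol = 3)
)
grp <- cutree(hclust(dist(dat), method = "average"), k = 2)

centre <- colMeans(dat)

# Total sum of squares: squared distances of all points from the overall centroid
tss <- sum(sweep(dat, 2, centre)^2)

# Between-cluster sum of squares: cluster size times the squared distance
# of each cluster centroid from the overall centroid
totbss <- sum(vapply(unique(grp), function(g) {
  members <- dat[grp == g, , drop = FALSE]
  nrow(members) * sum((colMeans(members) - centre)^2)
}, numeric(1)))

# Within-cluster sum of squares, used here only to check tss = totbss + totwss
totwss <- sum(vapply(unique(grp), function(g) {
  members <- dat[grp == g, , drop = FALSE]
  sum(sweep(members, 2, colMeans(members))^2)
}, numeric(1)))

ev <- totbss / tss   # share of the total variation explained by the partition
```

Under these assumptions ev lies between 0 and 1 and can only stay the same or increase as a cluster is split, which is why a knee criterion rather than a maximum is needed to choose k.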

Examples


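library(phyloregion)
data(africa)
tree <- africa$phylo
# Beta diversity between sites; the first element of the result is clustered below
bc <- beta_diss(africa$comm)
(d <- optimal_phyloregion(bc[[1]], k = 15))
# Explained variance against the number of clusters
plot(d$df$k, d$df$ev, ylab = "Explained variances",
  xlab = "Number of clusters")
lines(d$df$k[order(d$df$k)], d$df$ev[order(d$df$k)], pch = 1)
# Highlight the optimal number of clusters
points(d$optimal$k, d$optimal$ev, pch = 21, bg = "red", cex = 3)
points(d$optimal$k, d$optimal$ev, pch = 21, bg = "red", type = "h")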

References

Salvador, S. & Chan, P. (2004) Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. Proceedings of the Sixteenth IEEE International Conference on Tools with Artificial Intelligence, pp. 576–584. Institute of Electrical and Electronics Engineers, Piscataway, New Jersey, USA.

Zhao, X., Valen, E., Parker, B.J. & Sandelin, A. (2011) Systematic clustering of transcription start site landscapes. PLoS ONE 6: e23409.

phyloregion package

Maintainer: Barnabas H. Daru
License: AGPL-3
Last published: 2023-08-15
