This function divides a hierarchical dendrogram into meaningful clusters ("phyloregions"). It chooses the number of clusters at the ‘elbow’ or ‘knee’ of an evaluation graph, i.e. the point of optimal curvature.
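The ‘elbow’ idea can be illustrated with a minimal base-R sketch. It assumes the evaluation graph is a curve of a score (for example, explained variance) against an increasing number of clusters. It then locates the knee as the point farthest from the straight line joining the first and last points of the curve. This chord-distance rule is a common stand-in for "optimal curvature" and is not necessarily the exact criterion `optimal_phyloregion()` applies. The helper `find_knee()` and the example values are hypothetical.

```r
# Hypothetical helper (not part of phyloregion): return the k at which the
# curve y(k) lies farthest from the straight line joining its first and last
# points, a common stand-in for the point of greatest curvature.
find_knee <- function(k, y) {
  stopifnot(length(k) == length(y), length(k) >= 3)
  # Rescale both axes to [0, 1] so the answer does not depend on units
  kx <- (k - min(k)) / (max(k) - min(k))
  yy <- (y - min(y)) / (max(y) - min(y))
  n <- length(kx)
  dx <- kx[n] - kx[1]
  dy <- yy[n] - yy[1]
  # Perpendicular distance of each point from the chord
  d <- abs(dy * (kx - kx[1]) - dx * (yy - yy[1])) / sqrt(dx^2 + dy^2)
  k[which.max(d)]
}

# Example: a made-up explained-variance curve that flattens after 4 clusters
k <- 2:20
ev <- c(0.40, 0.70, 0.85, 0.87, 0.90, 0.91, 0.92, 0.925, 0.93, 0.935,
        0.94, 0.943, 0.946, 0.949, 0.952, 0.955, 0.957, 0.959, 0.96)
find_knee(k, ev)  # returns 4
```

Rescaling both axes first matters: without it, the knee would shift depending on whether the score is measured as a proportion or a percentage.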
optimal_phyloregion(x, method = "average", k = 20)
Arguments
x: a numeric matrix, data frame or dist object.
method: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
k: numeric, the upper bound on the number of clusters to evaluate. DEFAULT: 20, or the number of observations if that is smaller.
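How the three arguments could fit together is sketched below in base R. The sketch assumes that `x` is a dist object or a square dissimilarity matrix, that `method` is passed on to `stats::hclust()` (which accepts exactly the names listed above, including unambiguous abbreviations such as "av"), that `k` is capped at the number of observations, and that the evaluation graph is the proportion of variance explained when the tree is cut into 2 to `k` clusters. The helper `ev_curve()` is hypothetical and is not the package's implementation. Sums of squares are computed from pairwise distances, which matches the usual between/total ratio for Euclidean distances and is only an analogue for other dissimilarities.

```r
# Hypothetical helper (not part of phyloregion): proportion of variance
# explained when a hierarchical tree is cut into 2..k clusters.
ev_curve <- function(x, method = "average", k = 20) {
  d <- as.dist(x)                   # dist object or square distance matrix
  n <- attr(d, "Size")
  stopifnot(n >= 2)
  k <- min(k, n)                    # cap k at the number of observations
  hc <- hclust(d, method = method)  # hclust() accepts unambiguous abbreviations
  D2 <- as.matrix(d)^2
  # Sum of squares of a group, computed from its pairwise squared distances
  ss <- function(idx) sum(D2[idx, idx]) / (2 * length(idx))
  tss <- ss(seq_len(n))
  ks <- 2:k
  ev <- vapply(ks, function(g) {
    cl <- cutree(hc, k = g)
    wss <- sum(vapply(split(seq_len(n), cl), ss, numeric(1)))
    1 - wss / tss                   # = between-cluster SS / total SS (Euclidean)
  }, numeric(1))
  data.frame(k = ks, ev = ev)
}

# Example: three well-separated groups of 10 points each in two dimensions
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 10), ncol = 2),
             matrix(rnorm(20, mean = 20), ncol = 2))
res <- ev_curve(dist(pts), method = "average", k = 20)
head(res)
```

Because successive cuts of one tree are nested, the explained variance in this sketch never decreases as k grows, which is why a knee rather than a maximum is used to pick the number of clusters.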
Returns
a list containing the following components, as returned by the GMD package (Zhao et al. 2011):
References
Salvador, S. & Chan, P. (2004) Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. Proceedings of the Sixteenth IEEE International Conference on Tools with Artificial Intelligence, pp. 576–584. Institute of Electrical and Electronics Engineers, Piscataway, New Jersey, USA.
Zhao, X., Valen, E., Parker, B.J. & Sandelin, A. (2011) Systematic clustering of transcription start site landscapes. PLoS ONE 6: e23409.