DBSclustering function

Databionic swarm clustering (DBS)

DBS is a flexible and robust clustering framework that consists of three independent modules. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations [Thrun/Ultsch, 2021]. The second module is a parameter-free high-dimensional data visualization technique, GeneratePswarmVisualization, which places the projected points on a topographic map with hypsometric colors, called the generalized U-matrix. The third module is the clustering method DBSclustering, which has no sensitive parameters (see [Thrun, 2018, p. 104 ff]). The clustering can be verified by the visualization and vice versa. The term DBS refers to the method as a whole.

The DBSclustering function applies the automated clustering approach of the Databionic swarm using abstract U distances, which are geodesic distances based on high-dimensional distances combined with low-dimensional graph paths computed with ShortestGraphPathsC.
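The idea of a geodesic distance that combines high-dimensional distances with graph paths in the projection can be illustrated with a toy sketch in base R. Everything here is an assumption for illustration: the data are random, the "projection" is simply the first two columns rather than a Pswarm result, a symmetric k-nearest-neighbour graph stands in for whatever graph the package builds, and Floyd-Warshall replaces ShortestGraphPathsC. It is not the package's implementation.

```r
# Toy sketch: shortest paths along a graph built in the 2-D projection,
# with every edge weighted by the high-dimensional distance of its endpoints.
set.seed(1)
n <- 30
Data <- matrix(rnorm(n * 3), ncol = 3)   # stand-in high-dimensional data
Proj <- Data[, 1:2]                      # stand-in projection (NOT Pswarm)

HighD <- as.matrix(dist(Data))           # high-dimensional distances
LowD  <- as.matrix(dist(Proj))           # distances in the projection

# Assumption: symmetric k-nearest-neighbour graph in the projection.
kNN <- 4
Adj <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
  nb <- order(LowD[i, ])[2:(kNN + 1)]    # skip the point itself
  Adj[i, nb] <- TRUE
  Adj[nb, i] <- TRUE
}

# Edges carry high-dimensional distances; non-edges start at Inf.
Geo <- ifelse(Adj, HighD, Inf)
diag(Geo) <- 0

# Floyd-Warshall all-pairs shortest paths, one intermediate node per step.
for (m in seq_len(n)) {
  Geo <- pmin(Geo, outer(Geo[, m], Geo[m, ], "+"))
}
```

In this sketch a graph-based distance can never be shorter than the direct high-dimensional distance, and pairs of points in disconnected parts of the graph keep the value Inf.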

DBSclustering(k, DataOrDistance, BestMatches, LC, StructureType = TRUE, PlotIt = FALSE, ylab, main, method = "euclidean", ...)

Arguments

  • k: number of clusters, i.e. how many do you see in the topographic map (3D landscape)?
  • DataOrDistance: Either a [1:n,1:d] matrix of data (n cases, d dimensions) with one data point per row, or a symmetric distance matrix [1:n,1:n]
  • BestMatches: [1:n,1:2] matrix with positions of Bestmatches or ProjectedPoints, one matrix row per data point
  • LC: grid size c(Lines,Columns), please see Details
  • StructureType: Optional, bool; =TRUE: compact structure of clusters assumed, =FALSE: connected structure of clusters assumed. For the two options for clusters, see [Thrun, 2018] or Handl et al. 2006
  • PlotIt: Optional, bool, plots the dendrogram
  • ylab: Optional, character vector, y-axis label of the dendrogram
  • main: Optional, character vector, title of the dendrogram
  • method: Optional, one of the 39 distance methods of parDist of the package parallelDist; only used if a data matrix is given in DataOrDistance
  • ...: Further arguments passed on to the parDist function, e.g. user-defined distance functions
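The two accepted forms of DataOrDistance can be prepared with base R, as in the following sketch. The toy matrix and the choice of the Manhattan metric are illustrative assumptions. The commented call only indicates where the distance matrix would go and presumes that BestMatches and LC come from an earlier projection.

```r
set.seed(42)
n <- 10
d <- 3

# Form 1: data matrix [1:n, 1:d], one case per row.
Data <- matrix(rnorm(n * d), nrow = n, ncol = d)

# Form 2: symmetric distance matrix [1:n, 1:n] computed beforehand.
Dist <- as.matrix(dist(Data, method = "manhattan"))

# Basic checks before handing the matrix to a clustering call.
stopifnot(isSymmetric(Dist), all(diag(Dist) == 0))

# Not run here, requires the package plus BestMatches and LC:
# Cls <- DBSclustering(k = 3, DataOrDistance = Dist, BestMatches, LC)
```

When a distance matrix is passed, the method argument should have no effect, since it is documented as applying only to a data matrix.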

Details

The value of the LC argument depends on what is passed as the BestMatches argument. Usually, as the name of the argument states, the Bestmatches of the GeneratePswarmVisualization function are used, which are defined in the notation of self-organizing maps. For this case, please see example one.

However, as written above, clustering and visualization can be applied independently of each other. In this case the positions of Lines L and Columns C are switched, because Lines is a value slightly above the maximum of the x-coordinates and Columns is a value slightly above the maximum of the y-coordinates of ProjectedPoints. Hence, one should give DBSclustering the argument LC as shown in example two.
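The relation between projected points and the grid size can be sketched in base R. The helper lc_from_points and its margin argument are hypothetical and not part of the package; in practice the LC returned by Pswarm would normally be used directly.

```r
# Hypothetical helper: derive c(Lines, Columns) from projected points,
# taking a value slightly above the maximum x- and y-coordinates.
lc_from_points <- function(ProjectedPoints, margin = 1) {
  stopifnot(is.matrix(ProjectedPoints), ncol(ProjectedPoints) == 2)
  Lines   <- ceiling(max(ProjectedPoints[, 1])) + margin  # from x-coordinates
  Columns <- ceiling(max(ProjectedPoints[, 2])) + margin  # from y-coordinates
  c(Lines = Lines, Columns = Columns)
}

# Toy projected points, one row per data point (not a Pswarm result).
pts <- cbind(x = c(1.2, 7.9, 4.0), y = c(3.3, 2.1, 11.5))
LC <- lc_from_points(pts)
LC
```

With these toy points the maximum x-coordinate is 7.9 and the maximum y-coordinate is 11.5, so the helper yields Lines = 9 and Columns = 13.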

Often it is better to mark the outliers manually after the process of clustering, and sometimes a clustering can be improved through human interaction [Thrun/Ultsch, 2017], DOI:10.13140/RG.2.2.13124.53124; in this case, use the visualization plotTopographicMap of the package GeneralizedUmatrix. If you would like to mark the outliers interactively in the visualization, use the ProjectionBasedClustering package, available on CRAN, with the function interactiveClustering(), or IPBC() for fully interactive clustering. The third example shows the use of interactiveClustering().

Returns

[1:n] numerical vector defining the classification as the main output of this cluster analysis for the n cases of data corresponding to the n bestmatches. It has k unique numbers representing the arbitrary labels of the clustering. You can use plotTopographicMap(Umatrix,Bestmatches,Cls) for verification.
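The properties of the returned vector described above can be checked with a few lines of base R. The helper check_cls is hypothetical, and the Cls vector below is a hand-written stand-in, not real output of DBSclustering.

```r
# Hypothetical helper: basic sanity checks on a cluster vector.
check_cls <- function(Cls, n, k) {
  stopifnot(is.numeric(Cls), length(Cls) == n)
  labels <- sort(unique(Cls))
  list(
    n_labels  = length(labels),       # number of distinct labels found
    matches_k = length(labels) == k,  # TRUE if exactly k labels occur
    sizes     = table(Cls)            # number of cases per label
  )
}

# Stand-in result for n = 6 cases and k = 3 clusters.
Cls <- c(1, 1, 2, 3, 3, 3)
res <- check_cls(Cls, n = 6, k = 3)
res$sizes
```

Since the labels are arbitrary, only the grouping and the cluster sizes carry meaning, not the label values themselves.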

References

[Thrun/Ultsch, 2021] Thrun, M. C., and Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, doi:10.1016/j.artint.2020.103237, 2021.

Author(s)

Michael Thrun

Note

If you want to verify your clustering result externally, you can use Heatmap or SilhouettePlot of the package DataVisualizations, available on CRAN.

Examples

library(DatabionicSwarm)
data("Lsun3D")
Data = Lsun3D$Data
InputDistances = as.matrix(dist(Data))
projection = Pswarm(InputDistances)

## Example One
genUmatrixList = GeneratePswarmVisualization(Data, projection$ProjectedPoints, projection$LC)
Cls = DBSclustering(k = 3, Data, genUmatrixList$Bestmatches, genUmatrixList$LC, PlotIt = TRUE)

## Example Two
# automatic clustering without GeneralizedUmatrix visualization
Cls = DBSclustering(k = 3, Data, projection$ProjectedPoints, projection$LC, PlotIt = TRUE)

## Not run:
## Example Three
## Sometimes an automatic clustering can be improved
## through an interactive approach,
## e.g. if outliers exist (see [Thrun/Ultsch, 2017])
library(ProjectionBasedClustering)
Cls2 = ProjectionBasedClustering::interactiveClustering(genUmatrixList$Umatrix, genUmatrixList$Bestmatches, Cls)
## End(Not run)

  • Maintainer: Michael Thrun
  • License: GPL-3
  • Last published: 2024-06-20