DBS is a flexible and robust clustering framework that consists of three independent modules. The first module is the parameter-free projection method Pswarm Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors GeneratePswarmVisualization, called the generalized U-matrix. The third module is a clustering method with no sensitive parameters DBSclustering. The clustering can be verified by the visualization and vice versa. The term DBS refers to the method as a whole.
The GeneratePswarmVisualization function generates the special case (please see [Thrun, 2018]) of the generalized Umatrix with the help of an unsupervised neural network (simplified emergent self-organizing map published in [Thrun/Ultsch, 2020]). From the generalized Umatrix a topographic map with hypsometric tints can be visualized. To see this visualization use plotTopographicMap of the package GeneralizedUmatrix.
Data: [1:n,1:d] array of data: n cases in rows, d variables in columns
ProjectedPoints: matrix, ProjectedPoints[1:n,1:2] n by 2 matrix containing coordinates of the Projection: A matrix of the fitted configuration. see output of Pswarm for further details
LC: size of the grid c(Lines,Columns), number of Lines and Columns automatic calculated by setGridSize in Pswarm
Sometimes is better to choose a different grid size, e.g. to to reduce computional effort contrary to SOM, here the grid size defined only the resolution of the visualizations The real grid size is predfined by Pswarm, but you may choose a factor x*res$LC if you so desire. Therefore, The resulting grid size is given back in the Output.
PlotIt: Optional, default(FALSE), If TRUE than uses plotTopographicMap of the package GeneralizedUmatrix is plotted as a topview in the tiled option, see details for explanation.
ComputeInR: Optional, =TRUE: Rcode, =FALSE C++ implementation
Parallel: Optional, =TRUE: Parallel C++ implementation, =FALSE Sequential C++ implementation
Tiled: Optional, =TRUE: arrangement of four grids for better understanding of edge behaviour, =FALSE: single grid.
DataPerEpoch: Optional: Number between 0 and 1 stating the ratio of data per epoch for training of the generalized u-matrix approach.
Details
Tiled: The topographic map is visualized 4 times because the projection is toroidal. The reason is that there are no border in the visualizations and clusters (if they exist) are not disrupted by borders of the plot.
If you used Pswarm with distance matrix instead of a data matrix (in the sense that you do not have any data matrix available), you may transform your distances into data by using MDS of the ProjectionBasedClustering package in order to use the GeneratePswarmVisualization function. The correct dimension can be found through the Sheppard diagram or kruskals stress.
References
[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, tools:::Rd_expr_doi("10.1007/978-3-658-20540-9") , 2018.
[Thrun/Ultsch, 2020] Thrun, M. C., & Ultsch, A.: Uncovering High-Dimensional Structures of Projections from Dimensionality Reduction Methods, MethodsX, Vol. 7, pp. 101093, tools:::Rd_expr_doi("10.1016/j.mex.2020.101093") , 2020.
Note
If you used pswarm with distance matrix instead of a data matrix you can mds transform your distances into data (see the MDS function of the ProjectionBasedClustering package.). The correct dimension can be found through the Sheppard diagram or kruskals stress.
Returns
list of - Bestmatches: matrix [1:n,1:2], BestMatches of the Umatrix, contrary to ESOM they are always fixed, because predefined by GridPoints.
Umatrix: matrix [1:Lines,1:Columns],
WeightsOfNeurons: array [1:Lines,1:Columns,1:d], d is the dimension of the weights, the same as in the ESOM algorithm
GridPoints: matrix [1:n,1:2],quantized projected points: projected points now lie on a predefined grid.
LC: c(Lines,Columns), normally equal to grid size of Pswarm, sometimes it a better or a lower resolution for the visualization is better. Therefore here the grid size of the neurons is given back.
PlotlyHandle: If PlotIt=FALSE: NULL, otherwise plotly object for ploting topview of topographic map
Author(s)
Michael Thrun
Note
The extraction of an island out of the generalized Umatrix can be performed using the interactiveGeneralizedUmatrixIsland function in the package ProjectionBasedClustering.
The main code of both functions GeneralizedUmatrix and GeneratePswarmVisualization is the same C++ function sESOM4BMUs which is described in [Thrun/Ultsch, 2020].
See Also
Pswarm and plotTopographicMap and GeneralizedUmatrix of the package GeneralizedUmatrix