Statistical Framework to Define Subgroups in Complex Datasets
Regional averages on a self-organizing map
Assign colors based on value
Mitigate data stratification
K-means clustering
Self-organizing map
Label pruning
Best-matching districts
Permutation analysis of map layout
Plot a self-organizing map
Standardization using existing parameters
Data cleaning and standardization
Estimate subgroup statistics
Train self-organizing map
Clean datasets
Create a self-organizing map
Self-organizing map statistics
Plot results from SOM analysis
Prepare datasets for analysis
Self-organizing map statistics
Interactive subgroup assignment
Summarize subgroup statistics
High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists define subgroups in their datasets based on visual cues. For details, see Gao S, Mutter S, Casey A, Makinen V-P (2019) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, 48:369-37, <doi:10.1093/ije/dyy113>. The framework includes the functions needed to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results.
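To make the first two steps concrete (constructing a self-organizing map, then testing whether the regional pattern of a variable on the map could arise by chance), here is a minimal plain-Python sketch. It is not the package's implementation or API. It assumes a small rectangular grid, an online update rule with a Gaussian neighbourhood, and a permutation test on the range of per-unit averages. All function names (`train_som`, `best_matching_unit`, `unit_means`, `permutation_p_value`) and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the prototype closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))


def train_som(data, rows=3, cols=3, epochs=20, seed=0):
    """Online SOM training on a rows x cols grid (assumed layout)."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise prototypes from randomly chosen data points.
    weights = [list(rng.choice(data)) for _ in coords]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)  # decaying learning rate
        radius = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))
        for x in rng.sample(data, len(data)):  # visit rows in random order
            br, bc = coords[best_matching_unit(weights, x)]
            for k, (r, c) in enumerate(coords):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                weights[k] = [w + rate * h * (v - w)
                              for w, v in zip(weights[k], x)]
    return weights


def unit_means(assignments, values, n_units):
    """Average of a variable within each map unit (empty units are skipped)."""
    sums, counts = [0.0] * n_units, [0] * n_units
    for k, v in zip(assignments, values):
        sums[k] += v
        counts[k] += 1
    return [s / n for s, n in zip(sums, counts) if n > 0]


def permutation_p_value(assignments, values, n_units, n_perm=200, seed=0):
    """Compare the observed range of unit averages with shuffled versions."""
    rng = random.Random(seed)

    def spread(vals):
        means = unit_means(assignments, vals, n_units)
        return max(means) - min(means)

    observed = spread(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between values and map units
        if spread(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Usage on synthetic data (two variables, 60 rows).
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
weights = train_som(data)
units = [best_matching_unit(weights, x) for x in data]
p = permutation_p_value(units, [x[0] for x in data], len(weights))
```

A small `p` suggests that the variable varies across the map more than would be expected if its values were unrelated to map position. Visualization, the third step, would then color each map unit by its average value.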