Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model
Parallel Model-Based Clustering
Parallel Model-Based Clustering and Parallel K-means Algorithm
Read Me First Function
Set Global Variables According to the Global Matrix X.gbd (X.spmd)
A Set of Parameters in Model-Based Clustering
A Set of Controls in Model-Based Clustering
Obtain a Set of Random Samples for X.spmd
Compute One E-step and Log Likelihood Based on Current Parameters
Compute One M-Step Based on Current Posterior Probabilities
One EM Step for GBD
Initialization for EM-like Algorithms
EM-like Steps for GBD
Generate Examples for Testing
Generate MixSim Examples for Testing
Obtain Total Elements for Every Cluster
Independent Function for Log Likelihood
Print Results of Model-Based Clustering
Update CLASS.spmd Based on the Final Iteration
Functions for Printing or Summarizing Objects According to Classes
All Internal Functions
Aims to utilize model-based clustering (unsupervised) for high-dimensional and ultra-large data, especially in a distributed manner. The code employs 'pbdMPI' to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation follows the single program multiple data (SPMD) programming model. The code can be executed through 'pbdMPI' and MPI implementations such as 'OpenMPI' and 'MPICH'. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents, and examples.
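To illustrate the expectation-maximization iteration that the package distributes across MPI ranks, here is a minimal serial sketch of one EM step for a univariate Gaussian mixture. This is not the package's implementation (which operates on distributed X.spmd blocks and gathers sufficient statistics with 'pbdMPI'); the data, starting values, and variable names below are hypothetical and chosen only for illustration.

```r
# Hypothetical two-component data: not from the package.
set.seed(123)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

# Hypothetical starting parameter guesses.
pi.k <- c(0.5, 0.5)   # mixing proportions
mu.k <- c(-1, 4)      # component means
sd.k <- c(1, 1)       # component standard deviations

# E-step: posterior probabilities (responsibilities) per observation;
# in the distributed setting each rank computes this on its own rows.
dens <- sapply(1:2, function(k) pi.k[k] * dnorm(x, mu.k[k], sd.k[k]))
z <- dens / rowSums(dens)

# Log likelihood under the current parameters.
ll <- sum(log(rowSums(dens)))

# M-step: update parameters from the responsibilities; the distributed
# version gathers these column sums across ranks before updating.
n.k <- colSums(z)
pi.k <- n.k / length(x)
mu.k <- colSums(z * x) / n.k
sd.k <- sqrt(colSums(z * outer(x, mu.k, "-")^2) / n.k)
```

Iterating the E- and M-steps until the log likelihood stabilizes yields the fitted mixture; the gathering of per-rank sufficient statistics is what distinguishes the expectation-gathering-maximization variant used here.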