Automatic Model Selection for ADPROCLUS with CHull Method
For a set of full dimensional ADPROCLUS models (each with a different number of clusters), this function finds the "elbow" in the scree plot using the CHull procedure (Wilderjans, Ceulemans & Meers, 2013) as implemented in the multichull package. For a matrix of low dimensional ADPROCLUS models (each with a different number of clusters and components), this function finds the "elbow" in the scree plot separately for each number of clusters, again with the CHull method. That is, it reduces the set of candidate models to one per number of clusters by choosing the "elbow" number of components for each number of clusters. The resulting list can in turn be visualized with plot_scree_adpc_preselected. For this procedure to work, the SSE or unexplained variance values must decrease as the number of clusters (components) increases. If that is not the case, increasing the number of (semi-)random starts can help.
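The requirement that fit values decrease can be checked before running the selection. The following is a minimal base R sketch, not part of the package: the helper names is_decreasing and violations, the one-column matrix layout, and the numbers are all illustrative assumptions. It treats ties as violations, which is also an assumption.

```r
# Hypothetical fit values: one row per number of clusters (1 to 4).
# Layout and numbers are illustrative assumptions, not package output.
fits <- matrix(c(120, 80, 85, 40), ncol = 1,
               dimnames = list(paste0("k", 1:4), "SSE"))

# TRUE if every value is strictly smaller than its predecessor.
is_decreasing <- function(values) {
  all(diff(values) < 0)
}

# Positions of the values that are not smaller than their predecessor.
violations <- function(values) {
  which(diff(values) >= 0) + 1
}

is_decreasing(fits[, "SSE"])
violations(fits[, "SSE"])
```

Here the third model has a higher SSE than the second, so it is flagged; this is the situation in which re-estimating with more (semi-)random starts may help.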
model_fit: Matrix containing SSEs or unexplained variance of all models as in the output of mselect_adproclus or mselect_adproclus_low_dim.
percentage_fit: Required proportion of increase in fit of a more complex model.
...: Additional parameters to be passed on to the multichull::CHull() function.
Returns
For full dimensional ADPROCLUS, a CHull object describing the chosen model. For low dimensional ADPROCLUS, a matrix listing the chosen models and the relevant model parameters, compatible with plot_scree_adpc_preselected.
Details
This procedure cannot choose the model with the largest or smallest number of clusters (components); for a set of three models, for example, it will always choose the middle one. If for a given number of clusters exactly two models were estimated, this function chooses the model with the lower SSE/unexplained variance.
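The reason the extreme models cannot be chosen is visible in the scree-test ratio that CHull-type methods use: it compares the fit gained up to a model with the fit gained after it, so it needs a neighbour on both sides. The base R sketch below illustrates this under simplifying assumptions. The function scree_ratio is hypothetical, the SSE values are made up, and the convex hull step of the actual procedure is omitted (the example points already lie on the hull). It is not the multichull implementation.

```r
# Scree-test ratio for badness-of-fit values (lower is better).
# complexity: increasing model complexities; fit: the matching SSE values.
# Assumes all points already lie on the convex hull (hull step omitted).
scree_ratio <- function(complexity, fit) {
  n <- length(fit)
  st <- rep(NA_real_, n)          # first and last entries stay NA
  if (n < 3) return(st)
  for (i in 2:(n - 1)) {
    gain_before <- (fit[i - 1] - fit[i]) / (complexity[i] - complexity[i - 1])
    gain_after  <- (fit[i] - fit[i + 1]) / (complexity[i + 1] - complexity[i])
    st[i] <- gain_before / gain_after
  }
  st
}

# Hypothetical SSE values for 1 to 4 clusters.
st <- scree_ratio(1:4, c(100, 40, 30, 25))
st             # NA, 6, 2, NA
which.max(st)  # 2: the model with the largest ratio is the "elbow"

# With exactly two models no ratio is defined, so the documented
# fallback is to take the model with the lower SSE.
which.min(c(100, 40))
```

With three models only the middle one has a defined ratio, which is why it is always selected in that case.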
The name of the model fit criterion is propagated from the input matrix based on the first column name. It is either "SSE" or "Unexplained_Variance".
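As a small illustration of this naming convention, the criterion can be read from the first column name of the input. The matrix below is a hypothetical stand-in for the output of mselect_adproclus; its shape and values are assumptions made for the example.

```r
# Hypothetical input: a one-column fit matrix whose column name is the criterion.
model_fit <- matrix(c(100, 40, 30, 25), ncol = 1,
                    dimnames = list(NULL, "Unexplained_Variance"))

# Read the criterion name from the first column and check it is a known value.
criterion <- colnames(model_fit)[1]
stopifnot(criterion %in% c("SSE", "Unexplained_Variance"))
criterion
```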
Examples
# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4)

# Use and visualize CHull method
selected_model <- select_by_CHull(model_fits)
selected_model
plot(selected_model)

# Estimating low dimensional models with cluster parameter values
# ranging from 1 to 4 and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x, 1, 4, 1, 4, nsemirandomstart = 10, seed = 1)

# Using the CHull method
pre_selection <- select_by_CHull(model_fits)

# Visualize pre-selected models
plot_scree_adpc_preselected(pre_selection)
References
Wilderjans, T. F., Ceulemans, E., & Meers, K. (2013). CHull: A generic convex hull based model selection method. Behavior Research Methods, 45, 1-15.
See Also
mselect_adproclus: to obtain the model_fit input from the possible ADPROCLUS models
mselect_adproclus_low_dim: to obtain the model_fit input from the possible low dimensional ADPROCLUS models