mselect_adproclus_low_dim function

Model selection helper for low dimensional ADPROCLUS

Performs low-dimensional ADPROCLUS for every number of clusters from min_nclusters to max_nclusters and every number of components from min_ncomponents to max_ncomponents. This removes the need to estimate multiple models manually when selecting the number of clusters and components. The results are returned in a format compatible with plot_scree_adpc, which draws one or more scree plots, and with select_by_CHull, which automatically selects a suitable number of components for each number of clusters. Both functions accept the output only if return_models = FALSE.
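The grid search described above can be pictured as a double loop that fills a clusters-by-components matrix. The following is a minimal base-R sketch, not the package's implementation: grid_search_sketch() and fit_fn are hypothetical names, fit_fn stands in for fitting one low-dimensional ADPROCLUS model and returning its misfit, and leaving combinations with more components than clusters as NA is an assumption.

```r
# Hypothetical sketch of a grid search like mselect_adproclus_low_dim().
# `fit_fn(k, s)` is a placeholder for fitting one low-dimensional ADPROCLUS
# model with k clusters and s components and returning its misfit (e.g. SSE).
grid_search_sketch <- function(fit_fn, min_nclusters, max_nclusters,
                               min_ncomponents, max_ncomponents) {
  clusters <- min_nclusters:max_nclusters
  components <- min_ncomponents:max_ncomponents
  # Rows are numbers of clusters, columns are numbers of components,
  # matching the layout described under Returns.
  scores <- matrix(NA_real_,
                   nrow = length(clusters), ncol = length(components),
                   dimnames = list(as.character(clusters),
                                   as.character(components)))
  for (k in clusters) {
    for (s in components) {
      # Assumption: combinations with more components than clusters are
      # not estimated and remain NA.
      if (s <= k) {
        scores[as.character(k), as.character(s)] <- fit_fn(k, s)
      }
    }
  }
  scores
}

# Toy stand-in for a model fit (not ADPROCLUS): misfit shrinks as k and s grow.
toy_fit <- function(k, s) 100 / (k + s)
fits <- grid_search_sketch(toy_fit, 1, 4, 1, 4)
print(fits)
```

With clusters and components both ranging from 1 to 4, this sketch fills 10 of the 16 cells; the remaining upper-triangle cells stay NA.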

mselect_adproclus_low_dim( data, min_nclusters, max_nclusters, min_ncomponents, max_ncomponents, return_models = FALSE, unexplvar = TRUE, start_allocation = NULL, nrandomstart = 1, nsemirandomstart = 1, save_all_starts = FALSE, seed = NULL )

Arguments

  • data: Object-by-variable data matrix of class matrix or data.frame.
  • min_nclusters: Minimum number of clusters to estimate.
  • max_nclusters: Maximum number of clusters to estimate.
  • min_ncomponents: Minimum number of components to estimate. Must be smaller than or equal to min_nclusters.
  • max_ncomponents: Maximum number of components to estimate. Must be smaller than or equal to max_nclusters.
  • return_models: Logical. If FALSE, a matrix of model fit scores is returned, which is compatible with the plot_scree_adpc function. If TRUE, the list of estimated models is returned.
  • unexplvar: Logical. If TRUE, model fit is expressed as unexplained variance; otherwise it is expressed as the Sum of Squared Errors (SSE). This choice carries through to the scree plots.
  • start_allocation: Optional starting cluster membership matrix to be passed to the low dimensional ADPROCLUS procedure. See get_rational for more information.
  • nrandomstart: Number of random starts computed for each model.
  • nsemirandomstart: Number of semi-random starts computed for each model.
  • save_all_starts: Logical. If TRUE and return_models = TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.
  • seed: Integer. Seed for the random number generator. Default: NULL, in which case results are not reproducible.
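The documented constraints on the component range can be checked before any model is fitted. This is a minimal sketch using a hypothetical helper, check_component_range(), which is not part of adproclus; the two min-versus-max ordering checks are assumptions added for completeness and are not stated on this page.

```r
# Hypothetical argument check mirroring the documented constraints;
# this is not the package's own validation code.
check_component_range <- function(min_nclusters, max_nclusters,
                                  min_ncomponents, max_ncomponents) {
  # Assumed ordering checks (not stated in the documentation).
  if (min_nclusters > max_nclusters) {
    stop("min_nclusters must not exceed max_nclusters")
  }
  if (min_ncomponents > max_ncomponents) {
    stop("min_ncomponents must not exceed max_ncomponents")
  }
  # Documented constraints on the component range.
  if (min_ncomponents > min_nclusters) {
    stop("min_ncomponents must be smaller than or equal to min_nclusters")
  }
  if (max_ncomponents > max_nclusters) {
    stop("max_ncomponents must be smaller than or equal to max_nclusters")
  }
  invisible(TRUE)
}

check_component_range(1, 4, 1, 4)  # passes silently
```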

Returns

Depends on the choice of return_models. If FALSE (the default), a number-of-clusters by number-of-components matrix containing the SSE or unexplained variance score of every estimated model. Row names give the number of clusters and column names give the number of components of the corresponding model. If TRUE, a list of the estimated models is returned.
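Because rows and columns are named by the parameter values, individual scores can be looked up by name, and the matrix can be reshaped for custom plotting. The sketch below uses a small placeholder matrix with dummy values that only mimics the documented layout; it is not output of mselect_adproclus_low_dim(), and the NA cells are an assumption about combinations that are not estimated.

```r
# Placeholder matrix with the documented layout (rows = number of clusters,
# columns = number of components). Filled column by column; the values are
# dummy numbers for illustration only, not real fit scores.
model_fits <- matrix(c(10, 8, 6,     # 1 component
                       NA, 7, 5,     # 2 components
                       NA, NA, 4),   # 3 components
                     nrow = 3,
                     dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

# Look up the score of the model with 3 clusters and 2 components by name.
model_fits["3", "2"]

# Reshape into one row per estimated model, e.g. for custom plotting.
long <- as.data.frame(as.table(model_fits), stringsAsFactors = FALSE)
names(long) <- c("nclusters", "ncomponents", "score")
long$nclusters <- as.integer(long$nclusters)
long$ncomponents <- as.integer(long$ncomponents)
long <- long[!is.na(long$score), ]
print(long)
```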

Examples

library(adproclus)

# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
# and component parameter values also ranging from 1 to 4
model_fits <- mselect_adproclus_low_dim(data = x,
                                        min_nclusters = 1, max_nclusters = 4,
                                        min_ncomponents = 1, max_ncomponents = 4,
                                        seed = 1)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)

See Also

  • adproclus_low_dim: for the actual low dimensional ADPROCLUS procedure
  • plot_scree_adpc: for plotting the model fits
  • select_by_CHull: for automatic model selection via CHull method
  • Maintainer: Henry Heppe
  • License: GPL (>= 3)
  • Last published: 2024-08-17