mselect_adproclus function

Model selection helper for ADPROCLUS

Runs ADPROCLUS once for every number of clusters from min_nclusters to max_nclusters, so that you do not have to estimate each model manually when choosing the number of clusters. With return_models = FALSE (the default), the result can be passed directly to plot_scree_adpc to obtain a scree plot, or to select_by_CHull to select a suitable number of clusters automatically. With return_models = TRUE, the output is not compatible with either function.

mselect_adproclus(
  data,
  min_nclusters,
  max_nclusters,
  return_models = FALSE,
  unexplvar = TRUE,
  start_allocation = NULL,
  nrandomstart = 1,
  nsemirandomstart = 1,
  algorithm = "ALS2",
  save_all_starts = FALSE,
  seed = NULL
)

Arguments

  • data: Object-by-variable data matrix of class matrix or data.frame.
  • min_nclusters: Minimum number of clusters to estimate.
  • max_nclusters: Maximum number of clusters to estimate.
  • return_models: Boolean. If FALSE (default), a one-column matrix of model fit scores is returned, which is compatible with the plot_scree_adpc function. If TRUE, the list of estimated models is returned instead.
  • unexplvar: Boolean. If TRUE (default), model fit is reported as unexplained variance. If FALSE, it is reported as the Sum of Squared Errors (SSE). The choice carries through to the scree plots.
  • start_allocation: Optional starting cluster membership matrix to be passed to the ADPROCLUS procedure. See get_rational for more information.
  • nrandomstart: Number of random starts computed for each model.
  • nsemirandomstart: Number of semi-random starts computed for each model.
  • algorithm: Character string "ALS1" or "ALS2" (default), denoting the type of alternating least squares algorithm. Can be abbreviated with "1" or "2".
  • save_all_starts: Logical. If TRUE and return_models = TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.
  • seed: Integer. Seed for the random number generator. Default: NULL, in which case results are not reproducible across runs.

Returns

The return value depends on return_models. If FALSE, a matrix with one column of SSE or unexplained variance scores, one row per estimated model; the row names are the number of clusters of the corresponding model. If TRUE, a list of the estimated models.
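The two return shapes can be compared side by side. The following is a minimal sketch, assuming the adproclus package is installed; it uses only arguments listed above and the built-in stackloss data, and it inspects the model list with str() because the internal structure of each model object is not described here.

```r
# Run only if the adproclus package is available (assumption: it is installed).
if (requireNamespace("adproclus", quietly = TRUE)) {
  x <- stackloss

  # Default: one-column matrix of fit scores, one row per number of clusters.
  fits <- adproclus::mselect_adproclus(
    data = x, min_nclusters = 1, max_nclusters = 3, seed = 10
  )
  print(fits)

  # return_models = TRUE: a list of the estimated models instead.
  models <- adproclus::mselect_adproclus(
    data = x, min_nclusters = 1, max_nclusters = 3,
    return_models = TRUE, seed = 10
  )
  # Show only the top level of the list; deeper structure is not assumed.
  str(models, max.level = 1)
}
```

Only the first object (fits) is suitable as input for plot_scree_adpc or select_by_CHull.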

Examples

# Loading a test dataset into the global environment
x <- stackloss

# Estimating models with cluster parameter values ranging from 1 to 4
model_fits <- mselect_adproclus(data = x, min_nclusters = 1, max_nclusters = 4, seed = 10)

# Plot the results as a scree plot to select the appropriate number of clusters
plot_scree_adpc(model_fits)
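As an alternative to reading the scree plot by eye, the same fit matrix can be handed to select_by_CHull. This is a hedged sketch, assuming the adproclus package is installed and that select_by_CHull accepts the fit matrix as its first argument with all other arguments left at their defaults; the structure of its result is not documented here, so it is simply printed.

```r
# Run only if the adproclus package is available (assumption: it is installed).
if (requireNamespace("adproclus", quietly = TRUE)) {
  x <- stackloss

  # Fit scores for 1 to 4 clusters; return_models stays FALSE so that
  # the output remains compatible with select_by_CHull.
  model_fits <- adproclus::mselect_adproclus(
    data = x, min_nclusters = 1, max_nclusters = 4, seed = 10
  )

  # Automatic selection via the CHull method, using default settings.
  selection <- adproclus::select_by_CHull(model_fits)
  print(selection)
}
```

Fixing seed keeps the fit scores, and therefore the selection, reproducible between runs.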

See Also

  • adproclus: for the actual ADPROCLUS procedure
  • plot_scree_adpc: for plotting the model fits
  • select_by_CHull: for automatic model selection via CHull method

Maintainer: Henry Heppe
License: GPL (>= 3)
Last published: 2024-08-17