A greedy algorithm to solve the max-p-region problem
The max-p-region problem is a special case of constrained clustering in which a finite number of geographical areas are aggregated into the maximum number of regions (max-p-regions), such that each region is geographically connected and internal homogeneity is maximized.
df: A data frame with selected variables only. E.g. guerry[c("Crm_prs", "Crm_prp", "Litercy")]
bound_variable: A numeric vector holding the values of the selected bounding variable
min_bound: The minimum value that the sum of the bounding variable in each cluster should be greater than
iterations: (optional) The number of iterations of the greedy algorithm. Defaults to 99.
initial_regions: (optional) The initial regions that the local search starts with. Defaults to empty, which means the local search starts with a random process to "grow" clusters.
scale_method: (optional) One of the scaling methods ('raw', 'standardize', 'demean', 'mad', 'range_standardize', 'range_adjust') to apply on input data. Default is 'standardize' (Z-score normalization).
distance_method: (optional) The distance method used to compute the distance between observations i and j. Options are "euclidean" and "manhattan". Defaults to "euclidean".
random_seed: (optional) The seed for the random number generator. Defaults to 123456789.
cpu_threads: (optional) The number of cpu threads used for parallel computation
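As a rough illustration of what the default scale_method ('standardize') and the two distance_method options mean, the following base R sketch applies them to made-up toy data. It uses scale() and dist() from base R as stand-ins; the library's internal computation is not shown here and may differ in detail (for example in the standard deviation denominator).

```r
# Toy data: four observations, two variables (values are made up for illustration).
x <- data.frame(a = c(1, 2, 3, 6), b = c(10, 20, 60, 30))

# Z-score standardization: center each column on its mean and divide by its
# sample standard deviation (base R's scale() uses the n - 1 denominator).
z <- scale(x)

# Pairwise distances between observations i and j on the standardized data.
d_euclidean <- dist(z, method = "euclidean")
d_manhattan <- dist(z, method = "manhattan")
```

Standardizing first keeps a variable with a large numeric range (here b) from dominating the distances.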
A named list with names "Clusters", "Total sum of squares", "Within-cluster sum of squares", "Total within-cluster sum of squares", and "The ratio of between to total sum of squares".
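A minimal usage sketch follows. It rests on several assumptions not stated in the text above: that the function is maxp_greedy() from the rgeoda package, that it takes a spatial weights object as its first argument (built here with queen_weights()), and that the Guerry sample data ships with the package under extdata. The choice of Pop1831 as the bounding variable and of 10% of its total as min_bound is illustrative only.

```r
library(sf)
library(rgeoda)

# Guerry sample data (assumed to be shipped inside the rgeoda package).
guerry_path <- system.file("extdata", "Guerry.shp", package = "rgeoda")
guerry <- st_read(guerry_path)

# Assumption: a queen contiguity weights object is passed as the first argument.
queen_w <- queen_weights(guerry)

# Variables to cluster on, and the bounding variable with its minimum bound.
vars <- guerry[c("Crm_prs", "Crm_prp", "Litercy")]
bound_variable <- guerry$Pop1831
min_bound <- 0.1 * sum(bound_variable)

maxp_clusters <- maxp_greedy(queen_w, vars, bound_variable, min_bound,
                             iterations = 99)

# Sum of the bounding variable per region, for comparison with min_bound.
region_pop <- tapply(bound_variable, maxp_clusters$Clusters, sum)
```

The "Clusters" element of the result holds one region label per observation, so it can be attached to the input data or tabulated as shown.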