clusterSP function

Cluster snow profiles

This function is the main gateway to sarp.snowprofile::snowprofile clustering.

clusterSP(
  SPx = NULL,
  k = 2,
  type = c("hclust", "pam", "fanny", "kdba", "fast")[1],
  distmat = NULL,
  config = clusterSPconfig(type),
  centers = "none",
  keepSPx = TRUE,
  keepDistmat = TRUE
)

Arguments

  • SPx: a sarp.snowprofile::snowprofileSet to be clustered

  • k: desired number of clusters

  • type: clustering type, one of 'hclust' (default), 'pam', 'fanny', 'kdba', or 'fast'

  • distmat: a precomputed distance matrix of class dist. This results in much faster clustering for type %in% c('hclust', 'pam', 'fanny') as well as faster identification of medoid profiles if centers %in% c('medoids', 'both')

  • config: a list providing the necessary hyperparameters. Use clusterSPconfig functions for convenience!

  • centers: compute and return 'medoids', 'centroids', 'both', or 'none' for each cluster. The default 'none' only returns centroids/medoids if they were already calculated by the clustering algorithm, whereas the other options can require extra processing time to calculate additional centroids/medoids

  • keepSPx: append the snowprofileSet to the output?

  • keepDistmat: append the distmat to the output?

Returns

a list of class clusterSP containing:

  • clustering: vector of integers (from 1:k) indicating the cluster to which each point is allocated
  • id.med: vector of indices for the medoid profiles of each cluster (if calculated)
  • centroids: snowprofileSet containing the centroid profile for each cluster (if calculated)
  • tree: object of class 'hclust' describing the tree output by hclust
  • ...: all other outputs provided by the clustering algorithms (e.g., a membership matrix from fanny.object, pam.object, iteration history from clusterSPkdba)
  • type: type of clustering as provided by input argument
  • call: a copy of the clusterSP function call
  • SPx: a copy of the input snowprofileSet (if keepSPx = TRUE)
  • distmat: the pairwise distance matrix of class dist (if keepDistmat = TRUE and a matrix has been provided or computed)

Details

There are several clustering approaches that can be applied to snow profiles. Most rely on computing a pairwise distance matrix between all profiles in a snowprofileSet. Current implementations with this approach rely on existing R functions:

  • agglomerative hierarchical clustering stats::hclust
  • partitioning around medoids cluster::pam
  • fuzzy analysis clustering cluster::fanny
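
As a rough illustration of how these three functions consume one and the same distance matrix, the sketch below uses a small toy matrix of numeric points in place of snow profiles. The toy data and the use of stats::dist are assumptions made for illustration only; in this package the distances between profiles would come from distanceSP instead.

```r
## Toy stand-in for a set of profiles: 10 points in 2D forming two tight groups.
## (Hypothetical data; real profile distances would come from distanceSP.)
x <- rbind(
  cbind(c(0.0, 0.1, 0.2, 0.1, 0.0), c(0.0, 0.2, 0.1, 0.0, 0.1)),
  cbind(c(5.0, 5.1, 5.2, 5.1, 5.0), c(5.0, 5.2, 5.1, 5.0, 5.1))
)

## One pairwise distance matrix of class 'dist', shared by all three methods
d <- stats::dist(x)

## Agglomerative hierarchical clustering: build the tree, then cut it into k groups
tree <- stats::hclust(d)
cl_hclust <- stats::cutree(tree, k = 2)

## Partitioning around medoids: also reports the indices of the medoid observations
cl_pam <- cluster::pam(d, k = 2)

## Fuzzy analysis clustering: returns a membership matrix (one column per cluster)
cl_fanny <- cluster::fanny(d, k = 2)

print(cl_hclust)
print(cl_pam$id.med)
print(round(cl_fanny$membership, 2))
```

All three calls accept the same dist object, which is why a single precomputed matrix can serve several clustering types.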

Since computing a pairwise distance matrix can be slow, the recommended way of testing different numbers of clusters k is to precompute a single distance matrix with the distanceSP function and provide it as an argument to clusterSP.
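
The reuse pattern can be sketched on toy data as follows. The random points, the range of k, and the use of the average silhouette width as a comparison score are all assumptions chosen for illustration; they are not prescribed by this package. The point is only that the distance matrix is computed once, outside the loop over k.

```r
## Hypothetical toy data: three well-separated groups of 6 points each
set.seed(1)
x <- rbind(
  matrix(rnorm(12, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(12, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(12, mean = 10, sd = 0.3), ncol = 2)
)

## Compute the (potentially expensive) distance matrix exactly once
d <- stats::dist(x)

## Reuse it for every candidate number of clusters
ks <- 2:4
fits <- lapply(ks, function(k) cluster::pam(d, k = k))

## Average silhouette width of each fit, as one possible way to compare k
sil <- vapply(fits, function(fit) fit$silinfo$avg.width, numeric(1))
names(sil) <- ks
print(sil)
```

With clusterSP, the same idea corresponds to passing one precomputed distmat to several calls that differ only in k.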

An alternative type of clustering, known as k-dimensional barycentric averaging (kdba), is conceptually similar to k-means but specifically adapted to snow profiles (see clusterSPkdba). An initial clustering condition (which can be random or based on a 'sophisticated guess') is iteratively refined by assigning individual profiles to the most similar cluster and recomputing the cluster centroids at the end of every iteration. The cluster centroids are represented by the average snow profile of each cluster (see averageSP). Note that the results of kdba are sensitive to the initial conditions, which by default are estimated with the 'fast' method below.
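
The assign-then-recompute loop can be sketched with plain numeric vectors. This is not the clusterSPkdba implementation: the helper names centroids_of and toy_kdba are hypothetical, Euclidean distance stands in for profile similarity, and column means stand in for the average profile that averageSP would provide.

```r
## Hypothetical helper: one centroid (column means) per cluster label
centroids_of <- function(x, clustering) {
  ks <- sort(unique(clustering))
  cen <- do.call(rbind, lapply(ks, function(k) {
    colMeans(x[clustering == k, , drop = FALSE])
  }))
  rownames(cen) <- ks
  cen
}

## Hypothetical k-means-like refinement of an initial clustering
toy_kdba <- function(x, init, max_iter = 10) {
  clustering <- as.integer(init)
  centroids <- centroids_of(x, clustering)
  for (iter in seq_len(max_iter)) {
    ## Distance of every row to every centroid (rows x clusters)
    d <- vapply(seq_len(nrow(centroids)), function(j) {
      sqrt(rowSums(sweep(x, 2, centroids[j, ])^2))
    }, numeric(nrow(x)))
    ## Assign each row to its most similar (closest) cluster
    labels <- as.integer(rownames(centroids))
    new_clustering <- labels[max.col(-d, ties.method = "first")]
    converged <- identical(new_clustering, clustering)
    ## Recompute the centroids at the end of the iteration
    clustering <- new_clustering
    centroids <- centroids_of(x, clustering)
    if (converged) break
  }
  list(clustering = clustering, centroids = centroids, iterations = iter)
}

## Toy data: rows 1-3 and rows 4-6 form two groups; the initial guess mixes them
x <- rbind(c(1, 1, 2), c(1, 2, 2), c(2, 1, 1),
           c(4, 5, 5), c(5, 4, 5), c(5, 5, 4))
res <- toy_kdba(x, init = c(1, 2, 1, 2, 1, 2))
print(res$clustering)
print(res$centroids)
```

Because the result depends on init, running such a loop from different starting assignments is a simple way to see the sensitivity to initial conditions mentioned above.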

Finally, a much faster 'fast' method is available. It computes a pairwise distance matrix without aligning profiles, based instead on summary statistics such as snow height, height of new snow, presence or absence of weak layers and crusts, etc. The 'fast' clustering approach then applies partitioning around medoids to this 'fast' distance matrix.
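
The general idea of clustering on summary statistics can be sketched as below. The column names, the values, and the choice of Gower dissimilarity via cluster::daisy are assumptions for illustration; the features and distance that the package's 'fast' method actually uses are not specified here.

```r
## Hypothetical per-profile summary statistics (one row per profile)
stats_df <- data.frame(
  hs         = c(80, 90, 85, 210, 200, 220),   # snow height (cm)
  hn         = c(5, 0, 10, 30, 25, 35),        # height of new snow (cm)
  weak_layer = factor(c("yes", "yes", "yes", "no", "no", "no")),
  crust      = factor(c("no", "yes", "no", "yes", "yes", "no"))
)

## Gower dissimilarity handles mixed numeric and categorical columns
## (an assumed choice, not necessarily what the package does)
d_fast <- cluster::daisy(stats_df, metric = "gower")

## Partitioning around medoids on the summary-statistics distances
cl_fast <- cluster::pam(d_fast, k = 2)
print(cl_fast$clustering)
print(cl_fast$id.med)
```

No profile alignment is involved at any step, which is what makes this kind of approach cheap compared with a full pairwise alignment.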

More details here...

Examples

this_example_runs_too_long <- TRUE
if (!this_example_runs_too_long) {  # exclude from cran checks

  ## Cluster with SPgroup2, which contains deposition date and p_unstable
  SPx <- SPgroup2
  config <- clusterSPconfig(simType = 'wsum_scaled', ddate = TRUE, pwls = TRUE)

  ## Hierarchical clustering with k = 2
  cl_hclust <- clusterSP(SPx, k = 2, type = 'hclust', config = config)
  plot(cl_hclust)

  ## Precompute a distance matrix and cluster with PAM for k = 2 and 3
  distmat <- do.call('distanceSP', c(list(SPx), config$args_distance))
  cl_pam2 <- clusterSP(SPx, k = 2, type = 'pam', config = config, distmat = distmat)
  cl_pam3 <- clusterSP(SPx, k = 3, type = 'pam', config = config, distmat = distmat)
  print(cl_pam2$clustering)
  print(cl_pam3$clustering)

  ## kdba clustering
  config_kdba <- clusterSPconfig(simType = 'layerwise', type = 'kdba')
  cl_kdba <- clusterSP(SPx = SPgroup2, k = 2, type = 'kdba', config = config_kdba)
  plot(cl_kdba)
}

See Also

clusterSPconfig, clusterSPcenters, clusterSPkdba, plot.clusterSP

Author(s)

fherla, shorton