This function is the main gateway to sarp.snowprofile::snowprofile clustering.
clusterSP(SPx = NULL, k = 2, type = c("hclust", "pam", "fanny", "kdba", "fast")[1], distmat = NULL, config = clusterSPconfig(type), centers = "none", keepSPx = TRUE, keepDistmat = TRUE)
Arguments
SPx: a sarp.snowprofile::snowprofileSet to be clustered
k: number of desired clusters
type: clustering type, one of hclust (default), pam, fanny, kdba, or fast
distmat: a precomputed distance matrix of class dist. Providing it results in much faster clustering for type %in% c('hclust', 'pam', 'fanny') and in faster identification of medoid profiles if centers %in% c('medoids', 'both')
config: a list providing the necessary hyperparameters. Use the clusterSPconfig function for convenience!
centers: compute and return medoids, centroids, both, or none for each cluster. The default 'none' only returns centroids/medoids if they were already calculated by the clustering algorithm, whereas the other options may require extra processing time to calculate additional centroids/medoids
keepSPx: append the snowprofileSet to the output?
keepDistmat: append the distmat to the output?
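Why a precomputed distmat speeds up medoid identification: a cluster medoid is the member with the smallest summed distance to all other members of its cluster, so it can be read directly off the distance matrix. The base-R sketch below illustrates that idea on a toy dist object. The helper name medoid_indices is hypothetical and not part of this package, and clusterSP may compute id.med differently internally.

```r
# Hypothetical helper (not part of this package): for each cluster, return the
# index of the member with the smallest summed distance to the other members.
medoid_indices <- function(distmat, clustering) {
  D <- as.matrix(distmat)  # full symmetric matrix from a 'dist' object
  sapply(sort(unique(clustering)), function(cl) {
    members <- which(clustering == cl)
    # drop = FALSE keeps single-member clusters as 1 x 1 matrices
    within <- rowSums(D[members, members, drop = FALSE])
    members[which.min(within)]
  })
}

# Toy example: six points on a line forming two obvious groups
x <- c(0, 1, 2, 10, 11, 12)
d <- dist(x)
cl <- c(1, 1, 1, 2, 2, 2)
medoid_indices(d, cl)
# [1] 2 5
```

Because only table lookups and row sums are needed, this step is cheap once the (expensive) pairwise distances exist.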
Returns
a list of class clusterSP containing:
clustering: vector of integers (from 1:k) indicating the cluster to which each point is allocated
id.med: vector of indices for the medoid profiles of each cluster (if calculated)
centroids: snowprofileSet containing the centroid profile for each cluster (if calculated)
tree: object of class 'hclust' describing the tree output by hclust
...: all other outputs provided by the clustering algorithms (e.g., the membership matrix from fanny (see fanny.object), the components of pam.object, or the iteration history from clusterSPkdba)
type: type of clustering as provided by input argument
call: a copy of the clusterSP function call
SPx: a copy of the input snowprofileSet (if keepSPx = TRUE)
distmat: the pairwise distance matrix of class dist (if keepDistmat = TRUE and a matrix has been provided or computed)
Details
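There are several clustering approaches that can be applied to snow profiles. Most rely on computing a pairwise distance matrix between all profiles in a snowprofileSet. Current implementations of this approach rely on existing R functions:
hclust: hierarchical clustering with stats::hclust
pam: partitioning around medoids with cluster::pam
fanny: fuzzy clustering with cluster::fanny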
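To illustrate how the distance-based types operate on a single dist object, here is a base-R sketch that runs stats::hclust (cut into k groups with cutree), cluster::pam, and cluster::fanny on the same toy distance matrix. The random 2-D points are an assumption standing in for snow profiles; real profiles would first need a profile distance such as the one from distanceSP, and clusterSP may pass additional arguments to these functions.

```r
library(cluster)  # provides pam() and fanny(); hclust() and cutree() are in 'stats'

# Toy stand-in for a profile-to-profile distance matrix (class 'dist')
set.seed(1)
x <- rbind(matrix(rnorm(10, mean = 0), ncol = 2),
           matrix(rnorm(10, mean = 5), ncol = 2))
d <- dist(x)
k <- 2

# Hierarchical clustering: build the tree, then cut it into k groups
tree <- hclust(d)
cl_hclust <- cutree(tree, k = k)

# Partitioning around medoids directly on the dissimilarities
cl_pam <- pam(d, k = k, diss = TRUE)

# Fuzzy clustering: each object gets a membership degree for every cluster
cl_fanny <- fanny(d, k = k, diss = TRUE)

cl_hclust
cl_pam$clustering
round(cl_fanny$membership, 2)
```

All three calls consume the same precomputed matrix, which is why providing distmat avoids repeating the expensive distance step.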
Since computing a pairwise distance matrix can be slow, the recommended way to test different numbers of clusters k is to precompute a single distance matrix with the distanceSP function and provide it as an argument to clusterSP.
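The compute-once, reuse-many pattern can be sketched in base R with cluster::pam: the distance matrix is built a single time and then reused for several candidate values of k. The toy data are an assumption replacing a snowprofileSet, and the average silhouette width is shown only as one possible criterion for comparing k, not as a method this page prescribes.

```r
library(cluster)

# Toy data standing in for a snowprofileSet; with real profiles the expensive
# step would be the profile distance matrix (e.g., from distanceSP)
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 4), ncol = 2),
           matrix(rnorm(20, mean = 8), ncol = 2))
d <- dist(x)  # computed once

# Reuse the same distance matrix for several candidate k
ks <- 2:5
fits <- lapply(ks, function(k) pam(d, k = k, diss = TRUE))

# Average silhouette width per k (one possible comparison criterion)
avg_sil <- sapply(fits, function(f) f$silinfo$avg.width)
names(avg_sil) <- ks
avg_sil
```

Only the cheap partitioning step is repeated inside lapply, mirroring calling clusterSP several times with the same distmat.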
An alternative type of clustering, known as k-dimensional barycentric averaging (kdba), is conceptually similar to k-means but specifically adapted to snow profiles (see clusterSPkdba). An initial clustering (which can be random or based on a 'sophisticated guess') is iteratively refined by assigning each profile to the most similar cluster and, at the end of every iteration, recomputing the cluster centroids. The cluster centroids are represented by the average snow profile of each cluster (see averageSP). Note that the results of kdba are sensitive to the initial conditions, which by default are estimated with the 'fast' method described below.
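The refinement loop described for kdba follows the familiar k-means pattern. The sketch below shows that pattern on plain numeric vectors; it is a simplified, hypothetical stand-in and not clusterSPkdba: Euclidean distance replaces profile alignment, colMeans replaces averageSP, and the function name refine_clusters is invented for illustration.

```r
# Hypothetical, simplified stand-in for kdba (NOT clusterSPkdba): numeric
# vectors replace snow profiles, Euclidean distance replaces profile alignment,
# and the arithmetic mean replaces averageSP.
refine_clusters <- function(x, clustering, max_iter = 20) {
  clustering <- as.integer(clustering)
  k <- max(clustering)
  # One centroid (mean vector) per cluster; assumes no cluster is empty
  centroids_of <- function(cl) {
    do.call(rbind, lapply(seq_len(k), function(j) colMeans(x[cl == j, , drop = FALSE])))
  }
  centroids <- centroids_of(clustering)
  for (iter in seq_len(max_iter)) {
    # Assign every observation to the most similar (closest) centroid
    dists <- sapply(seq_len(k), function(j) sqrt(rowSums(sweep(x, 2, centroids[j, ])^2)))
    new_clustering <- max.col(-dists, ties.method = "first")
    # At the end of the iteration, recompute the cluster centroids
    centroids <- centroids_of(new_clustering)
    if (identical(new_clustering, clustering)) break  # no reassignment: converged
    clustering <- new_clustering
  }
  list(clustering = clustering, centroids = centroids, iterations = iter)
}

# Toy data: two groups of 2-D points, started from a deliberately poor guess
set.seed(3)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 3, sd = 0.5), ncol = 2))
init <- rep(1:2, length.out = nrow(x))
fit <- refine_clusters(x, init)
fit$clustering
```

Because the loop only moves toward a local optimum of the initial guess, different init vectors can end in different clusterings, which is the sensitivity to initial conditions noted above.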
Finally, a much faster 'fast' method is available. It computes a pairwise distance matrix without aligning profiles, based instead on summary statistics such as snow height, height of new snow, and the presence or absence of weak layers and crusts. The 'fast' approach then applies partitioning around medoids (pam) to this distance matrix.
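The idea behind the 'fast' method can be sketched as: summarize each profile by a few statistics, compute distances between those summaries without any layer alignment, and run pam on the result. In the sketch below the column names, the toy values, and the choice of scaled Euclidean distance are all assumptions for illustration; the actual statistics and their weighting in clusterSP may differ.

```r
library(cluster)

# Hypothetical per-profile summary statistics (toy values, not real data):
# snow height (cm), new snow height (cm), and presence (1) / absence (0) of a
# weak layer and of a crust. Column names are illustrative only.
summ <- data.frame(
  hs    = c(120, 125, 118, 60, 65, 58),
  hn    = c(20, 25, 22, 0, 2, 0),
  weak  = c(1, 1, 1, 0, 0, 0),
  crust = c(0, 0, 1, 1, 1, 1)
)

# Scale the columns so no single statistic dominates, then compute distances
# between summaries (no layer alignment involved)
d_fast <- dist(scale(summ))

# Partitioning around medoids on the 'fast' distance matrix
cl_fast <- pam(d_fast, k = 2, diss = TRUE)
cl_fast$clustering
cl_fast$id.med
```

Computing distances on a handful of summary columns is cheap compared with aligning every pair of profiles, which is what makes this approach suitable for generating initial conditions for kdba.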
Examples
this_example_runs_too_long <- TRUE
if (!this_example_runs_too_long) {  # exclude from CRAN checks
  ## Cluster with SPgroup2, which contains deposition date and p_unstable
  SPx <- SPgroup2
  config <- clusterSPconfig(simType = 'wsum_scaled', ddate = TRUE, pwls = TRUE)
  ## Hierarchical clustering with k = 2
  cl_hclust <- clusterSP(SPx, k = 2, type = 'hclust', config = config)
  plot(cl_hclust)
  ## Precompute a distance matrix and cluster with PAM for k = 2 and 3
  distmat <- do.call('distanceSP', c(list(SPx), config$args_distance))
  cl_pam2 <- clusterSP(SPx, k = 2, type = 'pam', config = config, distmat = distmat)
  cl_pam3 <- clusterSP(SPx, k = 3, type = 'pam', config = config, distmat = distmat)
  print(cl_pam2$clustering)
  print(cl_pam3$clustering)
  ## kdba clustering
  config_kdba <- clusterSPconfig(simType = 'layerwise', type = 'kdba')
  cl_kdba <- clusterSP(SPx = SPgroup2, k = 2, type = 'kdba', config = config_kdba)
  plot(cl_kdba)
}