distanceSP function

Compute pairwise distances between snow profiles

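Calculate the distance between all combinations of snow profiles in a snowprofileSet by aligning each pair of profiles, assessing the similarity of the aligned pair, and converting that similarity into a distance (see Details).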

distanceSP(
  SPx,
  SP2 = NULL,
  output = "dist",
  n_cores = NULL,
  symmetric = TRUE,
  fast_summary = FALSE,
  fast_summary_weights = clusterSPconfig()$args_fast,
  progressbar = requireNamespace("progress", quietly = TRUE),
  ...
)

Arguments

  • SPx: a sarp.snowprofile::snowprofileSet object (or a single snowprofile if SP2 is provided)
  • SP2: a sarp.snowprofile::snowprofile object if SPx is also a snowprofile and a single pairwise distance is to be computed
  • output: type of output to return, either a class dist (default) or matrix
  • n_cores: number of nodes to create for a cluster using the parallel package to do distance matrix calculation in parallel (default is serial calculations)
  • symmetric: if TRUE (default), compute both alignments dtwSP(A, B) and dtwSP(B, A) and take the min distance; if FALSE, compute only one of the two alignments (when diminished accuracy is an acceptable trade-off to speed up run times for a large number of profiles)
  • fast_summary: Option to compute distances from basic summary stats instead of layerwise comparisons
  • fast_summary_weights: a named numeric vector with relative weights for each snowpack property. The weights must be given in the exact order of the template, but do not need to be normalized. Use clusterSPconfig()$args_fast as a template. See Details for the summary stats that have been implemented.
  • progressbar: whether to print a progress bar using the recommended package 'progress' (only works for n_cores = NULL)
  • ...: arguments passed to dtwSP and further to simSP. The simType argument of simSP is an important choice.

Returns

Either a dist or matrix object with pairwise distances (depending on output argument)
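
The two output types hold the same values, and base R can convert between them. The following is a minimal sketch using made-up distances for three profiles; the values and labels are purely illustrative and are not output of distanceSP.

```r
# Hypothetical pairwise distances for three profiles (illustrative values only)
m <- matrix(c(0.0, 0.2, 0.5,
              0.2, 0.0, 0.3,
              0.5, 0.3, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

d <- stats::as.dist(m)   # compact "dist" object that stores the lower triangle
m2 <- as.matrix(d)       # back to a full symmetric matrix

length(d)                # number of stored pairs, i.e. choose(3, 2)
```

A dist object can be passed directly to functions such as stats::hclust, whereas a matrix is more convenient for indexing individual pairs by name.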

Details

The distance between two profiles is computed by:

  1. Matching their layers and aligning them (i.e., warping one profile onto the other one)
  2. Assessing the similarity of the aligned profiles based on avalanche hazard relevant characteristics
  3. Converting the similarity score into a distance value in the range [0, 1]

This procedure is useful for clustering and aggregating tasks, given a set of multiple profiles.

When computing the distance matrix this routine calls simSP for every possible pair of profiles among the group. During that call the profile pair is aligned by dtwSP and the aligned pair is evaluated by simSP.

Note that the pairwise distance matrix is modified within the function call to represent a symmetric distance matrix. The raw results are not symmetric, since dtwSP(A, B) != dtwSP(B, A). The matrix is therefore made symmetric by setting the distance between the profiles A and B to min({dtwSP(A, B), dtwSP(B, A)}).
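
The symmetrization step can be illustrated in base R. This is only a sketch of the idea under stated assumptions: the matrix d_raw is hypothetical, with entry [i, j] standing in for the distance obtained from aligning profile i onto profile j, and the package's internal code may differ.

```r
# Hypothetical asymmetric distances (illustrative values only):
# entry [i, j] stands in for the result of aligning profile i onto profile j
d_raw <- matrix(c(0.00, 0.20, 0.50,
                  0.30, 0.00, 0.10,
                  0.40, 0.25, 0.00),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Element-wise minimum of the matrix and its transpose
d_sym <- pmin(d_raw, t(d_raw))

isSymmetric(d_sym)
```

Taking the element-wise minimum keeps the better of the two alignments for each pair, which is why both directions have to be computed first.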

Note that the number of possible profile pairs grows quadratically with the number of profiles in the group (i.e., O(n^2) calls, where n is the number of profiles in the group). Several options for improved performance include:

  • Using the n_cores argument to activate the parallel package. A suggested value is the number of cores on your system minus one: n_cores = parallel::detectCores() - 1.
  • Setting symmetric = FALSE will only calculate dtwSP(A, B) and therefore not make the matrix symmetric, but cut the number of alignments in half.
  • Setting fast_summary = TRUE will compute similarities from basic summary stats instead of aligning layers with dynamic time warping.
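
To get a feel for the O(n^2) growth, the number of profile pairs can be counted with base R. The sketch below assumes one alignment per pair when only dtwSP(A, B) is computed and two alignments per pair when both directions are computed; the helper n_pairs is hypothetical and not part of the package.

```r
# Number of unordered profile pairs for a group of n profiles
n_pairs <- function(n) choose(n, 2)

n <- c(10, 100, 1000)
pairs <- n_pairs(n)

# Assumption: one alignment per pair in one direction,
# two alignments per pair when both directions are computed
data.frame(n = n, one_direction = pairs, both_directions = 2 * pairs)
```

A tenfold increase in the number of profiles leads to roughly a hundredfold increase in alignments, so the options above matter most for large groups.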

When using fast_summary = TRUE, you can provide custom weights to change the relative importance of the following snowpack properties:

  • w_hs: total snow height
  • w_hn24: height of snow in past 24 h
  • w_hn72: height of snow in past 72 h
  • w_slab: average hand hardness of snow in past 72 h
  • w_gtype: total thickness of layers grouped into new snow (PP, DF), pwls (SH, FC, DH), bulk (RG, FCxr) and melt (MF, MFcr, IF)
  • w_gtype_rel: w_gtype scaled by HS
  • w_new: total thickness of PP/DF layers
  • w_pwl: do critical weak layers exist in the top/middle/bottom thirds of the profile
  • w_crust: do melt-freeze crusts exist in the top/middle/bottom thirds of the profile
  • w_rta: maximum rta in the top/middle/bottom thirds of the profile

The number of stats computed depends on the snowprofileLayer properties available in the data.
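
Because the weights are relative, only their ratios matter. The sketch below illustrates this by rescaling two weight vectors to sum to one; it assumes a simple sum-to-one normalization, uses placeholder names (stat1 to stat3) rather than the real weight names, and does not reproduce the package's internal handling.

```r
# Hypothetical weights on two different scales (placeholder names)
w_a <- c(stat1 = 3, stat2 = 1, stat3 = 0)
w_b <- c(stat1 = 30, stat2 = 10, stat3 = 0)

# Assumed normalization: rescale so that the weights sum to one
normalize <- function(w) w / sum(w)

normalize(w_a)

# Both vectors describe the same relative importance
all.equal(normalize(w_a), normalize(w_b))
```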

Examples

Simple serial calculation

distmat1 <- distanceSP(SPgroup2[1:4])

Parallel calculation (uncomment)

#distmat2 <- distanceSP(SPgroup2[1:4], n_cores = parallel::detectCores() - 1)

Fast summary method

distmat3 <- distanceSP(SPgroup2, fast_summary = TRUE)

View the default weights, then recalculate the distances with adjusted weights

print(clusterSPconfig()$args_fast)
weights <- c(w_hs = 3, w_hn24 = 0, w_h3d = 2, w_slab = 0, w_gtype = 0, w_gtype_rel = 0, w_new = 0, w_pwl = 0, w_crust = 1, w_rta = 1)
distmat4 <- distanceSP(SPgroup2, fast_summary = TRUE, fast_summary_weights = weights)

See Also

simSP, medoidSP, clusterSP

Author(s)

shorton, fherla