fclustering() R function from [GET]

Functional clustering

Functional clustering based on a specified measure. The available measures are described in central_region.


fclustering(
  curve_sets,
  k,
  type = c("area", "st", "erl", "cont"),
  triangineq = FALSE,
  ...
)

Arguments

curve_sets: A curve_set object or a list of curve_set objects to which the functional clustering is to be applied. If a list of curve_set objects is provided, joined functional clustering is applied. The combination gives equal weight to each curve_set object if the objects contain the same number of elements (the same length of the vector $r$).
k: The number of clusters.
type: The measure used to compute the dissimilarity matrix. The preferred options are "area" and "st", but "erl" and "cont" can also be used with caution.
triangineq: Logical. Whether or not to compute the proportion of combinations of functions that satisfy the triangle inequality, see 'Returns'.
...: Additional parameters to be passed to central_region, which is responsible for calculating the central region (global envelope) on which the functional clustering is based.

Returns

An object having the class fclust, containing

curve_sets = The set(s) of functions determined for clustering
k = Number of clusters
type = Type of clustering method
triangineq = The proportion of combinations of functions that satisfy the triangle inequality. The triangle inequality must hold for the chosen measure to form a metric. In some unusual cases it does not hold for the "area" measure, so this check is provided to verify that the "area" measure forms a metric on the data. The value of triangineq must be 1 for the inequality to hold for all functions.
dis = The joined dissimilarity matrix
pam = Results of the partitioning around medoids (pam) method applied on the joined functions with the dissimilarity matrix (dis). See pam.
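
The triangle inequality check described for triangineq can be illustrated on any dissimilarity matrix. The following is a minimal sketch in base R, not the GET implementation: triangle_proportion() is a hypothetical helper, and counting ordered triples of distinct functions is an assumption about how the "combinations" are enumerated.

```r
# Proportion of ordered triples (i, j, l) of distinct indices for which
# d[i, j] <= d[i, l] + d[l, j] holds in a symmetric dissimilarity matrix d.
# triangle_proportion() is a hypothetical helper, not a GET function.
triangle_proportion <- function(d, tol = 1e-12) {
  d <- as.matrix(d)
  n <- nrow(d)
  ok <- 0
  total <- 0
  for (i in seq_len(n)) for (j in seq_len(n)) for (l in seq_len(n)) {
    if (i != j && i != l && j != l) {
      total <- total + 1
      if (d[i, j] <= d[i, l] + d[l, j] + tol) ok <- ok + 1
    }
  }
  ok / total
}

# Euclidean distances form a metric, so every triple passes
set.seed(1)
x <- matrix(rnorm(20), ncol = 2)
prop_metric <- triangle_proportion(dist(x))

# A hand-made dissimilarity that violates the inequality:
# d(1, 3) = 5 is larger than d(1, 2) + d(2, 3) = 2
d_bad <- matrix(c(0, 1, 5,
                  1, 0, 1,
                  5, 1, 0), nrow = 3, byrow = TRUE)
prop_bad <- triangle_proportion(d_bad)
```

A value below 1, as for d_bad here, indicates that the dissimilarity does not behave as a metric on the given functions.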

Details

Functional clustering joins the list of curve_set objects into one curve_set with long functions and applies the specified measure to the differences of all functions. This provides a dissimilarity matrix, which is used in the partitioning around medoids procedure. The resulting clusters can then be shown by plotting the functions separately for each curve_set. Thus, when the result object is plotted, for each curve_set the panel with all the medoids is shown, followed by all clusters, each represented by its central region, medoid and all curves belonging to it.
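
The steps described above (join, dissimilarity matrix, partitioning around medoids) can be sketched outside GET as follows. This is an illustration under stated assumptions, not the package's code: the curves are simulated toy data, joining is assumed to mean stacking the sets along the argument so that each function becomes one long function, and the mean absolute difference stands in for the central_region-based measures. Only pam() from the cluster package is a real library call here.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

set.seed(42)
r <- seq(0, 1, length.out = 20)
n_per_group <- 5

# Toy data: a 20 x 10 matrix (columns = curves) with two groups of curves,
# the second group shifted upwards by `shift`
make_set <- function(shift) {
  cbind(
    sapply(seq_len(n_per_group),
           function(i) sin(2 * pi * r) + rnorm(1, sd = 0.05)),
    sapply(seq_len(n_per_group),
           function(i) sin(2 * pi * r) + shift + rnorm(1, sd = 0.05))
  )
}
obs1 <- make_set(2)
obs2 <- make_set(3)

# Assumed joining step: stack the two sets so each column is one long function
joined <- rbind(obs1, obs2)

# Stand-in dissimilarity (NOT a GET measure): mean absolute difference
# between each pair of long functions
dis <- dist(t(joined), method = "manhattan") / nrow(joined)

# Partitioning around medoids on the dissimilarity object
fit <- pam(dis, k = 2)
fit$clustering  # cluster label of each curve
fit$id.med      # indices of the medoid curves
```

Because both sets have the same number of rows, each contributes equally to the stand-in dissimilarity, which mirrors the equal weight combination mentioned for lists of curve_set objects.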

If any of the groups contains fewer than three curves, the central region is not plotted for that group. This leads to a warning message from ggplot2.

Examples


# Read raw population growth data
# for countries with over a million inhabitants
data("popgrowthmillion")

# Create centred data
m <- apply(popgrowthmillion, 2, mean) # Country-wise means
cpopgrowthmillion <- popgrowthmillion
for(i in 1:dim(popgrowthmillion)[1]) {
  cpopgrowthmillion[i,] <- popgrowthmillion[i,] - m
}

# Create scaled data
t2 <- function(v) { sqrt(sum(v^2)) }
s <- apply(cpopgrowthmillion, 2, t2)
spopgrowthmillion <- popgrowthmillion
for(i in 1:dim(popgrowthmillion)[1]) {
  spopgrowthmillion[i,] <- cpopgrowthmillion[i,]/s
}

# Create curve sets
r <- 1951:2015

cset1 <- curve_set(r = r, obs = popgrowthmillion)
cset2 <- curve_set(r = r, obs = spopgrowthmillion)
csets <- list(Raw = cset1, Shape = cset2)

# Functional clustering with respect to the joined "area" measure
# and "joined" central regions of each group
res <- fclustering(csets, k=3, type="area")
p <- plot(res, plotstyle = "marginal", coverage = 0.5)
p[[1]] # Central functions
p[[2]] # Groups: central functions and regions
# To collect the two figures into one use, e.g., patchwork:
if(require("patchwork", quietly=TRUE)) {
  p[[1]] + p[[2]] + plot_layout(widths = c(1, res$k))
}
# Silhouette plot of pam
plot(res$pam)
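
The centring and scaling loops in the example can also be written without explicit loops. The sketch below uses a small simulated matrix as a stand-in for popgrowthmillion (rows = years, columns = countries, an assumption taken from how the example indexes the data) and checks that base R's sweep() reproduces the loop results.

```r
# Toy matrix standing in for popgrowthmillion: rows = years, columns = countries
set.seed(1)
x <- matrix(rnorm(12, mean = 2), nrow = 4, ncol = 3)

# Loop version, following the example
m <- apply(x, 2, mean)
cx <- x
for (i in seq_len(nrow(x))) cx[i, ] <- x[i, ] - m
s <- apply(cx, 2, function(v) sqrt(sum(v^2)))
sx <- x
for (i in seq_len(nrow(x))) sx[i, ] <- cx[i, ] / s

# Vectorised version: subtract column means, then divide by column norms
cx2 <- sweep(x, 2, colMeans(x), "-")
sx2 <- sweep(cx2, 2, sqrt(colSums(cx2^2)), "/")

# Both versions agree
same_centred <- isTRUE(all.equal(cx, cx2))
same_scaled <- isTRUE(all.equal(sx, sx2))
```

After scaling, every column has unit Euclidean norm, so the scaled curves differ only in shape.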

References

Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
