integration() R function from [varclust]

Computes integration and acontamination of the clustering

Integration and acontamination measure the quality of a clustering relative to a true partition. Let $X = (x_1, \ldots, x_p)$ be the data set, $A$ be the true partition into clusters $A_1, \ldots, A_n$ and $B$ be a partition into clusters $B_1, \ldots, B_m$. The integration of cluster $A_j$ is equal to:

$$Int(A_j) = \frac{\max_{k = 1, \ldots, m} \# \{ i \in \{ 1, \ldots, p \}: x_i \in A_j \wedge x_i \in B_k \}}{\# A_j}$$

The $B_k$ for which the maximum is attained is called the integrating cluster of $A_j$. The integration of the whole clustering is:

$$Int(A, B) = \frac{1}{n} \sum_{j=1}^n Int(A_j)$$

The acontamination of cluster $A_j$ is defined by:

$$Acont(A_j) = \frac{\# \{ i \in \{ 1, \ldots, p \}: x_i \in A_j \wedge x_i \in B_k \}}{\# B_k}$$

where $B_k$ is the integrating cluster of $A_j$. The acontamination of the whole clustering is:

$$Acont(A, B) = \frac{1}{n} \sum_{j=1}^n Acont(A_j)$$


integration(group, true_group)

Arguments

group: A vector of cluster labels, the partition to evaluate ($B$ above).
true_group: A vector of cluster labels, the reference (true) partition ($A$ above).

Returns

An array containing values of integration and acontamination.

Examples


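library(varclust)
# Simulate 2 clusters of 50 variables each, observed on 20 individuals.
sim.data <- data.simulation(n = 20, SNR = 1, K = 2, numb.vars = 50, max.dim = 2)
# True cluster label of each of the 100 variables.
true_segmentation <- rep(1:2, each = 50)
mlcc.fit <- mlcc.reps(sim.data$X, numb.clusters = 2, max.dim = 2, numb.cores = 1)
# Compare the fitted segmentation with the true one.
integration(mlcc.fit$segmentation, true_segmentation)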
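
The formulas for integration and acontamination can also be checked by hand on a small example. The following is a minimal base-R sketch written directly from those definitions. `integration_sketch` is a hypothetical helper, not varclust's own implementation, and it assumes that ties for the integrating cluster are broken by taking the first maximum. The layout of its result may differ from what integration() returns.

```r
# Hypothetical helper written from the definitions; not part of varclust.
integration_sketch <- function(group, true_group) {
  stopifnot(length(group) == length(true_group))
  # Contingency table: rows are true clusters A_j, columns are clusters B_k.
  counts <- table(true_group, group)
  # Column index of the integrating cluster for each A_j
  # (assumption: on ties, the first maximum is used).
  k_star <- apply(counts, 1, which.max)
  # Size of the overlap between each A_j and its integrating cluster.
  overlap <- counts[cbind(seq_len(nrow(counts)), k_star)]
  int_j <- overlap / rowSums(counts)            # Int(A_j)
  acont_j <- overlap / colSums(counts)[k_star]  # Acont(A_j)
  c(integration = mean(int_j), acontamination = mean(acont_j))
}

# Eight items; the clustering misplaces one item of the first true cluster.
truth <- c(1, 1, 1, 1, 2, 2, 2, 2)
found <- c(1, 1, 1, 2, 2, 2, 2, 2)
res <- integration_sketch(found, truth)
print(res)
```

Here $Int(A_1) = 3/4$ and $Int(A_2) = 4/4$, so the integration is 0.875. The integrating clusters have sizes 3 and 5, so $Acont(A_1) = 3/3$ and $Acont(A_2) = 4/5$, and the acontamination is 0.9.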

References

M. Sołtys. Metody analizy skupień. Master’s thesis, Wrocław University of Technology, 2010

varclust package

Maintainer: Piotr Sobczyk
License: GPL-3
Last published: 2019-06-26
