integration function

Computes integration and acontamination of the clustering

integration(group, true_group)

Arguments

  • group: A vector, first partition.
  • true_group: A vector, second (reference) partition.

Returns

An array containing values of integration and acontamination.

Description

Integration and acontamination are measures of the quality of a clustering with respect to a true (reference) partition. Let $X = (x_1, \ldots, x_p)$ be the data set, $A$ be a partition into clusters $A_1, \ldots, A_n$ (the true partition) and $B$ be a partition into clusters $B_1, \ldots, B_m$. Then for cluster $A_j$ the integration is equal to:

$$Int(A_j) = \frac{\max_{k = 1, \ldots, m} \# \{ i \in \{ 1, \ldots, p \}: x_i \in A_j \wedge x_i \in B_k \}}{\# A_j}$$

The $B_k$ for which the value is maximized is called the integrating cluster of $A_j$. The integration for the whole clustering is then $Int(A, B) = \frac{1}{n} \sum_{j=1}^n Int(A_j)$. The acontamination is defined by:

$$Acont(A_j) = \frac{\# \{ i \in \{ 1, \ldots, p \}: x_i \in A_j \wedge x_i \in B_k \}}{\# B_k}$$

where $B_k$ is the integrating cluster for $A_j$. The acontamination for the whole dataset is then $Acont(A, B) = \frac{1}{n} \sum_{j=1}^n Acont(A_j)$.
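
The two definitions can be traced on a small example. The following is a minimal base-R sketch written directly from the formulas, not the package's own implementation. The helper name `integration_sketch`, the names of the returned elements and the tie-breaking rule (the first maximum is taken as the integrating cluster) are assumptions made for illustration, so the output format may differ from what `integration` returns.

```r
# Hypothetical helper, not part of the package: evaluates the definitions
# of integration and acontamination for two label vectors.
integration_sketch <- function(group, true_group) {
  stopifnot(length(group) == length(true_group))
  # Contingency table: rows are true clusters A_j, columns are clusters B_k.
  counts <- table(true_group, group)
  # Column index of the integrating cluster for each A_j
  # (assumption: the first maximum wins when there is a tie).
  best <- apply(counts, 1, which.max)
  # Number of elements shared by each A_j and its integrating cluster.
  overlap <- counts[cbind(seq_len(nrow(counts)), best)]
  int_j <- overlap / rowSums(counts)          # Int(A_j): divide by #A_j
  acont_j <- overlap / colSums(counts)[best]  # Acont(A_j): divide by #B_k
  c(integration = mean(int_j), acontamination = mean(acont_j))
}

# Toy data: eight variables, one of which is assigned to the wrong cluster.
true_group <- c(1, 1, 1, 1, 2, 2, 2, 2)
group      <- c(1, 1, 1, 2, 2, 2, 2, 2)
integration_sketch(group, true_group)
# integration = 0.875, acontamination = 0.9
```

Here $A_1$ shares 3 of its 4 elements with its integrating cluster $B_1$, which has 3 elements, so $Int(A_1) = 3/4$ and $Acont(A_1) = 1$. $A_2$ lies entirely inside $B_2$, which has 5 elements, so $Int(A_2) = 1$ and $Acont(A_2) = 4/5$. Averaging over the two true clusters gives the values shown.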

Examples

sim.data <- data.simulation(n = 20, SNR = 1, K = 2, numb.vars = 50, max.dim = 2)
true_segmentation <- rep(1:2, each = 50)
mlcc.fit <- mlcc.reps(sim.data$X, numb.clusters = 2, max.dim = 2, numb.cores = 1)
integration(mlcc.fit$segmentation, true_segmentation)

References

M. Sołtys. Metody analizy skupień. Master’s thesis, Wrocław University of Technology, 2010

  • Maintainer: Piotr Sobczyk
  • License: GPL-3
  • Last published: 2019-06-26
