Computes integration and acontamination of the clustering
Integration and acontamination measure the quality of a clustering relative to a true (reference) partition. Let $X = (x_1, \ldots, x_p)$ be the data set, let $A$ be the true partition into clusters $A_1, \ldots, A_n$, and let $B$ be a partition into clusters $B_1, \ldots, B_m$. The integration of cluster $A_j$ is

$$\mathrm{Int}(A_j) = \frac{\max_{k=1,\ldots,m} \#\{i \in \{1,\ldots,p\} : x_i \in A_j \wedge x_i \in B_k\}}{\#A_j}.$$

The cluster $B_k$ that attains the maximum is called the integrating cluster of $A_j$. The integration of the whole clustering is

$$\mathrm{Int}(A, B) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Int}(A_j).$$

The acontamination of cluster $A_j$ is

$$\mathrm{Acont}(A_j) = \frac{\#\{i \in \{1,\ldots,p\} : x_i \in A_j \wedge x_i \in B_k\}}{\#B_k},$$

where $B_k$ is the integrating cluster of $A_j$. The acontamination of the whole data set is

$$\mathrm{Acont}(A, B) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Acont}(A_j).$$
integration(group, true_group)
Arguments
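group: A vector of cluster labels, one per observation; the first partition (the clustering being evaluated, B).
true_group: A vector of cluster labels of the same length; the second (reference) partition (the true partition, A).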
Returns
An array containing values of integration and acontamination.
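The definitions of integration and acontamination can be sketched directly in base R. This is an illustration only, not the package's implementation: the name `integration_sketch` is hypothetical, the result is returned as a named numeric vector (the exact shape of the real return value is not specified here), and ties for the integrating cluster are broken by taking the first maximum, which is an assumption the definition does not settle.

```r
# Hypothetical helper (not the package's own code): computes integration and
# acontamination straight from their definitions.
integration_sketch <- function(group, true_group) {
  stopifnot(length(group) == length(true_group))
  # Contingency table: rows are true clusters A_j, columns are clusters B_k.
  tab <- table(true_group, group)
  # Column index of the integrating cluster for each A_j
  # (assumption: the first maximum wins on ties).
  k_star <- apply(tab, 1, which.max)
  # Overlap between each A_j and its integrating cluster.
  overlap <- tab[cbind(seq_len(nrow(tab)), k_star)]
  int_j   <- overlap / rowSums(tab)           # Int(A_j)   = overlap / #A_j
  acont_j <- overlap / colSums(tab)[k_star]   # Acont(A_j) = overlap / #B_k
  c(integration = mean(int_j), acontamination = mean(acont_j))
}

# Six observations: true clusters {1,2,3,4} and {5,6},
# evaluated clustering {1,2,3} and {4,5,6}.
true_group <- c(1, 1, 1, 1, 2, 2)
group      <- c(1, 1, 1, 2, 2, 2)
integration_sketch(group, true_group)
```

In this toy input the first true cluster overlaps its integrating cluster in 3 of its 4 points (integration 3/4, acontamination 3/3), and the second in 2 of its 2 points (integration 2/2, acontamination 2/3), so the averages are 0.875 and 5/6.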