MeilaVariationOfInformation function

Use variation of clustering information to compare pairs of splits

Compare a pair of splits, viewed as clusterings of taxa, using the variation of clustering information proposed by Meila (2007).

MeilaVariationOfInformation(split1, split2)

MeilaMutualInformation(split1, split2)

Arguments

  • split1, split2: Logical vectors listing leaves in a consistent order, identifying each leaf as a member of the ingroup (TRUE) or outgroup (FALSE) of the split in question.
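As a minimal sketch of how such vectors might be built, the base R snippet below derives two logical split vectors from a shared vector of leaf labels, so that both list leaves in the same order. The labels `t1` to `t6` and the ingroup memberships are hypothetical, chosen only for illustration.

```r
# Hypothetical leaf labels; both splits must list leaves in this same order
leaves <- c("t1", "t2", "t3", "t4", "t5", "t6")

# Hypothetical ingroups for two splits
ingroup1 <- c("t1", "t2", "t3")
ingroup2 <- c("t1", "t3", "t5")

# TRUE marks a leaf in the ingroup, FALSE a leaf in the outgroup
split1 <- leaves %in% ingroup1
split2 <- leaves %in% ingroup2
```

Vectors built this way can then be passed as `split1` and `split2`.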

Returns

MeilaVariationOfInformation() returns the variation of (clustering) information, measured in bits.

MeilaMutualInformation() returns the mutual information, measured in bits.

Details

This is equivalent to the mutual clustering information (Vinh et al. 2010). For the total information content, multiply the variation of information (VoI) by the number of leaves.
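The relationship between the two quantities can be sketched in base R. The snippet assumes the standard definitions: mutual information I = H1 + H2 - H12 and variation of information = H1 + H2 - 2I, with entropies in bits. `EntropyBits()` and `SplitInfoSketch()` are hypothetical helper names introduced here for illustration; they are not the package's own implementation.

```r
# Shannon entropy in bits; zero-probability terms are dropped
EntropyBits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Illustrative helper (an assumption, not the package's code):
# per-leaf VoI and mutual information for two logical split vectors
SplitInfoSketch <- function(split1, split2) {
  stopifnot(is.logical(split1), is.logical(split2),
            length(split1) == length(split2))
  n <- length(split1)
  # Joint distribution over the four ingroup/outgroup combinations
  joint <- c(sum(split1 & split2), sum(split1 & !split2),
             sum(!split1 & split2), sum(!split1 & !split2)) / n
  h1 <- EntropyBits(c(sum(split1), sum(!split1)) / n)
  h2 <- EntropyBits(c(sum(split2), sum(!split2)) / n)
  h12 <- EntropyBits(joint)
  mi <- h1 + h2 - h12
  c(voi = h1 + h2 - 2 * mi, mi = mi)
}

A <- TRUE
B <- FALSE
res <- SplitInfoSketch(c(A, A, A, B, B, B), c(A, B, A, B, A, B))

# Total information content: per-leaf VoI multiplied by the number of leaves
totalVoI <- res[["voi"]] * 6
```

Under these definitions, identical splits give a variation of information of zero, and a split compared with an uninformative grouping of all leaves gives the entropy of the informative split alone.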

Examples

# Maximum variation = information content of each split separately
A <- TRUE
B <- FALSE
MeilaVariationOfInformation(c(A, A, A, B, B, B), c(A, A, A, A, A, A))
Entropy(c(3, 3) / 6) + Entropy(c(0, 6) / 6)

# Minimum variation = 0
MeilaVariationOfInformation(c(A, A, A, B, B, B), c(A, A, A, B, B, B))

# Not always possible for two evenly-sized splits to reach maximum
# variation of information
Entropy(c(3, 3) / 6) * 2 # = 2
MeilaVariationOfInformation(c(A, A, A, B, B, B), c(A, B, A, B, A, B)) # < 2

# Phylogenetically uninformative groupings contain splitting information
Entropy(c(1, 5) / 6)
MeilaVariationOfInformation(c(B, A, A, A, A, A), c(A, A, A, A, A, B))

References

\insertAllCited

Author(s)

Martin R. Smith

(martin.smith@durham.ac.uk)