BinaryProximities function

Proximity Measures for Binary Data

Calculates proximities among the rows or columns of a binary data matrix, or of a data frame that is first converted into a binary data matrix.

BinaryProximities(x, y = NULL, coefficient = "Jaccard", transformation = NULL, transpose = FALSE, ...)

Arguments

  • x: A data frame or a binary data matrix. Proximities among the rows of x will be calculated
  • y: Supplementary data. The proximities among the rows of x and the rows of y will also be calculated
  • coefficient: Similarity coefficient. Use the number or the name (see details)
  • transformation: Transformation of the similarities. Use the number or the name (see details)
  • transpose: Logical. If TRUE, proximities among columns are calculated
  • ...: Additional parameters passed on to the conversion of the data frame into a binary matrix

Details

A binary data matrix is a matrix with values 0 or 1 coding the absence or presence of several binary characters. When a data frame is provided, every variable in it is converted to binary form using the function Dataframe2BinaryMatrix: factors with two levels are converted directly to binary variables; factors with more than two levels are converted to a matrix with as many columns as levels; and numerical variables are converted to binary variables using a cut point that can be the median, the mean or a value provided by the user.
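The three conversion rules described above can be sketched in base R. This is only an illustration under stated assumptions, not the code of Dataframe2BinaryMatrix: the toy data frame and its column names are invented, the choice of which factor level is coded as 1 is arbitrary, and coding values strictly above the cut point as 1 is an assumption.

```r
# Toy data frame; the column names are made up for illustration.
df <- data.frame(
  sex    = factor(c("F", "M", "M", "F")),
  colour = factor(c("red", "blue", "green", "red")),
  size   = c(1.0, 5.0, 3.0, 8.0)
)

# Two-level factor: a single 0/1 column (here 1 marks the second level).
sex_bin <- as.integer(df$sex == levels(df$sex)[2])

# Factor with more than two levels: one indicator column per level.
colour_bin <- sapply(levels(df$colour), function(l) as.integer(df$colour == l))

# Numeric variable: 1 if above the cut point (here the median), else 0.
# Whether the package uses ">" or ">=" is not stated; ">" is an assumption.
size_bin <- as.integer(df$size > median(df$size))

# Assemble the binary data matrix.
B <- cbind(sex = sex_bin, colour_bin, size = size_bin)
print(B)
```

The resulting matrix has one column for the two-level factor, three for the three-level factor and one for the numeric variable.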

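The following coefficients are calculated. For a pair of rows, a is the number of characters present in both, b and c are the numbers of characters present in only one of the two, and d is the number of characters absent from both.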

1.- Kulezynski = a/(b + c)

2.- Russell_and_Rao = a/(a + b + c + d)

3.- Jaccard = a/(a + b + c)

4.- Simple_Matching = (a + d)/(a + b + c + d)

5.- Anderberg = a/(a + 2 * (b + c))

6.- Rogers_and_Tanimoto = (a + d)/(a + 2 * (b + c) + d)

7.- Sorensen_Dice_and_Czekanowski = a/(a + 0.5 * (b + c))

8.- Sneath_and_Sokal = (a + d)/(a + 0.5 * (b + c) + d)

9.- Hamman = (a - (b + c) + d)/(a + b + c + d)

10.- Kulezynski = 0.5 * ((a/(a + b)) + (a/(a + c)))

11.- Anderberg2 = 0.25 * (a/(a + b) + a/(a + c) + d/(c + d) + d/(b + d))

12.- Ochiai = a/sqrt((a + b) * (a + c))

13.- S13 = (a * d)/sqrt((a + b) * (a + c) * (d + b) * (d + c))

14.- Pearson_phi = (a * d - b * c)/sqrt((a + b) * (a + c) * (d + b) * (d + c))

15.- Yule = (a * d - b * c)/(a * d + b * c)
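As a rough sketch of how such coefficients can be obtained, the counts a, b, c and d for every pair of rows can be computed with matrix products in base R. Here a counts the characters present in both rows, b and c those present in only one of them, and d those absent from both. The toy matrix X is invented, the variable is named cc only to avoid masking R's c() function, and this is not necessarily how BinaryProximities computes the counts internally.

```r
# Toy binary matrix: rows are individuals, columns are binary characters.
X <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 0, 1, 1), nrow = 3, byrow = TRUE)

# Counts for all pairs of rows (i, j) at once.
a  <- X %*% t(X)              # present in both rows
b  <- X %*% t(1 - X)          # present in row i only
cc <- (1 - X) %*% t(X)        # present in row j only
d  <- (1 - X) %*% t(1 - X)    # absent from both rows

# Two of the coefficients listed above.
jaccard         <- a / (a + b + cc)
simple_matching <- (a + d) / (a + b + cc + d)

print(round(jaccard, 3))
print(simple_matching)
```

Under the same assumptions, proximities between the rows of x and the rows of supplementary data y would follow by replacing the second factor of each product with the corresponding matrix built from y.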

The following transformations of the similarities are calculated

1.- Identity dis=sim

2.- 1-S dis=1-sim

3.- sqrt(1-S) dis = sqrt(1 - sim)

4.- -log(s) dis=-1*log(sim)

5.- 1/S-1 dis=1/sim -1

6.- sqrt(2(1-S)) dis = sqrt(2*(1 - sim))

7.- 1-(S+1)/2 dis=1-(sim+1)/2

8.- 1-abs(S) dis=1-abs(sim)

9.- 1/(S+1) dis=1/(sim+1)

Note that, after transformation, the similarities are converted to dissimilarities (distances), except for "Identity". Not all transformations are suitable for all coefficients, so use them at your own risk. The default values form an admissible combination.
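To illustrate why a transformation may not suit every coefficient, the sketch below applies two of the listed transformations to a small similarity matrix in base R. The matrix S is invented for the example (values of the kind a Jaccard coefficient could produce), and the code only mirrors the formulas in the list rather than the package internals.

```r
# Toy similarity matrix with values in [0, 1] and ones on the diagonal.
S <- matrix(c(1,   1/3, 0,
              1/3, 1,   1/3,
              0,   1/3, 1), nrow = 3, byrow = TRUE)

# Transformation 3, sqrt(1-S): finite everywhere, zero on the diagonal.
D <- sqrt(1 - S)

# Transformation 4, -log(s): infinite wherever the similarity is exactly 0.
D_log <- -log(S)

print(round(D, 3))
print(D_log)
```

A coefficient that can take the value 0 (or negative values, as Pearson_phi and Yule can) therefore needs a transformation that remains defined over its whole range.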

Returns

An object of class proximities. This has components:

  • TypeData: Binary, Continuous or Mixed. Binary in this case.

  • Coefficient: Coefficient used to calculate the proximities

  • Transformation: Transformation used to calculate the proximities

  • Data: Data used to calculate the proximities

  • SupData: Supplementary Data, if any

  • Proximities: Proximities among rows of x. May be similarities or dissimilarities depending on the transformation

  • SupProximities: Proximities among rows of x and y.

References

Gower, J. C. (2006). Similarity, dissimilarity and distance, measures of. Encyclopedia of Statistical Sciences, 2nd ed., Volume 12. Wiley.

Author(s)

Jose Luis Vicente-Villardon

See Also

BinaryDistances, Dataframe2BinaryMatrix

Examples

data(spiders)
D <- BinaryProximities(spiders, coefficient = "Jaccard", transformation = "sqrt(1-S)")
# Same call, selecting the coefficient and the transformation by number
D2 <- BinaryProximities(spiders, coefficient = 3, transformation = 3)
  • Maintainer: Jose Luis Vicente Villardon
  • License: GPL (>= 2)
  • Last published: 2023-11-21
