NominalDistances() R function from [MultBiplotR]

Distances among individuals with nominal variables

This function computes several measures of distance (or similarity) among individuals from a nominal data matrix.


NominalDistances(X, method = 1, diag = FALSE, upper = FALSE, similarity = TRUE)

Arguments

X: Matrix or data.frame with the nominal variables.
method: An integer between 1 and 6 selecting the similarity coefficient. See Details.
diag: A logical value indicating whether the diagonal of the distance matrix should be printed.
upper: A logical value indicating whether the upper triangle of the distance matrix should be printed.
similarity: A logical value indicating whether the similarity matrix should be computed.

Details

Let X be the table of nominal data. All these distances are of the form $d = \sqrt{1 - s}$, where $s$ is a similarity coefficient.
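As a small worked illustration of the relation $d = \sqrt{1 - s}$, the following base R sketch compares two made-up individuals. It assumes the similarity is the proportion of matching variables (the overlap coefficient as defined in Boriah et al., 2008); the vectors `a` and `b` are invented for the example and the code does not call the package.

```r
# Two hypothetical individuals described by four nominal variables.
a <- c("red", "small", "round", "matte")
b <- c("red", "large", "round", "gloss")

# Assumed overlap similarity: proportion of variables on which both agree.
s <- mean(a == b)

# Distance of the form d = sqrt(1 - s).
d <- sqrt(1 - s)
```

Here the two individuals agree on two of four variables, so `s` is 0.5 and `d` is `sqrt(0.5)`.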

1 = Overlap method: The overlap measure simply counts the number of attributes that match in the two data instances.
2 = Eskin: Eskin et al. proposed a normalization kernel for record-based network intrusion detection data. The original measure is distance-based and assigns a weight of $\frac{2}{n_{k}^{2}}$ to mismatches, where $n_{k}$ is the number of values taken by attribute $k$; when adapted to similarity, this becomes a weight of $\frac{n_{k}^{2}}{n_{k}^{2}+2}$. This measure gives more weight to mismatches that occur on attributes that take many values.
3 = IOF (Inverse Occurrence Frequency): This measure assigns lower similarity to mismatches on more frequent values. The IOF measure is related to the concept of inverse document frequency from information retrieval, where it is used to signify the relative number of documents that contain a specific word.
4 = OF (Occurrence Frequency): This measure gives the opposite weighting of the IOF measure for mismatches, i.e., mismatches on less frequent values are assigned lower similarity and mismatches on more frequent values are assigned higher similarity.
5 = Goodall3: This measure assigns a high similarity if the matching values are infrequent regardless of the frequencies of the other values.
6 = Lin: This measure gives higher weight to matches on frequent values, and lower weight to mismatches on infrequent values.
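
The per-attribute rules for methods 1 to 4 can be sketched in base R. This is a minimal illustration that follows the definitions in Boriah et al. (2008), not the package's own code: the helper name `nominal_similarity`, its `method` labels, and the toy data are all assumptions made for this example, and results may differ from `NominalDistances()` in normalization or in the handling of missing values. Goodall3 and Lin are left out to keep the sketch short.

```r
# Hypothetical helper (not part of MultBiplotR): pairwise similarity for
# nominal data, averaging a per-attribute score over all attributes.
nominal_similarity <- function(X, method = c("overlap", "eskin", "iof", "of")) {
  method <- match.arg(method)
  # Work with character columns so values can be compared and looked up by name.
  X <- as.data.frame(lapply(as.data.frame(X), as.character),
                     stringsAsFactors = FALSE)
  N <- nrow(X)
  S <- matrix(0, N, N)
  for (k in seq_along(X)) {
    x <- X[[k]]
    match_k <- outer(x, x, "==")      # TRUE where two individuals agree
    f <- as.numeric(table(x)[x])      # frequency of each individual's value
    n_k <- length(unique(x))          # number of distinct values of attribute k
    # Score given to a mismatch; a match always scores 1.
    mismatch <- switch(method,
      overlap = matrix(0, N, N),
      eskin   = matrix(n_k^2 / (n_k^2 + 2), N, N),
      iof     = 1 / (1 + outer(log(f), log(f))),
      of      = 1 / (1 + outer(log(N / f), log(N / f)))
    )
    S <- S + ifelse(match_k, 1, mismatch)
  }
  S / ncol(X)                         # equal weight 1/(number of attributes)
}

# Toy data, invented for the example.
toy <- data.frame(colour = c("red", "red", "blue", "green"),
                  size   = c("S", "L", "L", "S"))

S_overlap <- nominal_similarity(toy, "overlap")
S_eskin   <- nominal_similarity(toy, "eskin")

# Distances of the form d = sqrt(1 - s), stored as a "dist" object.
D_overlap <- as.dist(sqrt(1 - S_overlap))
```

With the Eskin rule, a mismatch on `colour` (three values) scores 9/11 while a mismatch on `size` (two values) scores 4/6, so the mismatch score grows with the number of values the attribute takes.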

Returns

An object of class distance

References

Boriah, S., Chandola, V. & Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. In Proceedings of the Eighth SIAM International Conference on Data Mining, pp. 243-254.

Author(s)

Jose L. Vicente-Villardon

Examples


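## Not run:

library(MultBiplotR)

# Load the Env example data
data(Env)
# Overlap method (method = 1) with similarity = FALSE; request the full matrix with its diagonal
Distance <- NominalDistances(Env, upper = TRUE, diag = TRUE, similarity = FALSE, method = 1)
## End(Not run)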

MultBiplotR package

Maintainer: Jose Luis Vicente Villardon
License: GPL (>= 2)
Last published: 2023-11-21
