fdiscd.predict function

Predicting the class of a group of individuals with discriminant analysis of probability densities.

Assigns several groups of individuals, one group after another, to the class of groups (among $K$ classes of groups) which achieves the minimum of the distances or divergences between the density function associated to the group to be assigned and the $K$ density functions associated to the $K$ classes.

fdiscd.predict(xf, class.var, gaussiand = TRUE, distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"), crit = 1, windowh = NULL, misclass.ratio = FALSE)

Arguments

  • xf: object of class folderh with two data frames:

    • The first one has at least two columns. One column contains the names of the $T$ groups (all the names must be different). Another column is a factor with $K$ levels partitioning the $T$ groups into $K$ classes.
    • The second one has $p+1$ columns. The first $p$ columns are numeric (otherwise, there is an error). The last column is a factor with $T$ levels defining the $T$ groups. Each group, say $t$, consists of $n_t$ individuals.

    Note that in versions earlier than 2.0, fdiscd.predict applied to two data frames. A toy construction of such a folderh object is sketched after this argument list.

  • class.var: string. The name of the class variable.

  • distance: The distance or divergence used to compute the distance matrix between the densities. It can be:

    • "jeffreys" (default) Jeffreys measure (symmetrised Kullback-Leibler divergence),
    • "hellinger" the Hellinger (Matusita) distance,
    • "wasserstein" the Wasserstein distance,
    • "l2" the L2L^2 distance,
    • "l2norm" the densities are normed and the L2L^2 distance between these normed densities is used;

    If gaussiand = FALSE, the densities are estimated by the Gaussian kernel method and the distance can only be "l2" or "l2norm".

  • crit: 1, 2 or 3. Criterion used to estimate the densities associated to the classes (see Details).

    If distance is "hellinger", "jeffreys" or "wasserstein", crit is necessarily 1 (see Details).

  • gaussiand: logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, they are estimated using the Gaussian kernel method.

    If distance is "hellinger", "jeffreys" or "wasserstein", gaussiand is necessarily TRUE.

  • windowh: strictly positive number. If windowh = NULL (default), the bandwidths are computed using the bandwidth.parameter function.

    This argument is not used when distance is "hellinger", "jeffreys" or "wasserstein" (see Details).

  • misclass.ratio: logical (default FALSE). If TRUE, the confusion matrix and misclassification ratio are computed on the groups whose prior class is known. In order to compute the misclassification ratio by the leave-one-out method, use the fdiscd.misclass function.
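
As an illustration of the structure expected for xf, here is a minimal sketch building a folderh object from two toy data frames. The group names, class labels and variables are invented for this example and are not part of the package data:

library(dad)
# First data frame: one row per group, giving its name and its class (NA if unknown).
classes <- data.frame(group = c("g1", "g2", "g3"),
                      class = factor(c("A", "B", NA)))
# Second data frame: p = 2 numeric columns, then a factor defining the T = 3 groups.
measures <- data.frame(z1 = rnorm(30), z2 = rnorm(30),
                       group = factor(rep(c("g1", "g2", "g3"), each = 10)))
xf <- folderh(classes, "group", measures)
# The group g3, whose class is unknown, would then be assigned by:
# fdiscd.predict(xf, "class")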

Details

To the group $t$ is associated the density denoted $f_t$. To the class $k$, consisting of $T_k$ groups, is associated the density denoted $g_k$. The crit argument selects the estimation method of the $K$ densities $g_k$:

  1. The density $g_k$ is estimated using the whole data of the class, that is the rows of the second data frame of xf corresponding to the $T_k$ groups of the class $k$.
  2. The $T_k$ densities $f_t$ are estimated using the corresponding data of xf. Then they are averaged to obtain an estimate of the density $g_k$, that is $g_k = (1/T_k) \sum_t f_t$.
  3. Each previous density $f_t$ is weighted by $n_t$ (the number of rows of xf corresponding to the group $t$). Then they are averaged, that is $g_k = (1/\sum_t n_t) \sum_t n_t f_t$.

The last two methods are available only for the $L^2$ distance. If the divergences between densities are computed using the Hellinger or Wasserstein distance or the Jeffreys measure, only the first of these methods is available.
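
To fix ideas, here is a small one-dimensional sketch of the difference between crit = 2 and crit = 3 (this is only an illustration of the averaging formulas above, not the package's internal code):

set.seed(1)
# Two groups of one class, of different sizes, in one dimension.
groups <- list(rnorm(20), rnorm(50, mean = 1))
n_t <- sapply(groups, length)
# Kernel density estimate f_t of each group, evaluated on a common grid.
f_t <- sapply(groups, function(g) density(g, from = -4, to = 5, n = 200)$y)
# crit = 2: unweighted average, g_k = (1/T_k) * sum_t f_t
g_crit2 <- rowMeans(f_t)
# crit = 3: average weighted by the group sizes, g_k = (1/sum_t n_t) * sum_t n_t f_t
g_crit3 <- as.vector(f_t %*% (n_t / sum(n_t)))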

Returns

Returns an object of class fdiscd.predict, that is a list including:

  • prediction: data frame with 3 columns:

    • a factor giving the group name. The column name is the same as that of the last (the $(p+1)$th) column of the second data frame of xf,
    • class.known: the prior class of the group if it is available, NA if not,
    • class.predict: the class allocation predicted by the discriminant analysis method. If misclass.ratio = TRUE, the class allocations are computed for all groups. Otherwise (default), they are computed only for the groups whose class is unknown.
  • distances: matrix with $T$ rows and $K$ columns of the distances $d_{tk}$: $d_{tk}$ is the distance between the group $t$ and the class $k$, computed with the measure given by the distance argument ($L^2$ distance, Hellinger distance, Wasserstein distance or Jeffreys measure),

  • proximities: matrix of the proximities (in percent). The proximity of a group $t$ to the class $k$ is computed as $(1/d_{tk}) / \sum_{l=1}^{K} (1/d_{tl})$ (a small numerical sketch of this formula is given after this list).

  • confusion.mat: the confusion matrix (if misclass.ratio = TRUE)

  • misclassed: the misclassification ratio (if misclass.ratio = TRUE)
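
The proximity formula above can be checked directly. The following sketch, with made-up distance values not produced by the package, converts a small distance matrix into proximities in percent:

# Distances d[t, k] between T = 2 groups and K = 3 classes (made-up values).
d <- matrix(c(1.2, 0.4, 2.0,
              0.8, 1.6, 0.5), nrow = 2, byrow = TRUE)
# Proximity of group t to class k: (1/d[t, k]) / sum_l (1/d[t, l]), in percent.
prox <- 100 * (1 / d) / rowSums(1 / d)
rowSums(prox)   # each row sums to 100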

References

Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L2L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

Examples

data(castles.dated)
data(castles.nondated)
castles.stones <- rbind(castles.dated$stones, castles.nondated$stones)
castles.periods <- rbind(castles.dated$periods, castles.nondated$periods)
castlesfh <- folderh(castles.periods, "castle", castles.stones)

# With the L^2-distance

# - crit=1
resultl2.1 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=1)
print(resultl2.1)

# - crit=2
## Not run:
resultl2.2 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=2)
print(resultl2.2)
## End(Not run)

# - crit=3
resultl2.3 <- fdiscd.predict(castlesfh, "period", distance="l2", crit=3)
print(resultl2.3)

# With the Hellinger distance
resulthelling <- fdiscd.predict(castlesfh, "period", distance="hellinger")
print(resulthelling)

# With the Jeffreys measure
resultjeff <- fdiscd.predict(castlesfh, "period", distance="jeffreys")
print(resultjeff)
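
The components of the returned object (see Returns above) can then be examined; for instance, with the objects computed in the examples:

resultl2.1$prediction    # group names, known classes and predicted classes
resultl2.1$proximities   # proximities (in percent) of each castle to each period
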
  • Maintainer: Pierre Santagostini
  • License: GPL (>= 2)
  • Last published: 2024-11-22