Misclassification ratio in functional discriminant analysis of discrete probability distributions.
Computes the leave-one-out misclassification ratio of the rule that assigns T groups of individuals, one group after another, to the class of groups (among K classes of groups) that minimizes the distance or divergence between the probability distribution associated with the group to assign and the K probability distributions associated with the K classes.
xf: object of class folderh with two data frames or list of arrays (or tables).
If it is a folderh:
The first data.frame has at least two columns. One column contains the names of the T groups (all the names must be different). Another column is a factor with K levels partitioning the T groups into K classes.
The second one has (q+1) columns. The first q columns are factors (otherwise, they are coerced into factors). The last column is a factor with T levels defining T groups. Each group, say t, consists of nt individuals.
If it is a list of arrays or tables, the tth element (t=1,…,T) is the table of the joint distribution (absolute or relative frequencies) of the tth group. These arrays have the same shape:
Each array (or table) xf[[i]] has:
the same dimension(s). If q=1 (univariate), dim(xf[[i]]) is an integer. If q>1 (multivariate), dim(xf[[i]]) is an integer vector of length q.
the same dimension names dimnames(xf[[i]]), which must be non-NULL. These dimnames are the names of the variables.
class.var: string (if xf is an object of class "folderh") or data.frame with at least two columns (if xf is a list of arrays).
If xf is of class "folderh", class.var is the name of the class variable.
If xf is a list of arrays or a list of tables, class.var is a data.frame with at least two columns named "group" and "class". The "group" column contains the names of the T groups (all the names must be different). The "class" column is a factor with K levels partitioning the T groups into K classes.
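As an illustration of the list-of-tables input described above, the following base R sketch builds a small xf and a matching class.var. The data, the variable names (colour, size), the group names and the helper make_table are hypothetical and only serve to show the expected shape.

```r
# Hypothetical toy data: T = 3 groups, q = 2 categorical variables.
levels_colour <- c("red", "blue")
levels_size   <- c("small", "large")

# Helper (hypothetical): fixing the factor levels guarantees that every
# table has the same dimensions and the same dimension names.
make_table <- function(colour, size) {
  table(colour = factor(colour, levels = levels_colour),
        size   = factor(size,   levels = levels_size))
}

xf <- list(
  g1 = make_table(c("red", "red", "blue"), c("small", "large", "small")),
  g2 = make_table(c("blue", "blue"), c("large", "large")),
  g3 = make_table(c("red", "blue", "red", "red"),
                  c("small", "small", "large", "small"))
)

# One row per group: unique group names and a factor with K = 2 levels.
class.var <- data.frame(
  group = names(xf),
  class = factor(c("A", "B", "A"))
)

# Check that all tables share the same shape and dimension names.
same_dim   <- all(sapply(xf, function(a) identical(dim(a), dim(xf[[1]]))))
same_names <- all(sapply(xf, function(a) identical(dimnames(a), dimnames(xf[[1]]))))
```

Fixing the factor levels before tabulating is one simple way to keep empty cells in the tables, so that groups in which some categories do not occur still produce arrays of the common shape.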
distance: The distance or dissimilarity used to compute the distance matrix between the densities. It can be:
"l1" (default) the Lp distance with p=1
"l2" the Lp distance with p=2
"chisqsym" the symmetric Chi-squared distance
"hellinger" the Hellinger metric (Matusita distance)
"lp" the Lp distance with p given by the argument p of the function.
crit: 1 or 2. Selects how the probability distributions associated with the classes are estimated. See Details.
p: integer. Optional. When distance = "lp" (Lp distance with p>2), p is the parameter of the distance.
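To make the distance options concrete, here is a minimal base R sketch of the usual textbook forms of these distances between two discrete distributions on the same support. The function names lp_dist, hellinger_dist and chisqsym_dist and the example values are hypothetical, and the normalizing constants used by the package may differ from the ones assumed here.

```r
# Two discrete distributions on the same support (hypothetical values).
f <- c(0.5, 0.3, 0.2)
g <- c(0.2, 0.3, 0.5)

# Lp distance: (sum |f - g|^p)^(1/p); p = 1 and p = 2 give "l1" and "l2".
lp_dist <- function(f, g, p = 1) sum(abs(f - g)^p)^(1 / p)

# Hellinger (Matusita) distance, assumed here without a 1/2 factor.
hellinger_dist <- function(f, g) sqrt(sum((sqrt(f) - sqrt(g))^2))

# Symmetric chi-squared distance, assumed form: sum (f - g)^2 / (f + g).
# Cells where both probabilities are zero are skipped to avoid 0/0.
chisqsym_dist <- function(f, g) {
  keep <- (f + g) > 0
  sum((f[keep] - g[keep])^2 / (f[keep] + g[keep]))
}

d_l1   <- lp_dist(f, g, p = 1)
d_l2   <- lp_dist(f, g, p = 2)
d_l3   <- lp_dist(f, g, p = 3)
d_hell <- hellinger_dist(f, g)
d_chi  <- chisqsym_dist(f, g)
```

For multivariate tables the same formulas apply cell by cell, since an array of joint frequencies can be treated as a vector of cell probabilities.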
Details
If xf is an object of class "folderh" containing the data:
The T probability distributions ft corresponding to the T groups of individuals are estimated by frequency distributions within each group.
The class k, consisting of Tk groups, is associated with the probability distribution gk. With the leave-one-out method, the group to assign is excluded from its class k when gk is estimated. The crit argument selects the estimation method of the gk's.
crit=1
The probability distribution gk is estimated using the whole data of this class, that is, the rows of the second data frame of xf corresponding to the Tk groups of the class k.
The estimation of the gk's uses the same method as the estimation of the ft's.
crit=2
The Tk probability distributions ft are estimated using the corresponding data from xf. Then they are averaged to obtain an estimation of the density gk, that is gk=(1/Tk)∑ft.
If xf is a list of arrays (or list of tables):
The tth array is the joint frequency distribution of the tth group. The frequencies can be absolute or relative.
The class k, consisting of Tk groups, is associated with the probability distribution gk. With the leave-one-out method, the group to assign is excluded from its class k when gk is estimated. The crit argument selects the estimation method of the gk's.
crit=1
gk=(1/∑nt)∑ntft, where nt is the total of xf[[t]].
Note that when every xf[[t]] contains relative frequencies, each total nt is 1, so crit=1 is then equivalent to crit=2.
crit=2
gk=(1/Tk)∑ft.
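The two estimation methods for gk can be sketched in base R as follows, for a single class given as a list of frequency tables. This is only an illustration of the formulas above, not the package's implementation: the counts and the helper class_distribution are hypothetical.

```r
# Hypothetical class k with three groups, given as absolute frequencies
# over the same three categories.
counts <- list(
  t1 = c(a = 8, b = 1, c = 1),   # n_t = 10
  t2 = c(a = 2, b = 2, c = 0),   # n_t = 4
  t3 = c(a = 5, b = 1, c = 0)    # n_t = 6
)

# Helper (hypothetical) implementing the two formulas for g_k.
class_distribution <- function(tabs, crit = 1) {
  if (crit == 1) {
    # crit = 1: g_k = (1 / sum n_t) * sum n_t f_t, i.e. pooled counts.
    pooled <- Reduce(`+`, tabs)
    pooled / sum(pooled)
  } else {
    # crit = 2: g_k = (1 / T_k) * sum f_t, i.e. average of the f_t.
    rel <- lapply(tabs, function(x) x / sum(x))
    Reduce(`+`, rel) / length(rel)
  }
}

# Leave-one-out: when t1 is the group to assign, it is removed from its class.
others <- counts[names(counts) != "t1"]
g_crit1 <- class_distribution(others, crit = 1)
g_crit2 <- class_distribution(others, crit = 2)

# With relative frequencies every total n_t is 1, so crit = 1 gives
# the same result as crit = 2.
rel_others  <- lapply(others, function(x) x / sum(x))
g_crit1_rel <- class_distribution(rel_others, crit = 1)
```

In this toy case crit=1 gives more weight to the larger group t3, while crit=2 weights both remaining groups equally, which is why the two estimates differ.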
Returns
Returns an object of class discdd.misclass, that is a list including:
classification: data frame with 4 columns:
* factor giving the group name. The column name is the same as that of the column ($q+1$) of `x`,
* the prior class of the group if it is available, or NA if not,
* `alloc`: the class allocation computed by the discriminant analysis method,
* `misclassed`: boolean. `TRUE` if the group is misclassified, `FALSE` if it is correctly classified, `NA` if the prior class of the group is unknown.
confusion.mat: confusion matrix,
misalloc.per.class: the misclassification ratio per class,
misclassed: the misclassification ratio,
distances: matrix with T rows and K columns, of the distances (dtk): dtk is the distance between the group t and the class k,
proximities: matrix of the proximity indices (in percent) between the groups and the classes. The proximity between the group t and the class k is: (1/dtk) / ∑ (1/dtl), where the sum runs over l=1,…,K.
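As a rough illustration of how the distances, the allocation and the proximities relate, the following base R sketch starts from a hypothetical T x K distance matrix. The values are made up, the multiplication by 100 is an assumption based on "in percent", and the tie-breaking shown is that of which.min, which may not match the package.

```r
# Hypothetical distances d_tk between T = 3 groups (rows) and K = 2 classes.
d <- matrix(c(0.2, 0.8,
              0.5, 0.5,
              0.9, 0.1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Proximity of group t to class k: (1/d_tk) / sum over l of (1/d_tl),
# expressed in percent, so that each row sums to 100.
inv  <- 1 / d
prox <- 100 * inv / rowSums(inv)

# Each group is allocated to the class at minimum distance.
# which.min keeps the first class in case of a tie (sketch assumption).
alloc <- colnames(d)[apply(d, 1, which.min)]
```

A distance of exactly zero would make 1/dtk infinite in this sketch, so such a case would need separate handling.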
References
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens, Centre de Recherches Archéologiques médiévales de Saverne, 5, 5-38.
Author(s)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Examples
# Example 1 with a folderh obtained by converting numeric variables
data("castles.dated")
stones <- castles.dated$stones
periods <- castles.dated$periods

# Discretize the numeric variables into factors
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)

castlefh <- folderh(periods, "castle", stones)

# Default: distance = "l1", crit = 1
discdd.misclass(castlefh, "period")

# Hellinger distance, crit = 2
discdd.misclass(castlefh, "period", distance = "hellinger", crit = 2)

# Example 2 with a list of 96 arrays
data("dspgd2015")
data("departments")
classes <- departments[, c("coded", "namer")]
names(classes) <- c("group", "class")

# Default: distance = "l1", crit = 1
discdd.misclass(dspgd2015, classes)

# Hellinger distance, crit = 2
discdd.misclass(dspgd2015, classes, distance = "hellinger", crit = 2)