fpcad() R function from [dad]

Functional PCA of probability densities

Performs functional principal component analysis of probability densities in order to describe a data folder consisting of $T$ groups of individuals on which $p$ variables are observed. It returns an object of class fpcad.

Usage


fpcad(xf, group.name = "group", gaussiand = TRUE, windowh = NULL, normed = TRUE,
    centered = TRUE, data.centered = FALSE, data.scaled = FALSE,
    common.variance = FALSE, nb.factors = 3, nb.values = 10, sub.title = "",
    plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
    filename = NULL)

Arguments

xf: object of class "folder" or data.frame.
- If it is an object of class "folder", its elements are data frames with $p$ numeric columns. Non-numeric columns cause an error. The $t^{th}$ element ( $t = 1, \ldots, T$ ) corresponds to the $t^{th}$ group.
- If it is a data frame, the column whose name is given by the group.name argument is a factor giving the groups. All other columns must be numeric; otherwise, there is an error.
group.name: string.
- If xf is an object of class "folder", name of the grouping variable in the returned results. The default is group.name = "group".
- If xf is a data frame, group.name is the name of the column of xf containing the groups.
gaussiand: logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.
windowh: either a list of $T$ bandwidths (one per density associated with a group), or a strictly positive number. If windowh = NULL (default), the bandwidths are automatically computed. See Details.
normed: logical. If TRUE (default), the densities are normed before computing the distances.
centered: logical. If TRUE (default), the densities are centered.
data.centered: logical. If TRUE (default is FALSE), the data of each group are centered.
data.scaled: logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered = FALSE) and scaled.
common.variance: logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled = TRUE), computed on the whole data, is used. If FALSE (default), a covariance (or correlation) matrix per group is used.
nb.factors: numeric. Number of returned principal scores (default nb.factors = 3). Warning: the plot.fpcad and interpret.fpcad functions cannot take into account more than nb.factors principal factors.
nb.values: numeric. Number of returned eigenvalues (default nb.values = 10).
sub.title: string. If provided, the subtitle for the graphs.
plot.eigen: logical. If TRUE (default), the barplot of the eigenvalues is plotted.
plot.score: logical. If TRUE, the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by the nscore argument.
nscore: numeric vector. If plot.score = TRUE, the indices of the principal scores to plot (default nscore = 1:3). Its components cannot be greater than nb.factors.
filename: string. Name of the file in which the results are saved. By default (filename = NULL) the results are not saved.
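
The two accepted layouts of xf can be pictured with base R alone. The sketch below uses a hypothetical toy data frame (the column names x, y and the group labels are invented for illustration) and a plain list of per-group data frames as a stand-in for a "folder"; it does not call the dad package, whose as.folder function builds the real object.

```r
# Hypothetical toy data: two numeric variables and a grouping column named "group".
df <- data.frame(
  x = c(1.0, 1.2, 0.8, 3.1, 2.9, 3.3),
  y = c(5.0, 5.5, 4.5, 7.2, 6.8, 7.0),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

group.name <- "group"
numeric_cols <- setdiff(names(df), group.name)

# Constraint stated above: every column other than the grouping one must be numeric.
stopifnot(all(vapply(df[numeric_cols], is.numeric, logical(1))))

# One data frame per group, each with p numeric columns
# (a plain-list analogue of a folder, not the dad class itself).
per_group <- split(df[numeric_cols], df[[group.name]])
```

Here per_group has one element per level of the grouping factor, which mirrors the statement that the $t^{th}$ element of a folder holds the data of the $t^{th}$ group.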

Details

The $T$ probability densities $f_t$ corresponding to the $T$ groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Note that in the multivariate case ( $p > 1$ ) the bandwidths are positive-definite matrices.
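
In the Gaussian case, each density $f_t$ is fully described by a mean vector and a covariance matrix estimated from the group's data. A minimal base R sketch of that step follows, on hypothetical toy data; note that var() uses the denominator $n - 1$, which is an assumption here and may differ from what the package does internally.

```r
# Hypothetical toy data: p = 2 variables observed on T = 2 groups.
df <- data.frame(
  x = c(1.0, 1.2, 0.8, 3.1, 2.9, 3.3),
  y = c(5.0, 5.5, 4.5, 7.2, 6.8, 7.1),
  group = factor(rep(c("a", "b"), each = 3))
)
per_group <- split(df[c("x", "y")], df$group)

# Parametric (Gaussian) estimation: one mean vector and one
# covariance matrix per group.
means <- lapply(per_group, colMeans)
variances <- lapply(per_group, var)  # denominator n - 1 (assumption)
```

These two lists play the same role as the means and variances components described under Returns.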

If windowh is a numerical value $h$, the bandwidth matrix is of the form $h S$ , where $S$ is either the square root of the covariance matrix ( $p > 1$ ) or the standard deviation ( $p = 1$ ) of the estimated density.
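
As an illustration of the form $h S$, the sketch below builds a bandwidth matrix from a hypothetical covariance matrix. It takes the symmetric square root obtained from the eigendecomposition; which matrix square root the package actually uses is not stated here, so this choice is an assumption, as are the helper name sqrt_matrix and the numeric values.

```r
# Symmetric square root of a covariance matrix via its eigendecomposition.
# (Helper name and choice of square root are illustrative assumptions.)
sqrt_matrix <- function(V) {
  e <- eigen(V, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

V <- matrix(c(4, 1, 1, 9), nrow = 2)  # hypothetical covariance matrix (p = 2)
h <- 0.5                               # hypothetical value passed as windowh

S <- sqrt_matrix(V)
H <- h * S                             # bandwidth matrix of the form h S

# Univariate case (p = 1): S is the standard deviation of the group.
x <- c(2.1, 3.4, 1.9, 4.0, 2.8)
H1 <- h * sd(x)
```

Because $S$ is positive definite and $h > 0$, the resulting H is positive definite, consistent with the requirement on multivariate bandwidths.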

If windowh = NULL (default), $h$ in the above formula is computed using the bandwidth.parameter function.
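
For orientation, a common automatic choice of $h$ for a Gaussian kernel is the normal reference rule, $h = \left( 4 / (n (p + 2)) \right)^{1/(p+4)}$, where $n$ is the group size. The sketch below implements that rule under a hypothetical function name; whether bandwidth.parameter computes exactly this value is an assumption, so check its help page before relying on it.

```r
# Normal reference rule for a Gaussian kernel.
# Assumption: shown for illustration only, not taken from the dad source.
#   p: number of variables, n: number of observations in the group.
normal_reference_h <- function(p, n) {
  (4 / (n * (p + 2)))^(1 / (p + 4))
}

h_small <- normal_reference_h(p = 2, n = 50)
h_large <- normal_reference_h(p = 2, n = 500)
```

Under this rule $h$ shrinks as the group size grows, so larger groups get narrower kernels.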

Returns

Returns an object of class fpcad, that is a list including:

inertia: data frame of the eigenvalues and percentages of inertia.
contributions: data frame of the contributions to the first nb.factors principal components.
qualities: data frame of the qualities on the first nb.factors principal factors.
scores: data frame of the first nb.factors principal scores.
norm: vector of the $L^2$ norms of the densities.
means: list of the means.
variances: list of the covariance matrices.
correlations: list of the correlation matrices.
skewness: list of the skewness coefficients.
kurtosis: list of the kurtosis coefficients.
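
To make quantities such as inertia, scores and norm more concrete, here is a rough base R sketch for univariate Gaussian densities. It relies on the closed form of the $L^2$ inner product of two Gaussian densities, $\langle f_i, f_j \rangle = \phi(m_i - m_j;\, 0,\, \sigma_i^2 + \sigma_j^2)$, then norms and centres the matrix of inner products and diagonalises it. The parameter values are hypothetical, and the scaling of the scores is one possible convention; the package's own weighting and output layout may differ.

```r
# Hypothetical Gaussian parameters for 4 univariate groups.
m  <- c(0, 1, 2, 4)        # means
s2 <- c(1, 1.5, 0.5, 2)    # variances
n_groups <- length(m)

# Matrix of L2 inner products between the Gaussian densities.
idx <- seq_len(n_groups)
W <- outer(idx, idx, function(i, j) {
  dnorm(m[i] - m[j], mean = 0, sd = sqrt(s2[i] + s2[j]))
})

norms <- sqrt(diag(W))               # L2 norms of the densities
W_normed <- W / outer(norms, norms)  # analogue of normed = TRUE

# Double centering, analogue of centered = TRUE.
J <- diag(n_groups) - matrix(1 / n_groups, n_groups, n_groups)
W_c <- J %*% W_normed %*% J

e <- eigen(W_c, symmetric = TRUE)
eigenvalues <- pmax(e$values, 0)     # clip tiny negative rounding errors

inertia <- data.frame(
  eigenvalue = eigenvalues,
  percent = 100 * eigenvalues / sum(eigenvalues)
)

# Coordinates of the groups on the first two axes (one scaling convention).
scores <- e$vectors[, 1:2] %*% diag(sqrt(eigenvalues[1:2]))
```

After centering, the smallest eigenvalue is zero up to rounding, which is why at most $T - 1$ axes carry inertia in this sketch.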

References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

Examples


library(dad)
data(roses)
# Case of a normed centred PCA of Gaussian densities (the defaults), on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym)
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
result3 <- fpcad(rosesf, group.name = "rose")
print(result3)
plot(result3)

# Applied to a data frame:
result3df <- fpcad(roses[,c("Sha","Den","Sym","rose")], group.name = "rose")
print(result3df)
plot(result3df)

# Flower colors of the roses
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
# One color per rose; the names A to J identify the roses
colours <- factor(c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                  F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow"))
levels(scores$color) <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                         F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow")
# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)

dad package

Maintainer: Pierre Santagostini
License: GPL (>= 2)
Last published: 2024-11-22
