Performs functional principal component analysis of probability densities in order to describe a data folder consisting of T groups of individuals on which p variables are observed. It returns an object of class fpcad.
xf: object of class "folder" or data frame.
If xf is an object of class "folder", its elements are data frames with p numeric columns. A non-numeric column raises an error. The t-th element (t = 1, …, T) corresponds to the t-th group.
If xf is a data frame, the column whose name is given by the group.name argument is a factor giving the groups. All other columns must be numeric; otherwise, an error is raised.
group.name: string.
If xf is an object of class "folder", the name of the grouping variable in the returned results. The default is group.name = "group".
If xf is a data frame, group.name is the name of the column of xf containing the groups.
gaussiand: logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, the densities are estimated using the Gaussian kernel method.
windowh: either a list of T bandwidths (one per density associated with a group), or a strictly positive number. If windowh = NULL (default), the bandwidths are computed automatically. See Details.
normed: logical. If TRUE (default), the densities are normed before computing the distances.
centered: logical. If TRUE (default), the densities are centered.
data.centered: logical. If TRUE (default is FALSE), the data of each group are centered.
data.scaled: logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered = FALSE) and scaled.
common.variance: logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled = TRUE), computed on the whole data, is used. If FALSE (default), one covariance (or correlation) matrix per group is used.
nb.factors: numeric. Number of returned principal scores (default nb.factors = 3).
Warning: The plot.fpcad and interpret.fpcad functions cannot take into account more than nb.factors principal factors.
nb.values: numeric. Number of returned eigenvalues (default nb.values = 10).
sub.title: string. If provided, the subtitle for the graphs.
plot.eigen: logical. If TRUE (default), the barplot of the eigenvalues is plotted.
plot.score: logical. If TRUE, the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by the nscore argument.
nscore: numeric vector. If plot.score = TRUE, the indices of the principal scores to plot (default nscore = 1:3). Its components cannot be greater than nb.factors.
filename: string. Name of the file in which the results are saved. By default (filename = NULL) the results are not saved.
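As an illustration of the data-frame form of xf and of the data.scaled and common.variance options, here is a minimal base-R sketch. It does not call fpcad and is not the package's implementation. The built-in iris data and the variable names below are assumptions chosen for the example, and "computed on the whole data" is read literally as the covariance of all rows taken together.

```r
# Hypothetical toy input: iris, with "Species" as the grouping column and
# four numeric columns (the data-frame form of xf).
x <- iris
group.name <- "Species"
num.cols <- setdiff(names(x), group.name)

# Check mirroring the documented requirement: the grouping column is a
# factor and every other column is numeric.
stopifnot(is.factor(x[[group.name]]), all(sapply(x[num.cols], is.numeric)))

# One data frame per group, similar in spirit to an object of class "folder".
groups <- split(x[num.cols], x[[group.name]])

# Per-group means and covariance matrices (common.variance = FALSE).
means <- lapply(groups, colMeans)
variances <- lapply(groups, var)

# Centering and scaling each group (data.scaled = TRUE) turns its
# covariance matrix into its correlation matrix.
correlations <- lapply(groups, function(g) var(scale(g)))

# A single covariance matrix computed on the whole data
# (our reading of common.variance = TRUE).
common.var <- var(x[num.cols])
```

The correlation matrices obtained this way can be compared with cor(groups[[1]]) to confirm that scaling the data and using a correlation matrix amount to the same thing.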
Details
The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Note that in the multivariate case (p > 1) the bandwidths are positive-definite matrices.
If windowh is a numerical value h, the bandwidth matrix is of the form hS, where S is either the square root of the covariance matrix (p > 1) or the standard deviation (p = 1) of the estimated density.
If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
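The form hS can be pictured with a short base-R sketch. This is an illustration under stated assumptions, not the package's code: the helper sqrtm.sym, the value h = 0.5 (standing in for a numeric windowh) and the use of one iris species as a group are all hypothetical choices.

```r
# Square root of a symmetric positive-definite matrix through its
# eigendecomposition: if V = Q diag(l) Q', then sqrt(V) = Q diag(sqrt(l)) Q'.
sqrtm.sym <- function(V) {
  e <- eigen(V, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical group: the four numeric columns of one iris species (p = 4).
g <- as.matrix(iris[iris$Species == "setosa", 1:4])

h <- 0.5                 # arbitrary value, standing in for a numeric windowh
S <- sqrtm.sym(var(g))   # square root of the group covariance matrix
H <- h * S               # bandwidth matrix of the form hS (p > 1)

# Univariate case (p = 1): S is the standard deviation of the variable.
x1 <- g[, 1]
bw1 <- h * sd(x1)

# Gaussian kernel estimate of the density of x1, evaluated at the points u.
kde1 <- function(u) sapply(u, function(ui) mean(dnorm(ui, mean = x1, sd = bw1)))
```

In the univariate case, kde1 should agree closely with density(x1, bw = bw1, kernel = "gaussian") from base R, which may be a convenient cross-check.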
Returns
Returns an object of class fpcad, that is a list including:
inertia: data frame of the eigenvalues and percentages of inertia.
contributions: data frame of the contributions to the first nb.factors principal components.
qualities: data frame of the qualities on the first nb.factors principal factors.
scores: data frame of the first nb.factors principal scores.
norm: vector of the L2 norms of the densities.
means: list of the means.
variances: list of the covariance matrices.
correlations: list of the correlation matrices.
skewness: list of the skewness coefficients.
kurtosis: list of the kurtosis coefficients.
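To give an idea of what the norm and inertia components contain, the following base-R sketch computes L2 inner products of univariate Gaussian densities in closed form, then derives norms and percentages of inertia from the resulting matrix. The group parameters are invented, and the computation is a non-centred, non-normed simplification for illustration; it is not claimed to reproduce the output of fpcad.

```r
# Hypothetical parameters of T = 3 univariate Gaussian densities.
grp <- c("A", "B", "C")
m <- c(0, 1, 3)      # means
s <- c(1, 0.5, 2)    # standard deviations

# Closed form of the L2 inner product of two Gaussian densities:
# integral of N(x; m1, s1^2) * N(x; m2, s2^2) dx = N(m1 - m2; 0, s1^2 + s2^2).
W <- outer(seq_along(m), seq_along(m),
           function(i, j) dnorm(m[i] - m[j], mean = 0, sd = sqrt(s[i]^2 + s[j]^2)))
dimnames(W) <- list(grp, grp)

# L2 norms of the densities: square roots of the diagonal of W.
norms <- sqrt(diag(W))

# Numerical check of the norm of density B by direct integration.
check <- sqrt(integrate(function(x) dnorm(x, m[2], s[2])^2, -Inf, Inf)$value)

# Eigenvalues of the inner-product matrix and percentages of inertia.
ev <- eigen(W, symmetric = TRUE)$values
inertia <- data.frame(eigenvalue = ev, percent = 100 * ev / sum(ev))
```

The closed form avoids numerical integration altogether, which is one reason the Gaussian case (gaussiand = TRUE) is cheap to compute.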
References
Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
Author(s)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
Examples
data(roses)

# Case of a normed non-centred PCA of Gaussian densities (on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym))
rosesf <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
result3 <- fpcad(rosesf, group.name = "rose")
print(result3)
plot(result3)

# Applied to a data frame:
result3df <- fpcad(roses[, c("Sha", "Den", "Sym", "rose")], group.name = "rose")
print(result3df)
plot(result3df)

# Flower colors of the roses
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
colours <- scores$rose
colours <- factor(c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                    F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow"))
levels(scores$color) <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                          F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow")

# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)