fpcad function

Functional PCA of probability densities

Performs functional principal component analysis of probability densities in order to describe a data folder consisting of T groups of individuals on which p variables are observed. It returns an object of class fpcad.

fpcad(xf, group.name = "group", gaussiand = TRUE, windowh = NULL, normed = TRUE, centered = TRUE, data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE, nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3, filename = NULL)

Arguments

  • xf: object of class "folder" or data.frame.

    • If it is an object of class "folder", its elements are data frames with p numeric columns. Non-numeric columns cause an error. The t-th element (t = 1, ..., T) corresponds to the t-th group.
    • If it is a data frame, the column with name given by the group.name argument is a factor giving the groups. The other columns are all numeric; otherwise, there is an error.
  • group.name: string.

    • If xf is an object of class "folder", name of the grouping variable in the returned results. The default is group.name = "group".
    • If xf is a data frame, group.name is the name of the column of xf containing the groups.
  • gaussiand: logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.

  • windowh: either a list of T bandwidths (one per density associated with a group), or a strictly positive number. If windowh = NULL (default), the bandwidths are automatically computed. See Details.

  • normed: logical. If TRUE (default), the densities are normed before computing the distances.

  • centered: logical. If TRUE (default), the densities are centered.

  • data.centered: logical. If TRUE (default is FALSE), the data of each group are centered.

  • data.scaled: logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered = FALSE) and scaled.

  • common.variance: logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled = TRUE), computed on the whole data, is used. If FALSE (default), a covariance (or correlation) matrix per group is used.

  • nb.factors: numeric. Number of returned principal scores (default nb.factors = 3).

    Warning: The plot.fpcad and interpret.fpcad functions cannot take into account more than nb.factors principal factors.

  • nb.values: numeric. Number of returned eigenvalues (default nb.values = 10).

  • sub.title: string. If provided, the subtitle for the graphs.

  • plot.eigen: logical. If TRUE (default), the barplot of the eigenvalues is plotted.

  • plot.score: logical. If TRUE, the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by the nscore argument.

  • nscore: numeric vector. If plot.score = TRUE, the numbers of the principal scores to plot. The default is nscore = 1:3. Its components cannot be greater than nb.factors.

  • filename: string. Name of the file in which the results are saved. By default (filename = NULL) the results are not saved.
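
As an illustration of how these arguments combine, here is a minimal sketch of a call that switches from the Gaussian assumption to kernel estimation and suppresses plotting. It assumes the dad package (which provides fpcad and the roses data used in the Examples) is installed; the bandwidth value 0.5 is an arbitrary choice for illustration, not a recommended setting.

```r
# Sketch only: runs when the dad package is available.
if (requireNamespace("dad", quietly = TRUE)) {
  library(dad)
  data(roses)

  # Data frame input: three numeric columns plus the grouping column "rose".
  xf <- roses[, c("Sha", "Den", "Sym", "rose")]

  # Kernel-estimated densities with a single positive number for windowh
  # (arbitrary value), and no graphics produced by the call.
  res_kernel <- fpcad(xf, group.name = "rose",
                      gaussiand = FALSE, windowh = 0.5,
                      plot.eigen = FALSE, plot.score = FALSE)
  print(res_kernel)
}
```

Leaving windowh = NULL instead lets the function compute the bandwidths automatically, as described in Details.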

Details

The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Note that in the multivariate case (p > 1) the bandwidths are positive-definite matrices.

If windowh is a numerical value h, the matrix bandwidth is of the form h S, where S is either the square root of the covariance matrix (p > 1) or the standard deviation of the estimated density.

If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
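
The form h S can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the data (two columns of the built-in iris data), the value of h, and the choice of the symmetric matrix square root obtained by eigendecomposition are all hypothetical.

```r
# Hypothetical inputs: one group of observations on p = 2 numeric variables,
# and an arbitrary scalar h (fpcad would take it from windowh or compute it).
x <- as.matrix(iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")])
h <- 0.5

# Symmetric square root S of the covariance matrix V, so that S %*% S equals V.
V <- cov(x)
e <- eigen(V, symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# Multivariate case (p > 1): bandwidth matrix of the form h * S.
H <- h * S

# Univariate case: S reduces to a standard deviation, so the bandwidth is a number.
h_uni <- h * sd(x[, "Sepal.Length"])
```

Because S is built from the square roots of positive eigenvalues, H is positive-definite whenever the covariance matrix is.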

Returns

Returns an object of class fpcad, that is a list including:

  • inertia: data frame of the eigenvalues and percentages of inertia.

  • contributions: data frame of the contributions to the first nb.factors principal components.

  • qualities: data frame of the qualities on the first nb.factors principal factors.

  • scores: data frame of the first nb.factors principal scores.

  • norm: vector of the L^2 norms of the densities.

  • means: list of the means.

  • variances: list of the covariance matrices.

  • correlations: list of the correlation matrices.

  • skewness: list of the skewness coefficients.

  • kurtosis: list of the kurtosis coefficients.
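
For intuition about the norm component: the L^2 norm of a density f is the square root of the integral of f squared. The base R sketch below checks this for a univariate Gaussian density, where the squared norm has the closed form 1 / (2 sigma sqrt(pi)). The mean and standard deviation are hypothetical values, and the code does not call fpcad or reproduce its internals.

```r
# Hypothetical parameters of a univariate Gaussian density.
mu <- 1
sigma <- 2
f <- function(x) dnorm(x, mean = mu, sd = sigma)

# L^2 norm by numerical integration of f(x)^2 over the real line.
norm_numeric <- sqrt(integrate(function(x) f(x)^2, -Inf, Inf)$value)

# Closed form for a Gaussian density: ||f||^2 = 1 / (2 * sigma * sqrt(pi)).
norm_closed <- sqrt(1 / (2 * sigma * sqrt(pi)))

# The two values should agree up to numerical integration error.
all.equal(norm_numeric, norm_closed, tolerance = 1e-4)
```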

References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

See Also

print.fpcad, plot.fpcad, interpret.fpcad, bandwidth.parameter

Examples

data(roses)

# Case of a normed non-centred PCA of Gaussian densities (on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym))
rosesf <- as.folder(roses[,c("Sha","Den","Sym","rose")])
result3 <- fpcad(rosesf, group.name = "rose")
print(result3)
plot(result3)

# Applied to a data frame:
result3df <- fpcad(roses[,c("Sha","Den","Sym","rose")], group.name = "rose")
print(result3df)
plot(result3df)

# Flower colors of the roses
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
colours <- scores$rose
colours <- factor(c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                    F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow"))
levels(scores$color) <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                          F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow")

# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)

  • Maintainer: Pierre Santagostini
  • License: GPL (>= 2)
  • Last published: 2024-11-22