PCAmix function

Principal component analysis of mixed data

Performs principal component analysis of a set of individuals (observations) described by a mixture of qualitative and quantitative variables. PCAmix includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases.

PCAmix(X.quanti = NULL, X.quali = NULL, ndim = 5, rename.level = FALSE, weight.col.quanti = NULL, weight.col.quali = NULL, graph = TRUE)

Arguments

  • X.quanti: a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
  • X.quali: a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).
  • ndim: number of dimensions kept in the results (by default 5).
  • rename.level: boolean; if TRUE, all the levels of the qualitative variables are renamed as "variable_name=level_name". This prevents levels of different variables from having identical names.
  • weight.col.quanti: vector of weights for the quantitative variables.
  • weight.col.quali: vector of weights for the qualitative variables.
  • graph: boolean, if TRUE the following graphics are displayed for the first two dimensions of PCAmix: component map of the individuals, plot of the squared loadings of all the variables (quantitative and qualitative), plot of the correlation circle (if quantitative variables are available), component map of the levels (if qualitative variables are available).
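The effect of rename.level = TRUE can be pictured with a small base-R sketch. The helper rename_levels is hypothetical (it is not part of PCAmixdata) and only assumes that each qualitative column is turned into a factor whose levels are prefixed with the column name, following the "variable_name=level_name" pattern described above.

```r
# Hypothetical helper mimicking rename.level = TRUE (not PCAmixdata code):
# every level becomes "variable_name=level_name"
rename_levels <- function(df) {
  df[] <- lapply(names(df), function(v) {
    f <- as.factor(df[[v]])
    levels(f) <- paste(v, levels(f), sep = "=")
    f
  })
  df
}

# Both variables share the level "low"
X.quali <- data.frame(colour = c("low", "high"), size = c("low", "low"))
renamed <- rename_levels(X.quali)
levels(renamed$colour)
levels(renamed$size)
```

After renaming, the shared level appears as "colour=low" and "size=low", so the two levels can no longer be confused in level-based outputs such as the component map of the levels.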

Returns

  • eig: a matrix containing the eigenvalues, the percentages of variance and the cumulative percentages of variance.

  • ind: a list containing the results for the individuals (observations):

    • $coord: factor coordinates (scores) of the individuals,
    • $contrib: absolute contributions of the individuals,
    • $contrib.pct: relative contributions of the individuals (in percentage),
    • $cos2: squared cosines of the individuals.
  • quanti: a list containing the results for the quantitative variables:

    • $coord: factor coordinates (scores) of the quantitative variables,
    • $contrib: absolute contributions of the quantitative variables,
    • $contrib.pct: relative contributions of the quantitative variables (in percentage),
    • $cos2: squared cosines of the quantitative variables.
  • levels: a list containing the results for the levels of the qualitative variables:

    • $coord: factor coordinates (scores) of the levels,
    • $contrib: absolute contributions of the levels,
    • $contrib.pct: relative contributions of the levels (in percentage),
    • $cos2: squared cosines of the levels.
  • quali: a list containing the results for the qualitative variables:

    • $contrib: absolute contributions of the qualitative variables (sum of absolute contributions of the levels of the qualitative variable),
    • $contrib.pct: relative contributions (in percentage) of the qualitative variables (sum of relative contributions of the levels of the qualitative variable).
  • sqload: a matrix of dimension (p, ndim), where p is the total number of variables, containing the squared loadings of the quantitative and qualitative variables.

  • coef: the coefficients of the linear combinations used to construct the principal components of PCAmix, and to predict coordinates (scores) of new observations in the function predict.PCAmix.

  • M: the vector of the weights of the columns used in the Generalized Singular Value Decomposition.
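For purely quantitative data, the returned quantities follow the usual PCA definitions. The base-R sketch below recomputes them under stated assumptions: variables standardized with the population standard deviation (divisor n), uniform row weights 1/n, absolute contribution = row weight × squared score, relative contribution = 100 × absolute contribution / eigenvalue, and cos2 = squared score / squared distance to the origin. The function pca_sketch and the layout of its coef element (an intercept row named const followed by one row of slopes per variable) are illustrative only and are not guaranteed to match the internals of PCAmix or predict.PCAmix.

```r
# Base-R sketch of PCA-case outputs (assumed definitions; not PCAmixdata code)
pca_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  mu <- colMeans(X)
  s <- sqrt(colMeans(sweep(X, 2, mu)^2))        # population standard deviations
  Z <- scale(X, center = mu, scale = s)
  e <- eigen(crossprod(Z) / n, symmetric = TRUE) # eigen-decomposition of the correlation matrix
  lambda <- e$values
  scores <- Z %*% e$vectors                      # factor coordinates of the individuals
  eig <- cbind(Eigenvalue = lambda,
               Proportion = 100 * lambda / sum(lambda),
               Cumulative = 100 * cumsum(lambda) / sum(lambda))
  contrib <- scores^2 / n                        # absolute contributions (weight 1/n)
  contrib.pct <- 100 * sweep(contrib, 2, lambda, "/")
  cos2 <- scores^2 / rowSums(scores^2)           # squared cosines
  # Linear coefficients: score_k = const_k + sum_j beta_jk * x_j
  beta <- e$vectors / s
  const <- -colSums(mu * beta)
  list(eig = eig, scores = scores, contrib = contrib,
       contrib.pct = contrib.pct, cos2 = cos2,
       coef = rbind(const = const, beta))
}

X <- cbind(a = c(1, 2, 3, 4, 6), b = c(2, 1, 4, 3, 5), c = c(5, 3, 2, 2, 1))
res <- pca_sketch(X)
res$eig
# Scores of (new) rows from the linear coefficients
pred <- sweep(X %*% res$coef[-1, ], 2, res$coef[1, ], "+")
```

Because each score is an affine function of the original variables, multiplying rows by the slope rows of coef and adding the const row reproduces their coordinates; this is the idea behind predicting scores of new observations.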

Details

If X.quali is not specified (i.e. NULL), only quantitative variables are available and standard PCA is performed. If X.quanti is NULL, only qualitative variables are available and standard MCA is performed.

Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
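The imputation rule above can be sketched in base R. Both helpers are hypothetical illustrations of the stated rule, not code from PCAmixdata: impute_quanti replaces each missing entry by its column mean, and indicator_with_na builds the indicator (dummy) matrix of one qualitative variable, leaving a row of zeros where the value is missing.

```r
# Hypothetical helper: replace missing values of each numeric column by the column mean
impute_quanti <- function(X) {
  X <- as.matrix(X)
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    X[miss, j] <- mean(X[, j], na.rm = TRUE)
  }
  X
}

# Hypothetical helper: indicator matrix of one qualitative variable,
# with an all-zero row for each missing value
indicator_with_na <- function(x) {
  f <- as.factor(x)                                   # NA is not a level
  G <- outer(as.character(f), levels(f), "==") * 1    # NA where x is missing
  G[is.na(G)] <- 0
  colnames(G) <- levels(f)
  G
}

X <- cbind(height = c(1.6, NA, 1.8), weight = c(60, 70, NA))
Xi <- impute_quanti(X)
G <- indicator_with_na(c("a", NA, "b"))
```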

The squared loadings are returned in sqload. For a qualitative variable, the squared loadings are the correlation ratios between the variable and the principal components. For a quantitative variable, they are the squared correlations between the variable and the principal components.
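Both kinds of squared loadings can be computed from a vector of component scores. In this sketch the helper names corr_ratio and sqload_one are hypothetical; the correlation ratio is computed as the between-group sum of squares of the scores divided by their total sum of squares. For a two-level variable, it equals the squared correlation between the scores and the 0/1 indicator of one level.

```r
# Correlation ratio (eta squared) between a qualitative variable f and scores y
corr_ratio <- function(f, y) {
  f <- as.factor(f)
  group_mean <- ave(y, f)          # mean of y within each level, one value per observation
  sum((group_mean - mean(y))^2) / sum((y - mean(y))^2)
}

# Squared loading of one variable x on one component (hypothetical helper)
sqload_one <- function(x, score) {
  if (is.numeric(x)) cor(x, score)^2 else corr_ratio(x, score)
}

score <- c(1, 2, 3, 4, 5, 6)
f <- factor(c("a", "a", "a", "b", "b", "b"))
sqload_one(f, score)               # 13.5 / 17.5, about 0.771
sqload_one(c(2, 1, 4, 3, 6, 5), score)
```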

Note that when all the p variables are qualitative, the factor coordinates (scores) of the n observations are equal to the factor coordinates (scores) of standard MCA multiplied by the square root of p, and the eigenvalues are equal to the usual eigenvalues of MCA multiplied by p. When all the variables are quantitative, PCAmix gives exactly the same results as standard PCA.
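The scaling relation with MCA can be checked numerically on a toy data set. This sketch assumes the PCAmix coding of qualitative data (indicator matrix centred, columns weighted by n/n_s where n_s is the frequency of level s, rows weighted by 1/n) and computes standard MCA as correspondence analysis of the indicator matrix; the data and the helper indicator are hypothetical. Since scores are defined only up to sign (and rotation within tied dimensions), the check compares Gram matrices of the scores, which should differ by the factor p when the scores differ by the square root of p.

```r
# Hypothetical helper: indicator (dummy) matrix of a data frame of qualitative variables
indicator <- function(df) {
  do.call(cbind, lapply(names(df), function(v) {
    f <- as.factor(df[[v]])
    G <- outer(as.character(f), levels(f), "==") * 1
    colnames(G) <- paste(v, levels(f), sep = "=")
    G
  }))
}

df <- data.frame(
  colour = c("red", "blue", "red", "green", "blue", "red"),
  size   = c("S", "L", "L", "S", "S", "L"),
  shape  = c("round", "round", "square", "square", "round", "square")
)
G  <- indicator(df)
n  <- nrow(G)       # individuals
p  <- ncol(df)      # qualitative variables
m  <- ncol(G)       # levels
ns <- colSums(G)    # level frequencies

# PCAmix-style coding (assumed): centre, weight columns by n/n_s and rows by 1/n
Zt <- sweep(sweep(G, 2, ns / n), 2, sqrt(n / ns), "*") / sqrt(n)
sv <- svd(Zt)
eig_pcamix <- sv$d^2
F_pcamix   <- sqrt(n) * sv$u %*% diag(sv$d)

# Standard MCA: correspondence analysis of the indicator matrix
P     <- G / sum(G)
rmass <- rowSums(P)
cmass <- colSums(P)
S     <- (P - outer(rmass, cmass)) / sqrt(outer(rmass, cmass))
sv_mca  <- svd(S)
eig_mca <- sv_mca$d^2
F_mca   <- (sv_mca$u %*% diag(sv_mca$d)) / sqrt(rmass)   # row principal coordinates

eig_pcamix / p      # should match eig_mca
sum(eig_pcamix)     # should equal m - p
```

Algebraically, S equals Zt divided by the square root of p, which is why the eigenvalues differ by exactly the factor p.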

Examples

# Load the package providing PCAmix and the example data sets
library(PCAmixdata)

# PCAmix:
data(wine)
str(wine)
X.quanti <- splitmix(wine)$X.quanti
X.quali <- splitmix(wine)$X.quali
pca <- PCAmix(X.quanti[, 1:27], X.quali, ndim = 4)
pca <- PCAmix(X.quanti[, 1:27], X.quali, ndim = 4, graph = FALSE)
pca$eig
pca$ind$coord

# PCA:
data(decathlon)
quali <- decathlon[, 13]
pca <- PCAmix(decathlon[, 1:10])
pca <- PCAmix(decathlon[, 1:10], graph = FALSE)
plot(pca, choice = "ind", coloring.ind = quali, cex = 0.8,
     posleg = "topright", main = "Scores")
plot(pca, choice = "sqload", main = "Squared correlations")
plot(pca, choice = "cor", main = "Correlation circle")
pca$quanti$coord

# MCA:
data(flower)
mca <- PCAmix(X.quali = flower[, 1:4], rename.level = TRUE)
mca <- PCAmix(X.quali = flower[, 1:4], rename.level = TRUE, graph = FALSE)
plot(mca, choice = "ind", main = "Scores")
plot(mca, choice = "sqload", main = "Correlation ratios")
plot(mca, choice = "levels", main = "Levels")
mca$levels$coord

# Missing values:
data(vnf)
PCAmix(X.quali = vnf, rename.level = TRUE)
vnf2 <- na.omit(vnf)
PCAmix(X.quali = vnf2, rename.level = TRUE)

References

Chavent M., Kuentz-Simonet V., Labenne A., Saracco J., Multivariate analysis of mixed data: The PCAmixdata R package, arXiv:1411.4911 [stat.CO].

See Also

print.PCAmix, summary.PCAmix, predict.PCAmix, plot.PCAmix

Author(s)

Marie Chavent (marie.chavent@u-bordeaux.fr), Amaury Labenne.

  • Maintainer: Marie Chavent
  • License: GPL (>= 2.0)
  • Last published: 2017-10-23
