Function to perform group Partial Least Squares (gPLS) in the context of two datasets which are both divided into groups of variables. The gPLS approach aims to select only a few groups of variables from one dataset which are linearly related to a few groups of variables of the second dataset.
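The group selection described above can be illustrated with a minimal base-R sketch of group-wise soft-thresholding. This is the group-lasso step that shrinks whole blocks of a loading vector and sets some of them exactly to zero. It is not the sgPLS internal code. The helper name group_soft_threshold, the unweighted group norms, and the rule that picks the threshold so that exactly n_keep groups survive are all hypothetical choices made for illustration.

```r
# Hypothetical helper, not part of sgPLS: group-wise soft-thresholding
# (the proximal step of an unweighted group-lasso penalty).
# z      : numeric vector of scores for the X-variables
# groups : list of integer index vectors, one per group of variables
# n_keep : number of groups allowed to remain non-zero
group_soft_threshold <- function(z, groups, n_keep) {
  stopifnot(n_keep >= 1)
  # Euclidean norm of each group's block of z
  norms <- vapply(groups, function(g) sqrt(sum(z[g]^2)), numeric(1))
  # Threshold at the (n_keep + 1)-th largest norm so that, barring ties,
  # only the n_keep strongest groups survive; no shrinkage if all are kept
  lambda <- if (n_keep >= length(groups)) 0 else sort(norms, decreasing = TRUE)[n_keep + 1]
  u <- numeric(length(z))
  for (k in seq_along(groups)) {
    g <- groups[[k]]
    # Shrink the whole block; blocks whose norm is <= lambda become exactly zero
    shrink <- if (norms[k] > 0) max(0, 1 - lambda / norms[k]) else 0
    u[g] <- shrink * z[g]
  }
  # Rescale to unit length, as PLS loading vectors usually are
  u / sqrt(sum(u^2))
}

z <- c(3, 4, 0.1, 0.1, 1, 0)
groups <- list(1:2, 3:4, 5:6)
u <- group_soft_threshold(z, groups, n_keep = 2)
u  # variables 3:4 (group 2) are set to zero; groups 1 and 3 are kept
```

Selection operates on groups because the penalty zeroes whole blocks rather than individual coefficients.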
Usage
gPLS(X, Y, ncomp, mode = "regression", max.iter = 500, tol = 1e-06, keepX, keepY = NULL, ind.block.x, ind.block.y = NULL, scale = TRUE)
Arguments
X: numeric matrix of predictors.
Y: numeric vector or matrix of responses (for multi-response models).
ncomp: the number of components to include in the model (see Details).
mode: character string. The type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.
max.iter: integer, the maximum number of iterations.
tol: a positive real, the tolerance used in the iterative algorithm.
keepX: numeric vector of length ncomp, the number of variables to keep in X-loadings. By default all variables are kept in the model.
keepY: numeric vector of length ncomp, the number of variables to keep in Y-loadings. By default all variables are kept in the model.
ind.block.x: a vector of increasing integers describing the grouping of the X-variables (see the example in the Details section).
ind.block.y: a vector of increasing integers describing the grouping of the Y-variables (see the example in the Details section).
scale: a logical indicating whether the original data sets should be scaled. Default is scale = TRUE.
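With scale = TRUE, each column of X and Y is centered and standardized before fitting. The base-R sketch below shows that transformation on placeholder data. It assumes the package standardizes the same way as scale(), dividing by the column standard deviation computed with the n - 1 divisor. That assumption has not been checked against the sgPLS source.

```r
# Column-wise centering and scaling with base R (placeholder data).
set.seed(1)
X <- matrix(rnorm(50 * 6, mean = 10, sd = 3), nrow = 50)
Xs <- scale(X, center = TRUE, scale = TRUE)  # subtract column means, divide by column sds
colMeans(Xs)      # numerically zero for every column
apply(Xs, 2, sd)  # 1 for every column
```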
Details
The gPLS function fits gPLS models with 1, …, ncomp components. Multi-response models are fully supported.
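Each component is obtained from an iterative algorithm controlled by max.iter and tol. The sketch below is a rough, unpenalized illustration of one such step. It extracts the first pair of PLS loadings by alternating power iterations on M = t(X) %*% Y, whose fixed point is the leading left/right singular-vector pair of M. The real gPLS iteration also applies group penalties to the loadings and may use a different starting value and stopping rule. The helper pls_first_loadings is hypothetical.

```r
# Hypothetical helper, not part of sgPLS: first PLS loading pair by power iteration.
pls_first_loadings <- function(X, Y, max.iter = 500, tol = 1e-06) {
  M <- crossprod(X, Y)                    # p x q cross-product t(X) %*% Y
  v <- rep(1, ncol(M)) / sqrt(ncol(M))    # simple starting Y-loading
  u <- numeric(nrow(M))
  for (iter in seq_len(max.iter)) {
    u_new <- drop(M %*% v)
    u_new <- u_new / sqrt(sum(u_new^2))   # X-loading update (unit norm)
    v <- drop(crossprod(M, u_new))
    v <- v / sqrt(sum(v^2))               # Y-loading update (unit norm)
    converged <- sum((u_new - u)^2) < tol # stop when the X-loading stabilizes
    u <- u_new
    if (converged) break
  }
  list(u = u, v = v, iter = iter)
}

set.seed(42)
X <- scale(matrix(rnorm(100 * 5), nrow = 100))
Y <- scale(cbind(X[, 1], X[, 1]) + matrix(rnorm(100 * 2, sd = 0.5), nrow = 100))
fit <- pls_first_loadings(X, Y)
fit$iter  # iterations actually used, analogous to the returned iter component
```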
The type of algorithm to use is specified with the mode argument. Two gPLS algorithms are available: gPLS regression ("regression") and gPLS canonical analysis ("canonical") (see References).
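In sPLS-type algorithms (Le Cao et al., see References), the two modes differ in how Y is deflated after each component. In regression mode Y is regressed on the X-variate, while in canonical mode it is regressed on its own Y-variate, so that X and Y play symmetric roles. The sketch below shows that single deflation step. It assumes gPLS follows the same convention. The helper deflate_Y is hypothetical, and the variates are arbitrary placeholders.

```r
# Hypothetical helper, not part of sgPLS: one Y-deflation step after a component.
# x_variate : X-variate (X %*% u);  y_variate : Y-variate (Y %*% v)
deflate_Y <- function(Y, x_variate, y_variate, mode = c("regression", "canonical")) {
  mode <- match.arg(mode)
  # regression mode regresses Y on the X-variate; canonical mode on the Y-variate
  s <- drop(if (mode == "regression") x_variate else y_variate)
  coef <- drop(crossprod(Y, s)) / sum(s^2)  # one coefficient per Y column
  Y - outer(s, coef)                        # remove the part of Y explained by s
}

set.seed(7)
X <- scale(matrix(rnorm(30 * 4), nrow = 30))
Y <- scale(matrix(rnorm(30 * 3), nrow = 30))
x_variate <- drop(X %*% c(1, 0, 0, 0))
y_variate <- drop(Y %*% c(1, 1, 0) / sqrt(2))
Y_reg <- deflate_Y(Y, x_variate, y_variate, mode = "regression")
Y_can <- deflate_Y(Y, x_variate, y_variate, mode = "canonical")
```

After deflation, Y_reg is orthogonal to the X-variate and Y_can is orthogonal to the Y-variate.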
For example, ind.block.x <- c(3, 10, 15) means that X is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to Xp, where p is the number of variables in the X matrix.
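The breakpoint convention above can be made concrete with a small base-R helper. block_groups is a hypothetical name, not an sgPLS function. It turns an ind.block.x-style vector and the number of columns p into a list of column indices, one element per group. It assumes the breakpoints are strictly increasing and smaller than p, as in the example.

```r
# Hypothetical helper, not part of sgPLS: expand ind.block.x-style breakpoints
# into a list of column indices, one element per group.
block_groups <- function(ind.block, p) {
  stopifnot(all(diff(ind.block) > 0), max(ind.block) < p)
  starts <- c(1, ind.block + 1)  # first column of each group
  ends   <- c(ind.block, p)      # last column of each group
  mapply(seq, starts, ends, SIMPLIFY = FALSE)
}

groups <- block_groups(c(3, 10, 15), p = 20)
lengths(groups)  # 3 7 5 5: groups X1-X3, X4-X10, X11-X15, X16-X20
```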
Returns
gPLS returns an object of class "gPLS", a list that contains the following components:
X: the centered and standardized original predictor matrix.
Y: the centered and standardized original response vector or matrix.
ncomp: the number of components included in the model.
mode: the algorithm used to fit the model.
keepX: number of X variables kept in the model on each component.
keepY: number of Y variables kept in the model on each component.
mat.c: matrix of coefficients to be used internally by predict.
variates: list containing the variates.
loadings: list containing the estimated loadings for the X and Y variates.
names: list containing the names to be used for individuals and variables.
tol: the tolerance used in the iterative algorithm, used for subsequent S3 methods.
max.iter: the maximum number of iterations, used for subsequent S3 methods.
iter: vector containing the number of iterations for convergence in each component.
ind.block.x: a vector of integers describing the grouping of the X variables.
ind.block.y: a vector of increasing integers describing the grouping of the Y variables.
References
Liquet, B., Lafaye de Micheaux, P., Hejblum, B. and Thiebaut, R. A group and sparse group partial least square approach applied in genomics context. Submitted.
Le Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.
Le Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.
Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Éditions Technip.
Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (ed.), Multivariate Analysis. Academic Press, New York, 391-420.
Author(s)
Benoit Liquet and Pierre Lafaye de Micheaux.
See Also
sPLS, sgPLS, predict, perf, cim, and functions from the mixOmics package: summary, plotIndiv, plotVar, plot3dIndiv, plot3dVar.