Sparse Group Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)
Function to perform sparse group Partial Least Squares to classify samples (supervised analysis) and select variables.
sgPLSda(X, Y, ncomp = 2, keepX = rep(ncol(X), ncomp), max.iter = 500, tol = 1e-06, ind.block.x, alpha.x, upper.lambda = 10^5)
Arguments
X: numeric matrix of predictors. NAs are allowed.
Y: a factor or a class vector for the discrete outcome.
ncomp: the number of components to include in the model (see Details).
keepX: numeric vector of length ncomp, the number of variables to keep in X-loadings. By default all variables are kept in the model.
max.iter: integer, the maximum number of iterations.
tol: a positive real, the tolerance used in the iterative algorithm.
ind.block.x: a vector of integers describing the grouping of the X variables (see the example in the Details section).
alpha.x: The mixing parameter (value between 0 and 1) related to the sparsity within group for the X dataset.
upper.lambda: a large value specifying the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables. By default upper.lambda = 10^5.
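To illustrate how alpha.x trades off group-level against within-group sparsity, here is a minimal base R sketch. It assumes the usual sparse group lasso form of the penalty, (1 - alpha) * lambda * sum_k sqrt(p_k) * ||u_k||_2 + alpha * lambda * ||u||_1, as an approximation of what the cited Liquet et al. paper describes. The helper sparse_group_penalty is hypothetical and is not part of the sgPLS package; it also assumes the last entry of ind.block.x is smaller than the length of u.

```r
# Hypothetical helper for illustration only; not an sgPLS function.
# Assumes the sparse group lasso form of the penalty (see lead-in).
sparse_group_penalty <- function(u, ind.block.x, alpha, lambda) {
  p <- length(u)
  # Assign each coefficient to its group from the break points
  group <- cut(seq_len(p), breaks = c(0, ind.block.x, p), labels = FALSE)
  # Group part: sqrt(group size) times the Euclidean norm of each group
  group_part <- sum(tapply(u, group, function(v) sqrt(length(v)) * sqrt(sum(v^2))))
  # Within-group part: plain L1 norm over all coefficients
  l1_part <- sum(abs(u))
  (1 - alpha) * lambda * group_part + alpha * lambda * l1_part
}

u <- c(1, 0, 0, 0, 2, 0)  # toy loading vector, two groups of three
pen_lasso <- sparse_group_penalty(u, ind.block.x = 3, alpha = 1, lambda = 1)
pen_group <- sparse_group_penalty(u, ind.block.x = 3, alpha = 0, lambda = 1)
```

Under this assumption, alpha = 1 leaves only the L1 term (sparsity on individual variables) and alpha = 0 leaves only the group term (whole groups are kept or dropped).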
Details
The sgPLSda function fits sgPLS models with 1,…,ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created.
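As a rough illustration of what an indicator (dummy) matrix looks like, the following base R sketch builds one from a toy factor. The data and the construction are assumptions for illustration; the package's internal code may build the matrix differently.

```r
# Toy class vector (illustrative data, not from the package)
Y <- factor(c("a", "b", "a", "c"))

# One column per class level; an entry is 1 when the sample belongs to that class
ind.mat <- sapply(levels(Y), function(lev) as.numeric(Y == lev))
ind.mat
```

Each row contains a single 1, in the column of the class that sample belongs to.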
For example, ind.block.x <- c(3, 10, 15) means that X is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to Xp, where p is the number of variables in the X matrix.
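The mapping from break points to groups can be checked with a short base R sketch. The value p = 20 is an assumption chosen for illustration, and the code only mirrors the rule stated above; it does not call any sgPLS function.

```r
p <- 20                       # assumed number of X variables, for illustration
ind.block.x <- c(3, 10, 15)   # last column index of every group except the final one

# Group label for each column: (0,3], (3,10], (10,15], (15,p]
group <- cut(seq_len(p), breaks = c(0, ind.block.x, p), labels = FALSE)

# Number of variables in each group
group.size <- as.integer(table(group))
group.size
```

Note that a vector of K break points always describes K + 1 groups, because the last group runs from the final break point to column p.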
Returns
sgPLSda returns an object of class "sPLSda", a list that contains the following components:
X: the centered and standardized original predictor matrix.
Y: the centered and standardized indicator response vector or matrix.
ind.mat: the indicator matrix.
ncomp: the number of components included in the model.
keepX: number of X variables kept in the model on each component.
mat.c: matrix of coefficients to be used internally by predict.
variates: list containing the variates.
loadings: list containing the estimated loadings for the X and Y variates.
names: list containing the names to be used for individuals and variables.
tol: the tolerance used in the iterative algorithm, used for subsequent S3 methods.
max.iter: the maximum number of iterations, used for subsequent S3 methods.
iter: the number of iterations of the algorithm for each component.
ind.block.x: a vector of integers describing the grouping of the X variables.
alpha.x: The mixing parameter related to the sparsity within group for the X dataset.
upper.lambda: the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.
References
Liquet, B., Lafaye de Micheaux, P., Hejblum, B. and Thiebaut, R. (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.
On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.
Author(s)
Benoit Liquet and Pierre Lafaye de Micheaux.
See Also
sPLS, summary, plotIndiv, plotVar, cim, network, predict, perf and http://www.mixOmics.org for more details.
Examples
library(sgPLS)

data(simuData)
X <- simuData$X
Y <- simuData$Y

ind.block.x <- seq(100, 900, 100)
ind.block.x[2] <- 250  # To add some noise in the second group

model <- sgPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x, keepX = c(2, 2, 2), alpha.x = c(0.5, 0.5, 0.99))

result.sgPLSda <- select.sgpls(model)
result.sgPLSda$group.size.X

## perf(model, criterion = "all", validation = "loo") -> res
## res$error.rate