sgPLSda() R function from [sgPLS]

Sparse Group Partial Least Squares Discriminant Analysis (sgPLS-DA)

Function to perform sparse group Partial Least Squares to classify samples (supervised analysis) and select variables.


Usage

sgPLSda(X, Y, ncomp = 2, keepX = rep(ncol(X), ncomp),
        max.iter = 500, tol = 1e-06, ind.block.x,
        alpha.x, upper.lambda = 10 ^ 5)

Arguments

X: numeric matrix of predictors. NAs are allowed.
Y: a factor or a class vector for the discrete outcome.
ncomp: the number of components to include in the model (see Details).
keepX: numeric vector of length ncomp, the number of variables to keep in the $X$-loadings. By default all variables are kept in the model.
max.iter: integer, the maximum number of iterations.
tol: a positive real, the tolerance used in the iterative algorithm.
ind.block.x: a vector of integers describing the grouping of the $X$-variables (see the example in the Details section).
alpha.x: the mixing parameter (value between 0 and 1) related to the sparsity within group for the $X$ dataset.
upper.lambda: a large value specifying the upper bound of the interval of lambda values searched to find the tuning parameter (lambda) corresponding to a non-zero group of variables. Defaults to 10 ^ 5.

Details

The sgPLSda function fits sgPLS models with $1, \ldots,$ ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created from Y.
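
To make the indicator (dummy) matrix concrete, here is a minimal base R sketch of how a factor can be expanded into one 0/1 column per class. The labels in Y are hypothetical, and this only illustrates the idea; the package's internal construction may differ.

```r
# Hypothetical class labels; any factor or class vector would do.
Y <- factor(c("a", "b", "a", "c", "b"))

# One column per class level: 1 if the sample belongs to that class, else 0.
# (Illustration only; not the package's internal code.)
ind.mat <- sapply(levels(Y), function(lev) as.integer(Y == lev))

ind.mat
```

Each row contains exactly one 1, so the matrix encodes class membership without imposing any ordering on the classes.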

For example, ind.block.x <- c(3,10,15) means that $X$ is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to X$p$, where $p$ is the number of variables in the $X$ matrix.
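
The grouping rule can be checked with a short base R sketch that turns the breakpoints into group sizes and per-column group labels. The value of p and the helper names group.size and group.id are assumptions made for illustration; they are not objects created by sgPLSda.

```r
ind.block.x <- c(3, 10, 15)  # last column index of every group except the final one
p <- 20                      # hypothetical number of columns in X

# Group sizes are the gaps between consecutive breakpoints, closed off by p.
group.size <- diff(c(0, ind.block.x, p))

# Label each column of X with the group it falls in.
group.id <- rep(seq_along(group.size), times = group.size)

group.size
split(seq_len(p), group.id)
```

With these values the four groups hold 3, 7, 5 and 5 columns, so a vector of k breakpoints always defines k + 1 groups.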

Returns

sgPLSda returns an object of class "sPLSda", a list that contains the following components:

X: the centered and standardized original predictor matrix.
Y: the centered and standardized indicator response vector or matrix.
ind.mat: the indicator matrix.
ncomp: the number of components included in the model.
keepX: number of $X$ variables kept in the model on each component.
mat.c: matrix of coefficients to be used internally by predict.
variates: list containing the variates.
loadings: list containing the estimated loadings for the X and Y variates.
names: list containing the names to be used for individuals and variables.
tol: the tolerance used in the iterative algorithm, used for subsequent S3 methods.
max.iter: the maximum number of iterations, used for subsequent S3 methods.
iter: number of iterations of the algorithm for each component.
ind.block.x: a vector of integers describing the grouping of the $X$-variables.
alpha.x: the mixing parameter related to the sparsity within group for the $X$ dataset.
upper.lambda: the upper bound of the interval of lambda values searched to find the tuning parameter (lambda) corresponding to a non-zero group of variables.

References

Liquet, B., Lafaye de Micheaux, P., Hejblum, B. and Thiebaut, R. (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.

On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.

Author(s)

Benoit Liquet and Pierre Lafaye de Micheaux.

Examples


library(sgPLS)
data(simuData)
X <- simuData$X
Y <- simuData$Y
ind.block.x <- seq(100, 900, 100)
ind.block.x[2] <- 250
# To add some noise in the second group
model <- sgPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x,
                 keepX = c(2, 2, 2), alpha.x = c(0.5, 0.5, 0.99))
result.sgPLSda <- select.sgpls(model)
result.sgPLSda$group.size.X
##perf(model,criterion="all",validation="loo") -> res
##res$error.rate

sgPLS package

Maintainer: Benoit Liquet
License: GPL (>= 2.0)
Last published: 2023-10-05
