sgPLSda function

Sparse Group Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)

Function to perform sparse group Partial Least Squares to classify samples (supervised analysis) and select variables.

sgPLSda(X, Y, ncomp = 2, keepX = rep(ncol(X), ncomp), max.iter = 500, tol = 1e-06, ind.block.x, alpha.x, upper.lambda = 10 ^ 5)

Arguments

  • X: numeric matrix of predictors. NAs are allowed.
  • Y: a factor or a class vector for the discrete outcome.
  • ncomp: the number of components to include in the model (see Details).
  • keepX: numeric vector of length ncomp, the number of variables to keep in X-loadings. By default all variables are kept in the model.
  • max.iter: integer, the maximum number of iterations.
  • tol: a positive real, the tolerance used in the iterative algorithm.
  • ind.block.x: a vector of integers describing the grouping of the X-variables (see the example in the Details section).
  • alpha.x: the mixing parameter (value between 0 and 1) related to the sparsity within group for the X dataset.
  • upper.lambda: a large value specifying the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables. By default upper.lambda = 10 ^ 5.
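
To make the role of alpha.x and lambda more concrete, the block below writes out the standard sparse group lasso penalty applied to an X-loading vector. This is a sketch under assumptions: the symbols u (loading vector), u^(k) (its sub-vector for group k), p_k (size of group k) and K (number of groups) are introduced here for illustration, and the exact parameterization used inside the package may differ.

```latex
P(u) \;=\; (1-\alpha)\,\lambda \sum_{k=1}^{K} \sqrt{p_k}\,\bigl\lVert u^{(k)} \bigr\rVert_2
\;+\; \alpha\,\lambda\,\lVert u \rVert_1 ,
\qquad 0 \le \alpha \le 1 .
```

Under this form, alpha close to 0 gives a penalty that mostly selects whole groups, while alpha close to 1 puts most of the weight on sparsity of individual variables.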

Details

The sgPLSda function fits sgPLS models with 1, ..., ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created.
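
As an illustration of what an indicator (dummy) matrix looks like, here is a minimal base-R sketch. The class vector is made-up data, and the construction shown is only one possible way to build such a matrix; it is not claimed to be the code the package uses internally.

```r
# Illustrative class vector (made-up data, not from the package).
Y <- factor(c("a", "b", "a", "c", "b"))

# One column per class level; an entry is 1 when the sample belongs to that class.
ind.mat <- sapply(levels(Y), function(lev) as.numeric(Y == lev))
rownames(ind.mat) <- seq_along(Y)

ind.mat
rowSums(ind.mat)  # every sample belongs to exactly one class
```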

ind.block.x <- c(3, 10, 15) means that X is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to Xp, where p is the number of variables in the X matrix.
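
The grouping rule can be checked with a small base-R sketch. The helper block_membership is hypothetical (it is not part of sgPLS), and p = 20 is an arbitrary number of columns chosen for illustration.

```r
# Hypothetical helper (not part of sgPLS): one group label per column of X.
block_membership <- function(ind.block.x, p) {
  # Cut points must be increasing and leave at least one column for the last group.
  stopifnot(all(diff(ind.block.x) > 0), max(ind.block.x) < p)
  # Each cut point is the last column of a group; the final group runs to column p.
  ends <- c(ind.block.x, p)
  sizes <- diff(c(0, ends))
  rep(seq_along(ends), times = sizes)
}

groups <- block_membership(c(3, 10, 15), p = 20)
table(groups)                # group sizes: 3, 7, 5, 5
split(seq_len(20), groups)   # column indices in each group
```

With three cut points the result always has four groups, because the columns after the last cut point form a group of their own.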

Returns

sgPLSda returns an object of class "sPLSda", a list that contains the following components:

  • X: the centered and standardized original predictor matrix.

  • Y: the centered and standardized indicator response vector or matrix.

  • ind.mat: the indicator matrix.

  • ncomp: the number of components included in the model.

  • keepX: number of X variables kept in the model on each component.

  • mat.c: matrix of coefficients to be used internally by predict.

  • variates: list containing the variates.

  • loadings: list containing the estimated loadings for the X and Y variates.

  • names: list containing the names to be used for individuals and variables.

  • tol: the tolerance used in the iterative algorithm, used for subsequent S3 methods.

  • max.iter: the maximum number of iterations, used for subsequent S3 methods.

  • iter: number of iterations of the algorithm for each component.

  • ind.block.x: a vector of integers describing the grouping of the X variables.

  • alpha.x: the mixing parameter related to the sparsity within group for the X dataset.

  • upper.lambda: the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.

References

Liquet, B., Lafaye de Micheaux, P., Hejblum, B. and Thiebaut, R. (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.

On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.

Author(s)

Benoit Liquet and Pierre Lafaye de Micheaux.

See Also

sPLS, summary, plotIndiv, plotVar, cim, network, predict, perf and http://www.mixOmics.org for more details.

Examples

data(simuData)
X <- simuData$X
Y <- simuData$Y
ind.block.x <- seq(100, 900, 100)
ind.block.x[2] <- 250 # To add some noise in the second group
model <- sgPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x, keepX = c(2, 2, 2), alpha.x = c(0.5, 0.5, 0.99))
result.sgPLSda <- select.sgpls(model)
result.sgPLSda$group.size.X
## perf(model, criterion = "all", validation = "loo") -> res
## res$error.rate

  • Maintainer: Benoit Liquet
  • License: GPL (>= 2.0)
  • Last published: 2023-10-05
