pls function

Partial Least Squares regression

pls is used to calibrate, validate and use a partial least squares (PLS) regression model.

pls(
  x,
  y,
  ncomp = min(nrow(x) - 1, ncol(x), 20),
  center = TRUE,
  scale = FALSE,
  cv = NULL,
  exclcols = NULL,
  exclrows = NULL,
  x.test = NULL,
  y.test = NULL,
  method = "simpls",
  info = "",
  ncomp.selcrit = "min",
  lim.type = "ddmoments",
  alpha = 0.05,
  gamma = 0.01,
  cv.scope = "local"
)

Arguments

  • x: matrix with predictors.
  • y: matrix with responses.
  • ncomp: maximum number of components to calculate.
  • center: logical, whether to center the predictor and response values.
  • scale: logical, whether to scale (standardize) the predictor and response values.
  • cv: cross-validation settings (see details).
  • exclcols: columns of x to be excluded from calculations (numbers, names or vector with logical values).
  • exclrows: rows to be excluded from calculations (numbers, names or vector with logical values).
  • x.test: matrix with predictors for test set.
  • y.test: matrix with responses for test set.
  • method: algorithm for computing PLS model (only 'simpls' is supported so far).
  • info: short text with information about the model.
  • ncomp.selcrit: criterion for selecting the optimal number of components ('min' for first local minimum of RMSECV, 'wold' for Wold's rule).
  • lim.type: method to use for calculation of critical limits for residual distances (see details).
  • alpha: significance level for extreme limits for T2 and Q distances.
  • gamma: significance level for outlier limits for T2 and Q distances.
  • cv.scope: scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and standard deviation, 'local' recomputes them for each local calibration set.

Returns

Returns an object of pls class with the following fields:

  • ncomp: number of components included in the model.

  • ncomp.selected: selected (optimal) number of components.

  • xcenter: vector with values used to center the predictors (x).

  • ycenter: vector with values used to center the responses (y).

  • xscale: vector with values used to scale the predictors (x).

  • yscale: vector with values used to scale the responses (y).

  • xloadings: matrix with loading values for x decomposition.

  • yloadings: matrix with loading values for y decomposition.

  • xeigenvals: vector with eigenvalues of components (variance of x-scores).

  • yeigenvals: vector with eigenvalues of components (variance of y-scores).

  • weights: matrix with PLS weights.

  • coeffs: object of class regcoeffs with regression coefficients calculated for each component.

  • info: information about the model, provided by the user when building the model.

  • cv: information about the cross-validation method used (if any).

  • res: a list with result objects (e.g. calibration, cv, etc.)

Details

So far only the SIMPLS method [1] is available. The implementation works with both single and multiple response variables.

Like in pca, pls limits the number of components (ncomp) to the minimum of the number of objects minus 1, the number of x variables, and the default or provided value. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found using the first local minimum of RMSECV, but can also be forced to a user-defined value using the function selectCompNum.pls. The selected optimal number of components is used for all default operations: predictions, plots, etc.
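For instance, the automatically detected optimal number of components can be overridden after the model has been calibrated (a minimal sketch, using the simdata set shipped with mdatools):

```r
library(mdatools)

# calibrate a model with up to 8 components and full cross-validation
data(simdata)
model = pls(simdata$spectra.c, simdata$conc.c[, 1], ncomp = 8, cv = 1)

# force the optimal number of components to 3 instead of the
# automatically detected first local minimum of RMSECV
model = selectCompNum(model, 3)
summary(model)
```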

Cross-validation settings, cv, can be a number or a list. If cv is a number, it is used as the number of segments for random cross-validation (if cv = 1, full cross-validation is performed). If it is a list, the following syntax can be used: cv = list("rand", nseg, nrep) for repeated random cross-validation with nseg segments and nrep repetitions, or cv = list("ven", nseg) for systematic splits into nseg segments ('venetian blinds').
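The cross-validation variants described above can be sketched as follows (using the simdata set shipped with mdatools):

```r
library(mdatools)
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

# full (leave-one-out) cross-validation
m1 = pls(x, y, ncomp = 6, cv = 1)

# random cross-validation with 4 segments
m2 = pls(x, y, ncomp = 6, cv = 4)

# repeated random cross-validation: 4 segments, 10 repetitions
m3 = pls(x, y, ncomp = 6, cv = list("rand", 4, 10))

# systematic ('venetian blinds') splits into 8 segments
m4 = pls(x, y, ncomp = 6, cv = list("ven", 8))
```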

Confidence intervals and p-values for regression coefficients can be computed based on Jack-Knifing resampling. This is done automatically if cross-validation is used. However, it is recommended to use at least 10 segments for stable Jack-Knifing results. See the help for regcoeffs objects for more details.
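A sketch of inspecting the Jack-Knifing based statistics via the model's coeffs field (the summary and plot behavior, including the show.ci option, is described in the regcoeffs help and should be checked there):

```r
library(mdatools)
data(simdata)

# 10 segment cross-validation triggers Jack-Knifing for the coefficients
model = pls(simdata$spectra.c, simdata$conc.c[, 1], ncomp = 6, cv = 10)

# summary for the coefficients includes confidence intervals and p-values
summary(model$coeffs, ncomp = model$ncomp.selected)

# coefficients plot with confidence intervals
plot(model$coeffs, ncomp = model$ncomp.selected, show.ci = TRUE)
```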

Examples

### Examples of using PLS model class
library(mdatools)

## 1. Make a PLS model for concentration of first component
## using full cross-validation and automatic detection of
## optimal number of components and show an overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component
## using test set and 10 segment cross-validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component
## using only test set validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.label = TRUE)
plotYResiduals(model, show.label = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)

xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)

summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)

par(mfrow = c(2, 2))
plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)
plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)
par(mfrow = c(1, 1))

References

  1. S. de Jong. Chemometrics and Intelligent Laboratory Systems, 18 (1993) 251-263.
  2. Tarja Rajalahti et al. Chemometrics and Intelligent Laboratory Systems, 95 (2009) 35-48.
  3. Il-Gyo Chong, Chi-Hyuck Jun. Chemometrics and Intelligent Laboratory Systems, 78 (2005) 103-112.

See Also

Main methods for pls objects:

  • print: prints information about a pls object.
  • summary.pls: shows performance statistics for the model.
  • plot.pls: shows plot overview of the model.
  • pls.simpls: implementation of the SIMPLS algorithm.
  • predict.pls: applies a PLS model to new data.
  • selectCompNum.pls: sets the number of optimal components in the model.
  • setDistanceLimits.pls: allows to change parameters for critical limits.
  • categorize.pls: categorizes data rows similar to categorize.pca.
  • selratio: computes matrix with selectivity ratio values.
  • vipscores: computes matrix with VIP scores values.

Plotting methods for pls objects:

  • plotXScores.pls: shows scores plot for x decomposition.
  • plotXYScores.pls: shows scores plot for x and y decomposition.
  • plotXLoadings.pls: shows loadings plot for x decomposition.
  • plotXYLoadings.pls: shows loadings plot for x and y decomposition.
  • plotXVariance.pls: shows explained variance plot for x decomposition.
  • plotYVariance.pls: shows explained variance plot for y decomposition.
  • plotXCumVariance.pls: shows cumulative explained variance plot for x decomposition.
  • plotYCumVariance.pls: shows cumulative explained variance plot for y decomposition.
  • plotXResiduals.pls: shows distance/residuals plot for x decomposition.
  • plotXYResiduals.pls: shows joint distance plot for x and y decomposition.
  • plotWeights.pls: shows plot with weights.
  • plotSelectivityRatio.pls: shows plot with selectivity ratio values.
  • plotVIPScores.pls: shows plot with VIP scores values.

Methods inherited from regmodel object (parent class for pls):

  • plotPredictions.regmodel: shows predicted vs. measured plot.
  • plotRMSE.regmodel: shows RMSE plot.
  • plotRMSERatio.regmodel: shows plot for ratio of RMSECV/RMSEC values.
  • plotYResiduals.regmodel: shows residuals plot for y values.
  • getRegcoeffs.regmodel: returns matrix with regression coefficients.

Most of the methods for plotting data (except loadings and regression coefficients) are also available for PLS result (plsres) objects. There is also a randomization test for PLS regression (randtest) and an implementation of the interval PLS algorithm for variable selection (ipls).

Author(s)

Sergey Kucheryavskiy (svkucheryavski@gmail.com)

  • Maintainer: Sergey Kucheryavskiy
  • License: MIT + file LICENSE
  • Last published: 2024-08-19