pls function

Partial Least Squares regression

pls is used to calibrate, validate and use a partial least squares (PLS) regression model.

pls(
  x,
  y,
  ncomp = min(nrow(x) - 1, ncol(x), 20),
  center = TRUE,
  scale = FALSE,
  cv = NULL,
  exclcols = NULL,
  exclrows = NULL,
  x.test = NULL,
  y.test = NULL,
  method = "simpls",
  info = "",
  ncomp.selcrit = "min",
  lim.type = "ddmoments",
  alpha = 0.05,
  gamma = 0.01,
  cv.scope = "local"
)

Arguments

  • x: matrix with predictors.
  • y: matrix with responses.
  • ncomp: maximum number of components to calculate.
  • center: logical, whether to center the predictor and response values.
  • scale: logical, whether to scale (standardize) the predictor and response values.
  • cv: cross-validation settings (see details).
  • exclcols: columns of x to be excluded from calculations (numbers, names or vector with logical values).
  • exclrows: rows to be excluded from calculations (numbers, names or vector with logical values).
  • x.test: matrix with predictors for test set.
  • y.test: matrix with responses for test set.
  • method: algorithm for computing PLS model (only 'simpls' is supported so far).
  • info: short text with information about the model.
  • ncomp.selcrit: criterion for selecting the optimal number of components ('min' for first local minimum of RMSECV, 'wold' for Wold's rule).
  • lim.type: method to use for calculation of critical limits for residual distances (see details).
  • alpha: significance level for extreme limits for T2 and Q distances.
  • gamma: significance level for outlier limits for T2 and Q distances.
  • cv.scope: scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and standard deviation, 'local' recomputes them for each local calibration set.

Returns

Returns an object of pls class with the following fields:

  • ncomp: number of components included in the model.

  • ncomp.selected: selected (optimal) number of components.

  • xcenter: vector with values used to center the predictors (x).

  • ycenter: vector with values used to center the responses (y).

  • xscale: vector with values used to scale the predictors (x).

  • yscale: vector with values used to scale the responses (y).

  • xloadings: matrix with loading values for x decomposition.

  • yloadings: matrix with loading values for y decomposition.

  • xeigenvals: vector with eigenvalues of components (variance of x-scores).

  • yeigenvals: vector with eigenvalues of components (variance of y-scores).

  • weights: matrix with PLS weights.

  • coeffs: object of class regcoeffs with regression coefficients calculated for each component.

  • info: information about the model, provided by the user when building the model.

  • cv: information about the cross-validation method used (if any).

  • res: a list with result objects (e.g. calibration, cv, etc.)

Details

So far only the SIMPLS method [1] is available. The implementation works with both single and multiple response variables.

Like in pca, pls limits the number of components (ncomp) to the minimum of the number of objects minus 1, the number of x variables, and the default or provided value. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found using the first local minimum of RMSECV, but can also be forced to a user-defined value using the function selectCompNum.pls. The selected optimal number of components is used for all default operations: predictions, plots, etc.
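For instance, the automatically detected optimal number of components can be overridden after the model has been calibrated (a minimal sketch, using the simdata set shipped with mdatools):

```r
library(mdatools)

# calibrate a model with up to 8 components and full cross-validation
data(simdata)
model = pls(simdata$spectra.c, simdata$conc.c[, 1], ncomp = 8, cv = 1)

# force the optimal number of components to 3 instead of the
# automatically detected first local minimum of RMSECV
model = selectCompNum(model, 3)
summary(model)
```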

Cross-validation settings, cv, can be a number or a list. If cv is a number, it is used as the number of segments for random cross-validation (if cv = 1, full cross-validation is performed). If it is a list, the following syntax can be used: cv = list("rand", nseg, nrep) for repeated random cross-validation with nseg segments and nrep repetitions, or cv = list("ven", nseg) for systematic splits into nseg segments ('venetian blinds').
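The cross-validation variants described above can be sketched as follows (using the simdata set shipped with mdatools):

```r
library(mdatools)
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

# full (leave-one-out) cross-validation
m1 = pls(x, y, ncomp = 6, cv = 1)

# random cross-validation with 4 segments
m2 = pls(x, y, ncomp = 6, cv = 4)

# repeated random cross-validation: 4 segments, 10 repetitions
m3 = pls(x, y, ncomp = 6, cv = list("rand", 4, 10))

# systematic ('venetian blinds') splits into 8 segments
m4 = pls(x, y, ncomp = 6, cv = list("ven", 8))
```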

Confidence intervals and p-values for regression coefficients can be computed based on Jack-Knifing resampling. This is done automatically if cross-validation is used. However, it is recommended to use at least 10 segments for stable Jack-Knifing results. See the help for regcoeffs objects for more details.
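A sketch of inspecting the Jack-Knifing based statistics via the model's coeffs field (the summary and plot behavior, including the show.ci option, is described in the regcoeffs help and should be checked there):

```r
library(mdatools)
data(simdata)

# 10 segment cross-validation triggers Jack-Knifing for the coefficients
model = pls(simdata$spectra.c, simdata$conc.c[, 1], ncomp = 6, cv = 10)

# summary for the coefficients includes confidence intervals and p-values
summary(model$coeffs, ncomp = model$ncomp.selected)

# coefficients plot with confidence intervals
plot(model$coeffs, ncomp = model$ncomp.selected, show.ci = TRUE)
```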

Examples

### Examples of using PLS model class
library(mdatools)

## 1. Make a PLS model for concentration of first component
## using full cross-validation and automatic detection of
## optimal number of components and show an overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component
## using test set and 10 segment cross-validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component
## using only test set validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.label = TRUE)
plotYResiduals(model, show.label = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)

xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)

summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)

par(mfrow = c(2, 2))
plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)
plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)
par(mfrow = c(1, 1))

References

  1. S. de Jong. Chemometrics and Intelligent Laboratory Systems, 18 (1993) 251-263.
  2. Tarja Rajalahti et al. Chemometrics and Intelligent Laboratory Systems, 95 (2009) 35-48.
  3. Il-Gyo Chong, Chi-Hyuck Jun. Chemometrics and Intelligent Laboratory Systems, 78 (2005) 103-112.

See Also

Main methods for pls objects:

  • print: prints information about a pls object.
  • summary.pls: shows performance statistics for the model.
  • plot.pls: shows plot overview of the model.
  • pls.simpls: implementation of the SIMPLS algorithm.
  • predict.pls: applies a PLS model to new data.
  • selectCompNum.pls: sets the number of optimal components in the model.
  • setDistanceLimits.pls: allows to change parameters for critical limits.
  • categorize.pls: categorizes data rows similar to categorize.pca.
  • selratio: computes matrix with selectivity ratio values.
  • vipscores: computes matrix with VIP scores values.

Plotting methods for pls objects:

  • plotXScores.pls: shows scores plot for x decomposition.
  • plotXYScores.pls: shows scores plot for x and y decomposition.
  • plotXLoadings.pls: shows loadings plot for x decomposition.
  • plotXYLoadings.pls: shows loadings plot for x and y decomposition.
  • plotXVariance.pls: shows explained variance plot for x decomposition.
  • plotYVariance.pls: shows explained variance plot for y decomposition.
  • plotXCumVariance.pls: shows cumulative explained variance plot for x decomposition.
  • plotYCumVariance.pls: shows cumulative explained variance plot for y decomposition.
  • plotXResiduals.pls: shows distance/residuals plot for x decomposition.
  • plotXYResiduals.pls: shows joint distance plot for x and y decomposition.
  • plotWeights.pls: shows plot with weights.
  • plotSelectivityRatio.pls: shows plot with selectivity ratio values.
  • plotVIPScores.pls: shows plot with VIP scores values.

Methods inherited from regmodel object (parent class for pls):

  • plotPredictions.regmodel: shows predicted vs. measured plot.
  • plotRMSE.regmodel: shows RMSE plot.
  • plotRMSERatio.regmodel: shows plot for ratio of RMSECV/RMSEC values.
  • plotYResiduals.regmodel: shows residuals plot for y values.
  • getRegcoeffs.regmodel: returns matrix with regression coefficients.

Most of the methods for plotting data (except loadings and regression coefficients) are also available for PLS result (plsres) objects. There is also a randomization test for PLS regression (randtest) and an implementation of the interval PLS algorithm for variable selection (ipls).

Author(s)

Sergey Kucheryavskiy (svkucheryavski@gmail.com)

  • Maintainer: Sergey Kucheryavskiy
  • License: MIT + file LICENSE
  • Last published: 2024-08-19