center: logical, whether to center predictors and response values.
scale: logical, whether to scale (standardize) predictors and response values.
cv: cross-validation settings (see details).
exclcols: columns of x to be excluded from calculations (numbers, names or vector with logical values)
exclrows: rows to be excluded from calculations (numbers, names or vector with logical values)
x.test: matrix with predictors for test set.
y.test: matrix with responses for test set.
method: algorithm for computing PLS model (only 'simpls' is supported so far)
info: short text with information about the model.
ncomp.selcrit: criterion for selecting the optimal number of components ('min' for the first local minimum of RMSECV, 'wold' for Wold's rule).
lim.type: which method to use for calculation of critical limits for residual distances (see details)
alpha: significance level for extreme limits for T2 and Q distances.
gamma: significance level for outlier limits for T2 and Q distances.
cv.scope: scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and standard deviation, while 'local' recomputes them for each local calibration set.
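To make the difference between the two cv.scope options concrete, here is a minimal base-R sketch. The helper prep_params() is hypothetical and is not how mdatools implements its CV loop; it only shows, for one CV split, which rows the centering and scaling parameters would be computed from: all rows ('global') or only the local calibration rows ('local').

```r
# Hypothetical helper (not part of mdatools): centering/scaling parameters for one CV split.
# x: numeric matrix; train_idx: rows of the local calibration set.
prep_params <- function(x, train_idx, scope = c("global", "local"), scale = TRUE) {
  scope <- match.arg(scope)
  # 'global' uses all rows, 'local' uses only the local calibration rows
  ref <- if (scope == "global") x else x[train_idx, , drop = FALSE]
  m <- colMeans(ref)
  s <- if (scale) apply(ref, 2, sd) else rep(1, ncol(ref))
  list(center = m, scale = s)
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
train <- 1:7
p_global <- prep_params(x, train, "global")
p_local <- prep_params(x, train, "local")
# Preprocess the held-out rows with the local parameters
x_val <- scale(x[-train, , drop = FALSE], center = p_local$center, scale = p_local$scale)
```

With 'local', the held-out rows never influence their own preprocessing, which mimics how new data would be treated; 'global' is cheaper but lets information from the validation rows leak into the mean and standard deviation.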
Returns
Returns an object of pls class with the following fields:
ncomp: number of components included in the model.
ncomp.selected: selected (optimal) number of components.
xcenter: vector with values used to center the predictors (x).
ycenter: vector with values used to center the responses (y).
xscale: vector with values used to scale the predictors (x).
yscale: vector with values used to scale the responses (y).
xloadings: matrix with loading values for x decomposition.
yloadings: matrix with loading values for y decomposition.
xeigenvals: vector with eigenvalues of components (variance of x-scores).
yeigenvals: vector with eigenvalues of components (variance of y-scores).
weights: matrix with PLS weights.
coeffs: object of class regcoeffs with regression coefficients calculated for each component.
info: information about the model, provided by the user when building the model.
cv: information about the cross-validation method used (if any).
res: a list with result objects (e.g. calibration, cv, etc.)
Details
So far only the SIMPLS method [1] is available. The implementation works with both single and multiple response variables.
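A compact base-R sketch of the SIMPLS algorithm as described by de Jong [1] may help to see what the method does. It is an illustration under simplifying assumptions (data are only mean-centered, no scaling, no handling of rank deficiency) and is not the code of pls.simpls; the function simpls_sketch() and its return fields are hypothetical. Each weight vector is the dominant left singular vector of the cross-product matrix S = X'Y, and S is then deflated against the new loading direction so that the x-scores stay orthogonal. It works for one or several responses.

```r
# Sketch of SIMPLS (de Jong, 1993); hypothetical helper, not mdatools' pls.simpls.
# x: n x p predictors, y: n x m responses (vector allowed), ncomp: number of components.
simpls_sketch <- function(x, y, ncomp) {
  x <- scale(as.matrix(x), center = TRUE, scale = FALSE)  # mean-center only
  y <- scale(as.matrix(y), center = TRUE, scale = FALSE)
  p <- ncol(x)
  m <- ncol(y)
  R <- matrix(0, p, ncomp)  # weights: x-scores are x %*% R
  P <- matrix(0, p, ncomp)  # x-loadings
  Q <- matrix(0, m, ncomp)  # y-loadings
  V <- matrix(0, p, ncomp)  # orthonormal basis of the x-loadings
  B <- array(0, dim = c(p, m, ncomp))  # coefficients for 1, 1:2, ..., 1:ncomp components
  S <- crossprod(x, y)      # cross-product matrix x'y (p x m)
  for (a in seq_len(ncomp)) {
    r <- svd(S, nu = 1, nv = 0)$u[, 1]  # dominant left singular vector of S
    ta <- x %*% r
    nt <- sqrt(sum(ta^2))
    ta <- ta / nt                       # unit-norm x-scores
    r <- r / nt
    pa <- crossprod(x, ta)
    qa <- crossprod(y, ta)
    v <- pa
    if (a > 1) {                        # orthogonalize against previous loadings
      Vp <- V[, 1:(a - 1), drop = FALSE]
      v <- v - Vp %*% crossprod(Vp, pa)
    }
    v <- v / sqrt(sum(v^2))
    S <- S - v %*% crossprod(v, S)      # deflate S so the next scores are orthogonal
    R[, a] <- r
    P[, a] <- pa
    Q[, a] <- qa
    V[, a] <- v
    B[, , a] <- tcrossprod(R[, 1:a, drop = FALSE], Q[, 1:a, drop = FALSE])  # R Q'
  }
  list(weights = R, xloadings = P, yloadings = Q, coeffs = B)
}
```

A quick sanity check: when x has full column rank and ncomp equals the number of predictors, the coefficients coincide with ordinary least squares on the centered data.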
As in pca, pls limits the number of components (ncomp) to the minimum of the number of objects minus 1, the number of x variables, and the default or provided value. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found using the first local minimum (see ncomp.selcrit), but it can also be set to a user-defined value with selectCompNum.pls. The selected optimal number of components is used for all default operations: predictions, plots, etc.
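The two rules in the paragraph above (the upper limit for ncomp and the first local minimum) can be written in a few lines of R. This is a sketch: max_ncomp() and first_local_min() are hypothetical helpers, and treating a tie (no decrease in error) as the end of the descent is an assumption that may differ from the package's implementation.

```r
# Hypothetical helpers illustrating the rules above (not mdatools internals).
# Upper limit for the number of components.
max_ncomp <- function(nobj, nvar, ncomp) {
  min(nobj - 1, nvar, ncomp)
}

# Position of the first local minimum of an error curve (e.g. RMSECV for 1..ncomp components):
# the first component after which adding one more does not decrease the error.
first_local_min <- function(err) {
  k <- which(diff(err) >= 0)
  if (length(k) == 0) length(err) else k[1]
}

max_ncomp(nobj = 5, nvar = 100, ncomp = 8)       # 4
first_local_min(c(0.9, 0.5, 0.3, 0.35, 0.2))     # 3
```

Note that the first local minimum is not necessarily the global one: in the example the error drops again at five components, but the rule stops at three, which favors simpler models.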
Cross-validation settings, cv, can be a number or a list. If cv is a number, it is used as the number of segments for random cross-validation (if cv = 1, full, i.e. leave-one-out, cross-validation is performed). If it is a list, the following syntax can be used: cv = list("rand", nseg, nrep) for random repeated cross-validation with nseg segments and nrep repetitions, or cv = list("ven", nseg) for systematic splits into nseg segments ('venetian blinds').
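The difference between the splitting schemes can be illustrated with a small base-R sketch that assigns each object to a segment. The helper cv_segments() is hypothetical; it assumes objects are taken in their current row order for venetian blinds, whereas the package may order them differently before splitting.

```r
# Hypothetical helper (not the mdatools implementation): assigns each of n objects
# to a cross-validation segment.
cv_segments <- function(n, type = c("rand", "ven", "full"), nseg = 4) {
  type <- match.arg(type)
  switch(type,
    full = seq_len(n),                         # full CV: every object is its own segment
    ven  = rep_len(seq_len(nseg), n),          # venetian blinds: 1, 2, ..., nseg, 1, 2, ...
    rand = sample(rep_len(seq_len(nseg), n))   # random, with (nearly) equal segment sizes
  )
}

# Random repeated CV: nrep independent random splits, one column per repetition
nrep <- 3
set.seed(10)
splits <- sapply(seq_len(nrep), function(r) cv_segments(12, "rand", nseg = 4))
cv_segments(7, "ven", nseg = 3)   # 1 2 3 1 2 3 1
```

Repeating random splits averages out the luck of a single partition, while venetian blinds give a reproducible split without any random number generator.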
Confidence intervals and p-values for regression coefficients can be calculated using Jack-Knifing resampling. This is done automatically if cross-validation is used. However, it is recommended to use at least 10 segments to get stable JK results. See help for regcoeffs objects for more details.
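The idea behind the Jack-Knife statistics can be sketched in base R: the spread of a coefficient across the local CV models gives a standard error, from which a t-based confidence interval and p-value follow. The helper jk_stats() is hypothetical, ordinary least squares stands in for the PLS model to keep the example self-contained, and centering the deviations on the mean of the local estimates is an assumption (some implementations center on the full-model estimate instead), so results may differ from regcoeffs.

```r
# Sketch of jack-knife statistics for a single regression coefficient (hypothetical helper).
# b_full: coefficient of the global model; b_seg: coefficients of the K local CV models.
jk_stats <- function(b_full, b_seg, alpha = 0.05) {
  b_full <- unname(b_full)
  b_seg <- unname(b_seg)
  K <- length(b_seg)
  se <- sqrt((K - 1) / K * sum((b_seg - mean(b_seg))^2))   # jack-knife standard error
  tval <- b_full / se
  list(
    se = se,
    ci = b_full + c(-1, 1) * qt(1 - alpha / 2, df = K - 1) * se,
    p.value = 2 * pt(-abs(tval), df = K - 1)
  )
}

# Usage with ordinary least squares standing in for the PLS model
set.seed(7)
x <- rnorm(50)
y <- 2 * x + rnorm(50, sd = 0.5)
seg <- rep_len(1:10, 50)                                  # 10 systematic segments
b_seg <- sapply(1:10, function(k) coef(lm(y[seg != k] ~ x[seg != k]))[2])
b_full <- coef(lm(y ~ x))[2]
jk <- jk_stats(b_full, b_seg)
```

The degrees of freedom equal K - 1, which is why few segments give wide, unstable intervals and why at least 10 segments are recommended.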
Examples
### Examples of using PLS model class
library(mdatools)

## 1. Make a PLS model for concentration of first component
## using full cross-validation and automatic detection of
## optimal number of components and show an overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component
## using test set and 10 segment cross-validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]
model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component
## using only test set validation and show overview
data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]
model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.labels = TRUE)
plotYResiduals(model, show.labels = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
plotVIPScores(model)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)
xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)
summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)
par(mfrow = c(2, 2))
plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)
plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)
par(mfrow = c(1, 1))
References
1. S. de Jong. Chemometrics and Intelligent Laboratory Systems 18 (1993) 251-263.
2. Tarja Rajalahti et al. Chemometrics and Intelligent Laboratory Systems 95 (2009) 35-48.
3. Il-Gyo Chong, Chi-Hyuck Jun. Chemometrics and Intelligent Laboratory Systems 78 (2005) 103-112.
See Also
Main methods for pls objects:
print
prints information about a pls object.
summary.pls
shows performance statistics for the model.
plot.pls
shows plot overview of the model.
pls.simpls
implementation of SIMPLS algorithm.
predict.pls
applies a PLS model to new data.
selectCompNum.pls
sets the optimal number of components in the model.
setDistanceLimits.pls
allows changing parameters for critical limits.
categorize.pls
categorizes data rows similarly to categorize.pca.
selratio
computes matrix with selectivity ratio values.
vipscores
computes matrix with VIP scores values.
Plotting methods for pls objects:
plotXScores.pls
shows scores plot for x decomposition.
plotXYScores.pls
shows scores plot for x and y decomposition.
plotXLoadings.pls
shows loadings plot for x decomposition.
plotXYLoadings.pls
shows loadings plot for x and y decomposition.
plotXVariance.pls
shows explained variance plot for x decomposition.
plotYVariance.pls
shows explained variance plot for y decomposition.
plotXCumVariance.pls
shows cumulative explained variance plot for x decomposition.
plotYCumVariance.pls
shows cumulative explained variance plot for y decomposition.
plotXResiduals.pls
shows distance/residuals plot for x decomposition.
plotXYResiduals.pls
shows joint distance plot for x and y decomposition.
plotWeights.pls
shows plot with weights.
plotSelectivityRatio.pls
shows plot with selectivity ratio values.
plotVIPScores.pls
shows plot with VIP scores values.
Methods inherited from regmodel object (parent class for pls):
plotPredictions.regmodel
shows predicted vs. measured plot.
plotRMSE.regmodel
shows RMSE plot.
plotRMSERatio.regmodel
shows plot for ratio RMSECV/RMSEC values.
plotYResiduals.regmodel
shows residuals plot for y values.
getRegcoeffs.regmodel
returns matrix with regression coefficients.
Most of the methods for plotting data (except loadings and regression coefficients) are also
available for PLS results (plsres) objects. There is also a randomization test
for PLS regression (randtest) and an implementation of the interval PLS algorithm
for variable selection (ipls).
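The VIP scores listed above (vipscores, plotVIPScores.pls) are commonly computed with a standard formula (see also [3]): each variable's squared, normalized weights are averaged over components, weighted by the y sum of squares each component explains. The sketch below is for a single response; vip_sketch() is hypothetical and assumes you already have the weights W, x-scores Tm and y-loadings q of a fitted model, so it may not match the package numerically in every detail.

```r
# Hypothetical helper: VIP scores for a single response.
# W: p x A weights, Tm: n x A x-scores, q: length-A y-loadings.
vip_sketch <- function(W, Tm, q) {
  ss <- q^2 * colSums(Tm^2)                      # y sum of squares explained per component
  Wn <- sweep(W, 2, sqrt(colSums(W^2)), "/")     # normalize each weight vector to unit length
  vip <- sqrt(nrow(W) * (Wn^2 %*% ss) / sum(ss))
  drop(vip)                                      # one VIP value per x variable
}
```

Because each normalized weight vector has unit length, the squared VIP values always sum to the number of variables, so their average is 1; this is why VIP > 1 is often used as a rough rule of thumb for important variables.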