glob.ncomp: maximum number of components for a global PLS model.
center: logical, whether to center the data values.
scale: logical, whether to standardize the data values.
cv: cross-validation settings (see details).
exclcols: columns of x to be excluded from calculations (numbers, names or vector with logical values).
exclrows: rows to be excluded from calculations (numbers, names or vector with logical values).
int.ncomp: maximum number of components for interval PLS models.
int.num: number of intervals.
int.width: width of intervals.
int.limits: a two-column matrix with manual specification of the intervals.
int.niter: maximum number of iterations (if NULL, the smaller of two values is used: the number of intervals and 30).
ncomp.selcrit: criterion for selecting optimal number of components ('min' for minimum of RMSECV).
method: iPLS method ('forward' or 'backward').
x.test: matrix with predictors for the test set (NULL by default; if specified, it is used instead of cv).
y.test: matrix with responses for test set.
silent: logical, whether to show information about the selection process.
full: logical, if TRUE the procedure will continue even if no improvement is observed.
cv.scope: scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and standard deviation, 'local' recomputes them for each local calibration set.
Returns
An object of class 'ipls' with several fields, including:
var.selected: a vector with indices of selected variables
int.selected: a vector with indices of selected intervals
int.num: total number of intervals
int.width: width of the intervals
int.limits: a matrix with limits for each interval
int.stat: a data frame with statistics for the selection algorithm
glob.stat: a data frame with statistics for the first step (individual intervals)
gm: global PLS model with all variables included
om: optimized PLS model with selected variables
Details
The algorithm splits the predictors into several intervals and tries to find the combination of intervals that gives the best prediction performance. There are two selection methods: "forward", where intervals are successively included, and "backward", where intervals are successively excluded from a model. In the first step the algorithm finds the best (forward) or the worst (backward) individual interval. Then it tests the others to find the one that gives the best model in combination with the already selected/excluded one. The procedure continues until no improvement is observed or the maximum number of iterations is reached.
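The forward variant of this procedure can be sketched in base R as follows. This is an illustration only, not the mdatools implementation: ordinary least squares stands in for PLS so that no package is needed, and the names `cv_rmse` and `forward_select`, as well as the simulated data, are hypothetical.

```r
# Cross-validated RMSE of a least-squares model (a stand-in for PLS).
# `seg` holds the segment number of every object.
cv_rmse <- function(X, y, seg) {
  pred <- numeric(length(y))
  for (s in unique(seg)) {
    test <- seg == s
    fit <- lm.fit(cbind(1, X[!test, , drop = FALSE]), y[!test])
    b <- fit$coefficients
    b[is.na(b)] <- 0  # guard against rank-deficient fits
    pred[test] <- cbind(1, X[test, , drop = FALSE]) %*% b
  }
  sqrt(mean((y - pred)^2))
}

# Greedy forward selection: at each iteration add the interval that gives
# the lowest RMSECV together with the intervals selected so far.
forward_select <- function(X, y, limits, seg, niter = nrow(limits)) {
  selected <- integer(0)
  best <- Inf
  for (iter in seq_len(niter)) {
    candidates <- setdiff(seq_len(nrow(limits)), selected)
    if (length(candidates) == 0) break
    rmse <- sapply(candidates, function(i) {
      ints <- c(selected, i)
      cols <- unlist(lapply(ints, function(j) limits[j, 1]:limits[j, 2]))
      cv_rmse(X[, cols, drop = FALSE], y, seg)
    })
    if (min(rmse) >= best) break  # no improvement observed: stop
    best <- min(rmse)
    selected <- c(selected, candidates[which.min(rmse)])
  }
  list(int.selected = selected, rmsecv = best)
}

# Simulated data: only variables 5 and 6 (the third interval) carry signal.
set.seed(42)
X <- matrix(rnorm(60 * 12), 60, 12)
y <- X[, 5] + 0.5 * X[, 6] + rnorm(60, sd = 0.1)
limits <- cbind(seq(1, 11, by = 2), seq(2, 12, by = 2))  # six intervals of width 2
seg <- rep_len(1:4, 60)
res <- forward_select(X, y, limits, seg)
```

The backward variant would follow the same pattern, starting from all intervals and removing, at each iteration, the one whose exclusion gives the lowest RMSECV.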
There are several ways to specify the intervals. Either the number of intervals (int.num) or the width of the intervals (int.width) can be provided. Alternatively, one can specify the limits (first and last variable number) of the intervals manually with int.limits.
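To make the shape of such a limits matrix concrete, the sketch below builds one from a number of intervals and then writes one by hand. The helper `make_limits` is hypothetical and only illustrates the idea; the way mdatools itself places interval boundaries may differ, and the variable count of 150 is an arbitrary choice.

```r
# Hypothetical helper (not part of mdatools): split nvar variables into
# int.num contiguous intervals of nearly equal width and return a
# two-column matrix with the first and last variable number per interval.
make_limits <- function(nvar, int.num) {
  edges <- round(seq(0, nvar, length.out = int.num + 1))
  cbind(first = edges[-length(edges)] + 1, last = edges[-1])
}

limits <- make_limits(150, 10)

# A manual specification has the same layout: one row per interval.
manual <- cbind(first = c(1, 51, 101), last = c(50, 100, 150))
```

A matrix laid out like `manual` is the kind of two-column input that the int.limits argument describes.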
Cross-validation settings, cv, can be a number or a list. If cv is a number, it will be used as the number of segments for random cross-validation (if cv = 1, full cross-validation will be performed). If it is a list, the following syntax can be used: cv = list('rand', nseg, nrep) for random repeated cross-validation with nseg segments and nrep repetitions, or cv = list('ven', nseg) for systematic splits into nseg segments ('venetian blinds').
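The difference between the two split types can be illustrated with a small base R sketch. The helpers `ven_segments` and `rand_segments` are hypothetical and are not the code used by mdatools; in particular, the package may order the objects differently before splitting.

```r
# Hypothetical helpers (not mdatools code). Each returns a vector giving
# the segment number of every object.

# Systematic split: objects are assigned to segments 1, 2, ..., nseg in turn.
ven_segments <- function(nobj, nseg) rep_len(seq_len(nseg), nobj)

# Random split: same segment sizes, but the assignment is shuffled.
rand_segments <- function(nobj, nseg) sample(rep_len(seq_len(nseg), nobj))

set.seed(1)
ven <- ven_segments(10, 4)
rnd <- rand_segments(10, 4)

# Repeated random cross-validation draws a new shuffle for every repetition.
nrep <- 3
rnd_rep <- replicate(nrep, rand_segments(10, 4))  # one column per repetition
```

In a call to ipls, the corresponding settings would be written as cv = list('ven', 4) or cv = list('rand', 4, 3), following the syntax described above.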
Examples
library(mdatools)

## forward selection for simdata
data(simdata)
Xc = simdata$spectra.c
yc = simdata$conc.c[, 3, drop = FALSE]

# run iPLS and show results
im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
summary(im)
plot(im)

# show "developing" of RMSECV during the algorithm execution
plotRMSE(im)

# plot predictions before and after selection
par(mfrow = c(1, 2))
plotPredictions(im$gm)
plotPredictions(im$om)

# show selected intervals on spectral plot
ind = im$var.selected
mspectrum = apply(Xc, 2, mean)
plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')
References
[1] Lars Noergaard et al. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spec. 2000; 54: 413-419