Partial Least Squares Regression Models with Big Matrices
Streamed centering statistics for RKHS kernels
bigPLSR-package
Fast IRLS for binomial logit with class weights
Internal kernel and wide-kernel PLS solver
Finalize pls objects
Internal: resolve training reference for RKHS predictions
Finalize a KF-PLS state into a fitted model
KF-PLS streaming state (constructor)
Update a KF-PLS streaming state with a mini-batch
PLS biplot
Boxplots of bootstrap coefficient distributions
Boxplots of bootstrap score distributions
Plot individual scores
Plot variable loadings
Plot Variable Importance in Projection (VIP)
Bootstrap a PLS model
Cross-validate PLS models
Select components from cross-validation results
Unified PLS fit with auto backend and selectable algorithm
Compute information criteria for component selection
Predict responses from a PLS fit
Predict latent scores from a PLS fit
Component selection via information criteria
Naive sparsity control by coefficient thresholding
Variable importance in projection (VIP) scores
Predict method for big_plsr objects
Print a summary.big_plsr object
Summarize bootstrap estimates
Summarize a big_plsr model
Fast partial least squares (PLS) for dense and out-of-core data. Provides SIMPLS (straightforward implementation of a statistically inspired modification of the PLS method) and NIPALS (non-linear iterative partial least squares) solvers, plus kernel-style PLS variants ('kernelpls' and 'widekernelpls') with parity to 'pls'. Optimized for 'bigmemory'-backed matrices with streamed cross-products and chunked BLAS (Basic Linear Algebra Subprograms) operations (XtX/XtY and XXt/YX), optional file-backed score sinks, and deterministic testing helpers. Includes an auto-selection strategy that chooses between XtX SIMPLS, XXt (wide) SIMPLS, and NIPALS based on (n, p) and a configurable memory budget. The package is described in Bertrand and Maumy (2023) <https://hal.science/hal-05352069> and <https://hal.science/hal-05352061>, which highlight fitting and cross-validating PLS regression models on big data. For more details about some of the techniques featured in the package, see Dayal and MacGregor (1997) <doi:10.1002/(SICI)1099-128X(199701)11:1%3C73::AID-CEM435%3E3.0.CO;2-%23>, Rosipal and Trejo (2001) <https://www.jmlr.org/papers/v2/rosipal01a.html>, Tenenhaus, Viennet, and Saporta (2007) <doi:10.1016/j.csda.2007.01.004>, Rosipal (2004) <doi:10.1007/978-3-540-45167-9_17>, Rosipal (2019) <https://ieeexplore.ieee.org/document/8616346>, and Song, Wang, and Bai (2024) <doi:10.1016/j.chemolab.2024.105238>. Includes kernel logistic PLS with 'C++'-accelerated alternating iteratively reweighted least squares (IRLS) updates, streamed reproducing kernel Hilbert space (RKHS) solvers with reusable centering statistics, and bootstrap diagnostics with graphical summaries for coefficients, scores, and cross-validation workflows, alongside dedicated plotting utilities for individuals, variables, ellipses, and biplots. The streaming backend uses far less memory than dense in-memory solvers and keeps its footprint bounded as data size grows.
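To make the SIMPLS solver named above concrete, here is a minimal sketch of de Jong's SIMPLS in base R. It is an illustration of the algorithm for centered data, not the package's own implementation, and the function name `simpls_sketch` is invented for the example; the package works on much larger, possibly file-backed matrices.

```r
# Illustrative SIMPLS (hypothetical helper, not the bigPLSR API):
# extracts ncomp components by deflating the p x m cross-product
# t(X) %*% Y, never the data matrices themselves.
simpls_sketch <- function(X, Y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)   # center predictors
  Y <- scale(Y, center = TRUE, scale = FALSE)   # center responses
  p <- ncol(X); m <- ncol(Y)
  R <- matrix(0, p, ncomp)   # X-side weight vectors
  Q <- matrix(0, m, ncomp)   # Y-loadings
  V <- matrix(0, p, ncomp)   # orthonormal basis used for deflation
  S <- crossprod(X, Y)       # p x m cross-product
  for (a in seq_len(ncomp)) {
    r  <- svd(S, nu = 1, nv = 0)$u[, 1]         # dominant left singular vector
    tt <- X %*% r                               # score vector
    nt <- sqrt(sum(tt^2))
    tt <- tt / nt; r <- r / nt                  # normalize scores and weights
    pv <- crossprod(X, tt)                      # X-loadings
    Q[, a] <- crossprod(Y, tt)                  # Y-loadings
    v <- pv
    if (a > 1) {
      Va <- V[, seq_len(a - 1), drop = FALSE]
      v  <- v - Va %*% crossprod(Va, pv)        # orthogonalize vs earlier basis
    }
    v <- v / sqrt(sum(v^2))
    S <- S - v %*% crossprod(v, S)              # deflate the cross-product
    R[, a] <- r; V[, a] <- v
  }
  R %*% t(Q)   # regression coefficients on the centered scale
}
```

With as many components as the rank of X, the coefficients coincide with ordinary least squares on the centered data, which gives a quick sanity check of the sketch.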
For PLS1, streaming is often as fast as dense solvers while preserving a small memory footprint; for PLS2 it remains competitive with a bounded footprint. On small problems that fit comfortably in RAM (random-access memory), dense in-memory solvers are slightly faster; the crossover occurs as n or p grows and the Gram/cross-product cost dominates.
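The streamed cross-product accumulation behind the bounded-footprint claim can be sketched in base R: the Gram matrix t(X) %*% X is accumulated over row chunks, so only one chunk plus the p x p accumulator is resident at a time. The function name `chunked_crossprod` is invented for this sketch; the package's internal streaming over 'bigmemory' matrices is more elaborate.

```r
# Accumulate t(X) %*% X from row chunks (hypothetical helper, not
# the bigPLSR API): peak memory is O(chunk * p + p^2) regardless
# of the number of rows n.
chunked_crossprod <- function(X, chunk = 1000L) {
  p <- ncol(X)
  XtX <- matrix(0, p, p)
  for (start in seq(1L, nrow(X), by = chunk)) {
    idx <- start:min(start + chunk - 1L, nrow(X))   # rows of this chunk
    XtX <- XtX + crossprod(X[idx, , drop = FALSE])  # partial Gram update
  }
  XtX
}
```

Because each chunk contributes an independent additive term, the result is identical to `crossprod(X)` up to floating-point summation order.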
Useful links