Linear Optimal Low-Rank Projection
Partial Least-Squares (PLS)
Random Projections (RP)
Stacked Cigar
Cross
Fat Tails Simulation
Mean Difference Simulation
Nearest Centroid Classifier Training
Random Classifier Utility
Random Chance Classifier Training
Randomly Guessing Classifier Training
Embedding
Bayes Optimal
Data Piling
Linear Optimal Low-Rank Projection (LOL)
Low-rank Canonical Correlation Analysis (LR-CCA)
Low-Rank Linear Discriminant Analysis (LRLDA)
Principal Component Analysis (PCA)
Quadratic Discriminant Toeplitz Simulation
Random Rotation
Reverse Random Trunk
Sample Random Rotation
Random Trunk
GMM Simulate
Toeplitz Simulation
XOR Problem
A utility to use irlba when necessary
A function that performs a utility computation of information about th...
A function that computes basic summaries of the data.
A function for one-hot encoding categorical response vectors.
Embedding Cross Validation
Optimal Cross-Validated Number of Embedding Dimensions
Cross-Validation Data Splitter
Nearest Centroid Classifier Prediction
Random Chance Classifier Prediction
Randomly Guessing Classifier Prediction
Supervised learning techniques designed for settings where the dimensionality exceeds the sample size tend to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we learn a lower-dimensional representation of the data before training a classifier: we project the data into a space of more manageable dimensionality, where standard classification or clustering techniques can be applied with less risk of overfitting. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.
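To illustrate the project-then-classify idea described above, here is a minimal sketch of a LOL-style supervised projection for two classes: the basis concatenates the difference of class means with the top principal directions of the class-centered data, then orthonormalizes. The sketch is in Python for brevity (the package itself is written in R), and the function name `lol_project` and its details are illustrative assumptions, not the package's API.

```python
import numpy as np

def lol_project(X, y, r):
    """Illustrative LOL-style projection for two classes (not the lolR API).

    Builds a d x r orthonormal basis from the mean-difference direction
    plus the top r-1 principal directions of the class-centered data,
    then projects X onto it.
    """
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    delta = (means[1] - means[0]).reshape(-1, 1)   # supervised mean-difference direction
    Xc = X - means[np.searchsorted(classes, y)]    # center each sample by its class mean
    # Top r-1 right singular vectors of the class-centered data (PCA directions)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = np.hstack([delta, Vt[: r - 1].T])          # d x r candidate basis
    Q, _ = np.linalg.qr(A)                         # orthonormalize the columns
    return X @ Q, Q
```

A downstream classifier (e.g. nearest centroid) is then trained on the r-dimensional projections `X @ Q` instead of the original d-dimensional data, which is the HDLSS strategy the package implements.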