The Robust Accurate, Direct ICA ALgorithm (RADICAL)
The Robust Accurate, Direct ICA ALgorithm (RADICAL)
The ICA algorithm of Learned-Miller (2003), is based on an efficient entropy estimator (due to Vasicek (1976)) which is robust to outliers and requires no strong characterization assumptions about the data generating process.
radical( X, components = NCOL(X), demean =TRUE, pca_cov = c("ML","LW","EWMA"), k =150, augment =FALSE, replications =30, sigma =0.175, first_eigen =NULL, last_eigen =NULL, E =NULL, D =NULL, Z =NULL, K =NULL, L =NULL, seed =NULL, trace =FALSE,...)
Arguments
X: the data matrix (n x m) where n is the number of samples and m is the number of signals.
components: the number of independent components to extract.
demean: whether to demean the data.
pca_cov: the method used to calculate the covariance matrix for PCA. Options are ML (maximum likelihood), LW (Ledoit-Wolf) shrinkage method and EWMA (exponentially weighted moving average).
k: the number of angles at which to evaluate the contrast function. The contrast function will be evaluated at K evenly spaced rotations from -Pi/4 to Pi/4.
augment: whether to augment the data (as explained in paper). For large datasets of >10,000 points this should be set to FALSE.
replications: the number of replicated points for each original point if augment is set to TRUE. The larger the number of points in the data set, the smaller this value can be. For data sets of 10,000 points or more, point replication should be de-activated by setting augment to FALSE.
sigma: the standard deviation (noise) of the replicated points when using the augmentation option (which sets the standard deviation for the random normal generator).
first_eigen: This and last_eigen specify the range for eigenvalues that are retained, first_eigen is the index of largest eigenvalue to be retained. Making use of this option overwrites components.
last_eigen: the index of the last (smallest) eigenvalue to be retained and overwrites component argument.
E: (optional) Eigen vector from the PCA decomposition. If this is provided then D must also be provided.
D: (optional) Eigen values from the PCA decomposition.
Z: (optional) provided whitened signal matrix. If this is provided then both K and L must also be provided.
K: (optional) whitening matrix.
L: (optional) de-whitening matrix.
seed: the random seed for the random number generator.
trace: whether to print out progress information.
...: additional arguments to be passed to the covariance matrix calculation. For arguments passed to the EWMA method, it optionally takes an additional argument lambda which is the exponential decay parameter (default is 0.96). The LW takes an additional argument shrink which is the shrinkage parameter (default is to calculate this).
Returns
A list with the following components: - A: the mixing matrix
W: the unmixing matrix
S: the independent components
U: the rotation matrix
K: the whitening matrix
L: the dewhitening matrix
C: the covariance matrix
Z: the whitened signal
mu: the mean of the mixed signal (X)
elapsed: the time taken to run the algorithm
Details
Steps to the general algorithm are as follows (see P.1284 of Learned-Miller (2003) for specific details of RADICAL implementation):
Demean the data if required: M=X−μ
Calculate the covariance matrix Σ using one of the methods provided.
Use an eigen decomposition to calculate the eigenvalues and eigenvectors of the covariance matrix: Σ=EDE′
Select the range of eigenvalues to retain (dimensionality reduction).
Calculate the whitening matrix K=D−1/2E′ and the dewhitening matrix L=ED1/2.
Whiten the data: Z=MK′. Unwhitening is done by M=ZL′.
Run the RADICAL algorithm to calculate the rotation matrix U, the mixing matrix: A=U−1L and the unmixing matrix W=K′U.
Calculate the independent components: S=MW+1μW where 1 is a matrix of ones with dimension (samples x 1).
Notice that in calculating the mixing (A) and unmixing (W) matrices we have combined the whitening (K) and un-whitening (L) matrices with the rotation matrix U.
Note
Replications carries a significant computational cost. The algorithm is written in C++ and uses RcppParallel for the most expensive calculations. See setThreadOptions for setting the number of threads to use.