x: An expression count matrix. The rows correspond to genes and the columns correspond to cells. Can be sparse.
do.fast: Whether to approximate the prediction step. Default is TRUE.
ncores: Number of cores to use. Default is 1.
size.factor: Vector of cell size normalization factors. If x is already normalized or normalization is not desired, use size.factor = 1. Default uses mean library size normalization.
npred: Number of genes for regression prediction. Selects the top npred genes in terms of mean expression for regression prediction. Default is all genes.
pred.cells: Indices of the cells on which to perform regression prediction. Default is all cells.
pred.genes: Indices of the specific genes on which to perform regression prediction. Overrides npred. Default is all genes.
pred.genes.only: Return expression levels of only pred.genes. Default is FALSE (returns expression levels of all genes).
null.model: Whether to use mean gene expression as prediction.
mu: Matrix of prior means.
estimates.only: Only return SAVER estimates. Default is FALSE.
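As a rough illustration of the size.factor and npred arguments, the sketch below computes mean library size normalization factors and picks the top npred genes by mean expression on a toy matrix. It assumes that "mean library size normalization" means each cell's total count divided by the mean total count across cells, and that genes are ranked on the normalized matrix; the toy data and variable names are made up for illustration, and this is not taken from the package source.

```r
# Toy count matrix: 4 genes (rows) x 3 cells (columns).
x <- matrix(c( 0,  2,  4,
               1,  1,  1,
              10, 20, 30,
               0,  0,  3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Assumed default: each cell's library size divided by the mean library size.
lib.size <- colSums(x)
size.factor <- lib.size / mean(lib.size)

# Normalize by dividing each column (cell) by its size factor.
x.norm <- sweep(x, 2, size.factor, "/")

# Indices of the top npred genes ranked by mean normalized expression
# (assumption: ranking uses the normalized matrix).
npred <- 2
top.genes <- order(rowMeans(x.norm), decreasing = TRUE)[seq_len(npred)]
```

Passing size.factor = 1 instead would skip this rescaling, which is the documented choice when x is already normalized.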
Returns
If estimates.only = TRUE, then a matrix of SAVER estimates.
If estimates.only = FALSE, a list with the following components:
estimate: Recovered (normalized) expression.
se: Standard error of estimates.
info: Information about dataset.
The info element is a list with the following components:
size.factor: Size factor used for normalization.
maxcor: Maximum absolute correlation for each gene; set to 2 if not calculated.
lambda.max: Smallest value of lambda which gives the null model.
lambda.min: Value of lambda from which the prediction model is used.
sd.cv: Difference in deviance, in units of standard deviations, between the model with the lowest cross-validation error and the null model.
pred.time: Time taken to generate predictions.
var.time: Time taken to estimate variance.
cutoff: Maximum absolute correlation cutoff used to determine if a gene should be predicted.
lambda.coefs: Coefficients for estimating lambda with lowest cross-validation error.
total.time: Total time for SAVER estimation.
Details
The SAVER method starts by estimating the prior mean and variance of the true expression level for each gene and cell. The prior mean is obtained from the predictions of a LASSO Poisson regression fitted for each gene using the glmnet package. The prior variance is then estimated by maximum likelihood, assuming that either the variance, the Fano factor, or the coefficient of variation is constant for each gene. Finally, the posterior distribution is calculated and the posterior mean is reported as the SAVER estimate.
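The last step can be sketched under an explicit assumption that this page does not state: that the observed count y is Poisson with mean s * lambda (s being the size factor) and that the true expression lambda has a gamma prior with mean mu and variance v. Under that assumption the posterior is also gamma, and its mean has a closed form. The function name posterior_sketch is hypothetical and is not part of the package.

```r
# Gamma-Poisson posterior for a gene in one or more cells.
# Assumed model (not stated on this page):
#   y ~ Poisson(s * lambda),  lambda ~ Gamma(shape = alpha, rate = beta)
# with prior mean mu = alpha / beta and prior variance v = alpha / beta^2.
posterior_sketch <- function(y, s, mu, v) {
  beta  <- mu / v        # prior rate
  alpha <- mu * beta     # prior shape, equal to mu^2 / v
  post.shape <- alpha + y
  post.rate  <- beta + s
  list(estimate = post.shape / post.rate,       # posterior mean
       sd       = sqrt(post.shape) / post.rate) # posterior standard deviation
}

# Two cells with the same prior (mu = 2, v = 1) but different observed counts.
res <- posterior_sketch(y = c(0, 10), s = 1, mu = 2, v = 1)
```

In this sketch the estimate is a weighted compromise between the prior mean mu and the normalized observation y / s: a small prior variance pulls the estimate toward mu, while a large one leaves it close to the observed value.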
Examples
data("linnarsson")

## Not run:
system.time(linnarsson_saver <- saver(linnarsson, ncores = 12))
## End(Not run)

# predictions for top 5 highly expressed genes
## Not run:
saver2 <- saver(linnarsson, npred = 5)
## End(Not run)

# predictions for certain genes
## Not run:
genes <- c("Thy1", "Mbp", "Stim2", "Psmc6", "Rps19")
genes.ind <- which(rownames(linnarsson) %in% genes)
saver3 <- saver(linnarsson, pred.genes = genes.ind)
## End(Not run)

# return only certain genes
## Not run:
saver4 <- saver(linnarsson, pred.genes = genes.ind, pred.genes.only = TRUE)
## End(Not run)