Given a matrix of counts (Y) where log2(E[Y])=Q, a design matrix (X), and a matrix of coefficients (B), thin_diff will generate a new matrix of counts such that log2(E[Y])=BX′+u1′+Q, where u is some vector of intercept coefficients. This function is used by all other thinning functions. The method is described in detail in Gerard (2020).
thin_base(mat, designmat, coefmat, relative =TRUE, type = c("thin","mult"))
Arguments
mat: A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.
designmat: A design matrix. The rows index the samples and the columns index the variables. The intercept should not be included.
coefmat: A matrix of coefficients. The rows index the genes and the columns index the samples.
relative: A logical. Should we apply relative thinning (TRUE) or absolute thinning (FALSE). Only experts should change the default.
type: Should we apply binomial thinning (type = "thin") or just naive multiplication of the counts (type = "mult"). You should always have this set to "thin".
Returns
A matrix of new RNA-seq read-counts. This matrix has the signal added from designmat and coefmat.
Examples
## Simulate data from given matrix of counts## In practice, you would obtain Y from a real dataset, not simulate it.set.seed(1)nsamp <-10ngene <-1000Y <- matrix(stats::rpois(nsamp * ngene, lambda =100), nrow = ngene)X <- matrix(rep(c(0,1), length.out = nsamp))B <- matrix(seq(3,0, length.out = ngene))Ynew <- thin_base(mat = Y, designmat = X, coefmat = B)## Demonstrate how the log2 effect size is BBhat <- coefficients(lm(t(log2(Ynew))~ X))["X",]plot(B, Bhat, xlab ="Coefficients", ylab ="Coefficient Estimates")abline(0,1, col =2, lwd =2)
References
Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics. 21(1), 206. tools:::Rd_expr_doi("10.1186/s12859-020-3450-9") .
See Also
select_counts: For subsampling the rows and columns of your real RNA-seq count matrix prior to applying binomial thinning.
thin_diff: For the function most users should be using for general-purpose binomial thinning.
thin_2group: For the specific application of thinning in the two-group model.
thin_lib: For the specific application of library size thinning.
thin_gene: For the specific application of total gene expression thinning.
thin_all: For the specific application of thinning all counts uniformly.