Y: The data. An m by n matrix, where m is the number of samples and n is the number of features.
X: The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and dataframes are also permissible, and are converted to a matrix by design.matrix.
ctl: An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
k: The number of unwanted factors to use. Can be 0.
Z: Any additional covariates to include in the model, typically an m by q matrix. Factors and dataframes are also permissible, and are converted to a matrix by design.matrix. Alternatively, may simply be 1 (the default) for an intercept term. May also be NULL.
eta: Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a dataframe with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.
include.intercept: Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, one is added automatically. If only one of Z or eta should include an intercept, set this argument to FALSE and include the intercept term manually where desired.
fullW: Can be included to speed up execution. It is returned by previous calls of RUV2 (see below).
svdyc: Can be included to speed up execution. For internal use; please use fullW instead.
do_projectionplot: Calculate the quantities necessary to generate a projection plot.
inputcheck: Perform a basic sanity check on the inputs, and issue a warning if there is a problem.
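The arguments above can be put together as in the following sketch. The simulated data, the sizes m and n, the choice of the first 30 features as negative controls, and the values of k are all illustrative assumptions, not recommendations. The call is guarded so that the snippet still runs when the ruv package is not installed.

```r
# Simulated inputs; every size and value here is an illustrative assumption.
set.seed(1)
m <- 20                                   # number of samples
n <- 100                                  # number of features
Y <- matrix(rnorm(m * n), m, n)           # m by n data matrix
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)  # one factor of interest (p = 1)
ctl <- rep(FALSE, n)                      # logical vector of length n
ctl[1:30] <- TRUE                         # pretend features 1..30 are negative controls

if (requireNamespace("ruv", quietly = TRUE)) {
  # Z is left at its default of 1 (intercept only).
  fit <- ruv::RUV2(Y, X, ctl, k = 2)

  # Pass the fullW element of the first result to a later call with a different k.
  fit3 <- ruv::RUV2(Y, X, ctl, k = 3, fullW = fit$fullW)

  print(dim(fit$betahat))                 # documented as a p by n matrix
}
```

In real use, ctl would come from prior knowledge of which features are unaffected by the factor of interest, not from an arbitrary index range.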
Details
Implements the RUV-2 algorithm as described in Gagnon-Bartsch and Speed (2012), using the SVD as the factor analysis routine. Unwanted factors W are estimated using control genes. Y is then regressed on the variables X, Z, and W.
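The steps described above can be sketched in base R. This is a simplified illustration and not the package's implementation: the function name ruv2_sketch is hypothetical, Z is assumed to be an intercept only, eta is ignored, and the residual degrees of freedom are assumed to be m minus the number of columns in Z, X, and W.

```r
# Hypothetical helper: a minimal sketch of the RUV-2 idea, assuming Z = intercept only.
ruv2_sketch <- function(Y, X, ctl, k) {
  m <- nrow(Y)
  Z <- matrix(1, m, 1)
  # Project out Z (here, center each column) from a matrix A.
  resid_Z <- function(A) A - Z %*% solve(crossprod(Z), crossprod(Z, A))
  Y0 <- resid_Z(Y)
  X0 <- resid_Z(X)

  # Estimate k unwanted factors from the control columns via the SVD.
  W <- NULL
  if (k > 0) {
    W <- svd(Y0[, ctl, drop = FALSE])$u[, seq_len(k), drop = FALSE]
  }

  # Regress the adjusted Y on X and W jointly.
  D <- cbind(X0, W)
  B <- solve(crossprod(D), crossprod(D, Y0))
  res <- Y0 - D %*% B
  df <- m - ncol(Z) - ncol(D)             # assumed residual degrees of freedom

  p <- ncol(X0)
  list(betahat = B[seq_len(p), , drop = FALSE],  # p by n coefficients for X
       sigma2 = colSums(res^2) / df,             # per-feature residual variance
       W = W, df = df)
}

# Illustrative data (all values are assumptions).
set.seed(2)
m <- 20; n <- 50
Y <- matrix(rnorm(m * n), m, n)
X <- matrix(rep(c(0, 1), each = m / 2), m, 1)
ctl <- seq_len(n) <= 20
sk <- ruv2_sketch(Y, X, ctl, k = 2)
sk0 <- ruv2_sketch(Y, X, ctl, k = 0)      # k = 0: ordinary regression on X and Z
```

With k = 0 the sketch reduces to an ordinary least-squares fit of each feature on X with an intercept, which is a convenient sanity check.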
Returns
A list containing:
betahat: The estimated coefficients of the factor(s) of interest. A p by n matrix.
sigma2: Estimates of the features' variances. A vector of length n.
t: t statistics for the factor(s) of interest. A p by n matrix.
p: P-values for the factor(s) of interest. A p by n matrix.
Fstats: F statistics for testing all of the factors in X simultaneously.
Fpvals: P-values for testing all of the factors in X simultaneously.
multiplier: The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.
df: The number of residual degrees of freedom.
W: The estimated unwanted factors.
alpha: The estimated coefficients of W.
byx: The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.
bwx: The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.
X: X. Included for reference.
k: k. Included for reference.
ctl: ctl. Included for reference.
Z: Z. Included for reference.
eta: eta. Included for reference.
fullW: Can be used to speed up future calls of RUV2.
projectionplotW: A reparameterization of W useful for projection plots.
projectionplotalpha: A reparameterization of alpha useful for projection plots.
include.intercept: include.intercept. Included for reference.
method: Character variable with value "RUV2". Included for reference.
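As an illustration of how betahat, sigma2, multiplier, and df relate, the sketch below rebuilds t statistics and P-values from made-up numbers. It assumes p = 1 (so multiplier is a single number), that t is betahat divided by its estimated standard error, and that the P-values are two-sided t-test P-values on df degrees of freedom. These are assumptions for illustration; the values are not output from RUV2.

```r
# Made-up values standing in for elements of the returned list (p = 1, n = 3).
betahat <- matrix(c(0.8, -0.1, 1.5), nrow = 1)
sigma2 <- c(0.5, 0.4, 0.9)
multiplier <- 0.2
df <- 16

# Variance of betahat for each feature is multiplier * sigma2.
se <- sqrt(multiplier * sigma2)

# Divide each column of betahat by that feature's standard error.
tstat <- sweep(betahat, 2, se, "/")

# Two-sided P-values under a t distribution (an assumption of this sketch).
pval <- 2 * pt(-abs(tstat), df)
```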