RUVinv() R function from the ruv package

Remove Unwanted Variation, inverse method

The RUV-inv algorithm. Estimates and adjusts for unwanted variation using negative controls.


RUVinv(Y, X, ctl, Z=1, eta=NULL, include.intercept=TRUE,
       fullW0=NULL, invsvd=NULL, lambda=NULL,
       randomization=FALSE, iterN=100000, inputcheck=TRUE)

Arguments

Y: The data. An m by n matrix, where m is the number of samples and n is the number of features.
X: The factor(s) of interest. An m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and data frames are also permissible and are converted to a matrix by design.matrix.
ctl: An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
Z: Any additional covariates to include in the model, typically an m by q matrix. Factors and data frames are also permissible and are converted to a matrix by design.matrix. Alternatively, may simply be 1 (the default) for an intercept term. May also be NULL.
eta: Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a dataframe with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.
include.intercept: Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, this will automatically include one. If only one of Z or eta should include an intercept, this variable should be set to FALSE, and the intercept term should be included manually where desired.
fullW0: Can be included to speed up execution. It is returned by previous calls of RUV4, RUVinv, or RUVrinv (see below).
invsvd: Can be included to speed up execution. Generally used when calling RUV(r)inv many times with different values of lambda. It is returned by previous calls of RUV(r)inv (see below).
lambda: Ridge parameter. If specified, the ridged inverse method will be used.
randomization: Whether the inverse-method variances should be computed using randomly generated factors of interest (as opposed to a numerical integral).
iterN: The number of random "factors of interest" to generate (used only when randomization=TRUE).
inputcheck: Perform a basic sanity check on the inputs, and issue a warning if there is a problem.
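
The two accepted forms of ctl can be illustrated with a small base R sketch. The dimensions and the toy matrix here are hypothetical and chosen only for illustration; the sketch does not call RUVinv itself.

```r
# Toy dimensions (hypothetical, for illustration only)
n  <- 10   # number of features
nc <- 3    # number of negative controls

# Form 1: a logical vector of length n
ctl_logical <- rep(FALSE, n)
ctl_logical[1:nc] <- TRUE

# Form 2: a vector of integers giving the positions of the controls
ctl_integer <- which(ctl_logical)

# Both forms select the same columns of a data matrix
Y <- matrix(seq_len(4 * n), nrow = 4, ncol = n)
same_columns <- identical(Y[, ctl_logical], Y[, ctl_integer])
```

Either form should identify the same set of negative-control features, so the choice is a matter of convenience.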

Details

Implements the RUV-inv algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013).
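
For orientation, the simulation in the Examples section generates data from the linear model below. This is a sketch of that simulated model only, written with the dimensions used there (k unwanted factors is a simulation choice, not an argument of RUVinv); it omits the optional covariates Z and eta and is not a statement of the full algorithm.

```latex
Y_{m \times n} = X_{m \times p}\,\beta_{p \times n}
               + W_{m \times k}\,\alpha_{k \times n}
               + \varepsilon_{m \times n},
\qquad
\beta_{\cdot j} = 0 \ \text{for every negative control } j .
```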

Returns

A list containing

betahat: The estimated coefficients of the factor(s) of interest. A p by n matrix.
sigma2: Estimates of the features' variances. A vector of length n.
t: t statistics for the factor(s) of interest. A p by n matrix.
p: P-values for the factor(s) of interest. A p by n matrix.
Fstats: F statistics for testing all of the factors in X simultaneously.
Fpvals: P-values for testing all of the factors in X simultaneously.
multiplier: The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.
df: The number of residual degrees of freedom.
W: The estimated unwanted factors.
alpha: The estimated coefficients of W.
byx: The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.
bwx: The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.
X: X. Included for reference.
k: k. Included for reference.
ctl: ctl. Included for reference.
Z: Z. Included for reference.
eta: eta. Included for reference.
fullW0: Can be used to speed up future calls of RUV4.
lambda: lambda. Included for reference.
invsvd: Can be used to speed up future calls of RUV(r)inv.
include.intercept: include.intercept. Included for reference.
method: Character variable with value "RUVinv". Included for reference.
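
The relationship between sigma2, multiplier, and the variance of betahat can be sketched in base R. The list toy_fit below is a hypothetical stand-in with made-up values, not output from RUVinv. The sketch assumes p = 1, a scalar multiplier, and a two-sided t test on df degrees of freedom; the shape of multiplier in a real fit may differ.

```r
# Hypothetical stand-in for a fitted object (made-up values, not RUVinv output)
toy_fit <- list(
  betahat    = matrix(c(0.5, -1.2, 2.0), nrow = 1),  # p = 1 by n = 3
  sigma2     = c(1.0, 0.8, 1.5),                     # one variance per feature
  multiplier = 0.04,                                 # assumed scalar here
  df         = 30
)

# Estimated variance of betahat: sigma2 scaled by the multiplier
var_betahat <- toy_fit$multiplier * toy_fit$sigma2
se_betahat  <- sqrt(var_betahat)

# t statistics and two-sided p-values under the assumptions stated above
t_stat <- toy_fit$betahat[1, ] / se_betahat
p_val  <- 2 * pt(-abs(t_stat), df = toy_fit$df)
```

Under these assumptions the first feature has a standard error of 0.2 and a t statistic of 2.5.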

Note

Additional resources can be found at http://www-personal.umich.edu/~johanngb/ruv/.

References

Using control genes to correct for unwanted variation in microarray data. Gagnon-Bartsch and Speed, 2012. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.

Removing Unwanted Variation from High Dimensional Data with Negative Controls. Gagnon-Bartsch, Jacob, and Speed, 2013. Available at: http://statistics.berkeley.edu/tech-reports/820.

Author(s)

Johann Gagnon-Bartsch johanngb@umich.edu

Examples


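## Load the package that provides RUVinv and variance_adjust
library(ruv)

## Create some simulated data
m = 50       # number of samples
n = 10000    # number of features
nc = 1000    # number of negative controls
p = 1        # number of factors of interest
k = 20       # number of unwanted factors
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE                    # the first nc features are negative controls
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0                      # negative controls are unaffected by X
W = matrix(rnorm(m*k),m,k)          # unwanted factors
alpha = matrix(rnorm(k*n),k,n)      # coefficients of the unwanted factors
epsilon = matrix(rnorm(m*n),m,n)    # noise
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-inv
fit = RUVinv(Y, X, ctl)

## Get adjusted variances and p-values
fit = variance_adjust(fit)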
