RUVinv function

Remove Unwanted Variation, inverse method

The RUV-inv algorithm. Estimates and adjusts for unwanted variation using negative controls.

RUVinv(Y, X, ctl, Z=1, eta=NULL, include.intercept=TRUE, fullW0=NULL, invsvd=NULL, lambda=NULL, randomization=FALSE, iterN=100000, inputcheck=TRUE)

Arguments

  • Y: The data. A m by n matrix, where m is the number of samples and n is the number of features.
  • X: The factor(s) of interest. A m by p matrix, where m is the number of samples and p is the number of factors of interest. Very often p = 1. Factors and dataframes are also permissible, and converted to a matrix by design.matrix.
  • ctl: An index vector to specify the negative controls. Either a logical vector of length n or a vector of integers.
  • Z: Any additional covariates to include in the model, typically a m by q matrix. Factors and dataframes are also permissible, and converted to a matrix by design.matrix. Alternatively, may simply be 1 (the default) for an intercept term. May also be NULL.
  • eta: Gene-wise (as opposed to sample-wise) covariates. These covariates are adjusted for by RUV-1 before any further analysis proceeds. Can be either (1) a matrix with n columns, (2) a matrix with n rows, (3) a dataframe with n rows, (4) a vector or factor of length n, or (5) simply 1, for an intercept term.
  • include.intercept: Applies to both Z and eta. When Z or eta (or both) is specified (not NULL) but does not already include an intercept term, this will automatically include one. If only one of Z or eta should include an intercept, this variable should be set to FALSE, and the intercept term should be included manually where desired.
  • fullW0: Can be included to speed up execution. Is returned by previous calls of RUV4, RUVinv, or RUVrinv (see below).
  • invsvd: Can be included to speed up execution. Generally used when calling RUV(r)inv many times with different values of lambda. Is returned by previous calls of RUV(r)inv (see below).
  • lambda: Ridge parameter. If specified, the ridged inverse method will be used.
  • randomization: Whether the inverse-method variances should be computed using randomly generated factors of interest (as opposed to a numerical integral).
  • iterN: The number of random "factors of interest" to generate (used only when randomization=TRUE).
  • inputcheck: Perform a basic sanity check on the inputs, and issue a warning if there is a problem.
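The shapes described above can be checked with a few lines of base R before calling RUVinv. The sketch below is only illustrative: the toy dimensions and the names `ctl_logical`, `ctl_integer`, `same_controls`, and `shapes_ok` are assumptions made for this example, and the code does not call the ruv package.

```r
# Toy dimensions (assumed for illustration): m samples, n features, p factors
m <- 6; n <- 10; p <- 1
set.seed(1)
Y <- matrix(rnorm(m * n), m, n)                # m by n data matrix
X <- matrix(rep(c(0, 1), each = m / 2), m, p)  # m by p factor of interest

# Two equivalent ways to mark the first three features as negative controls
ctl_logical <- rep(FALSE, n)
ctl_logical[1:3] <- TRUE   # logical vector of length n
ctl_integer <- 1:3         # vector of integer indices

# Both forms select the same columns of Y
same_controls <- identical(Y[, ctl_logical], Y[, ctl_integer])

# Basic shape checks matching the argument descriptions
shapes_ok <- nrow(Y) == nrow(X) && length(ctl_logical) == ncol(Y)
```

Either form of `ctl` can then be passed on; the logical form makes it harder to index past the last feature by accident.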

Details

Implements the RUV-inv algorithm as described in Gagnon-Bartsch, Jacob, and Speed (2013).
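For orientation, the linear model behind the arguments can be sketched as follows, using the dimensions given in the Arguments section. The symbols γ (coefficients of Z) and ε (noise) are labels assumed here, and k denotes the number of unwanted factors; see the cited paper for the authoritative statement.

```latex
Y_{m \times n} = X_{m \times p}\,\beta_{p \times n}
               + Z_{m \times q}\,\gamma_{q \times n}
               + W_{m \times k}\,\alpha_{k \times n}
               + \varepsilon_{m \times n},
\qquad \beta_{\cdot j} = 0 \ \text{for every negative control feature } j \text{ indexed by } \texttt{ctl}.
```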

Returns

A list containing

  • betahat: The estimated coefficients of the factor(s) of interest. A p by n matrix.

  • sigma2: Estimates of the features' variances. A vector of length n.

  • t: t statistics for the factor(s) of interest. A p by n matrix.

  • p: P-values for the factor(s) of interest. A p by n matrix.

  • Fstats: F statistics for testing all of the factors in X simultaneously.

  • Fpvals: P-values for testing all of the factors in X simultaneously.

  • multiplier: The constant by which sigma2 must be multiplied in order to get an estimate of the variance of betahat.

  • df: The number of residual degrees of freedom.

  • W: The estimated unwanted factors.

  • alpha: The estimated coefficients of W.

  • byx: The coefficients in a regression of Y on X (after both Y and X have been "adjusted" for Z). Useful for projection plots.

  • bwx: The coefficients in a regression of W on X (after X has been "adjusted" for Z). Useful for projection plots.

  • X: X. Included for reference.

  • k: k. Included for reference.

  • ctl: ctl. Included for reference.

  • Z: Z. Included for reference.

  • eta: eta. Included for reference.

  • fullW0: Can be used to speed up future calls of RUV4.

  • lambda: lambda. Included for reference.

  • invsvd: Can be used to speed up future calls of RUV(r)inv.

  • include.intercept: include.intercept. Included for reference.

  • method: Character variable with value "RUVinv". Included for reference.
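The relationship between betahat, sigma2, multiplier, and df can be illustrated with a small base R sketch. It assumes the t statistics are formed in the usual way (betahat divided by its estimated standard error) and that p-values are two-sided from a t distribution with df degrees of freedom; that convention is an assumption here, not something this page states. All numbers are made up and are not output from RUVinv.

```r
# Made-up values standing in for elements of a fitted object (p = 1, n = 4)
betahat    <- matrix(c(0.8, -0.1, 1.5, 0.0), nrow = 1)  # p by n
sigma2     <- c(1.0, 0.5, 2.0, 1.5)                     # length n
multiplier <- 0.1                                       # treated as a scalar, since p = 1
df         <- 20                                        # residual degrees of freedom

# Estimated variance of betahat: multiplier times sigma2
var_betahat <- multiplier * sigma2

# Usual t statistic and two-sided p-value (assumed convention)
t_stat <- betahat / sqrt(matrix(var_betahat, nrow = 1))
p_val  <- 2 * pt(-abs(t_stat), df)
```

Both `t_stat` and `p_val` come out as p by n matrices, matching the shapes listed for t and p above.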

Note

Additional resources can be found at http://www-personal.umich.edu/~johanngb/ruv/.

References

Using control genes to correct for unwanted variation in microarray data. Gagnon-Bartsch and Speed, 2012. Available at: http://biostatistics.oxfordjournals.org/content/13/3/539.full.

Removing Unwanted Variation from High Dimensional Data with Negative Controls. Gagnon-Bartsch, Jacob, and Speed, 2013. Available at: http://statistics.berkeley.edu/tech-reports/820.

Author(s)

Johann Gagnon-Bartsch johanngb@umich.edu

Examples

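## Load the package that provides RUVinv and variance_adjust
library(ruv)

## Create some simulated data
m = 50       # number of samples
n = 10000    # number of features
nc = 1000    # number of negative controls
p = 1        # number of factors of interest
k = 20       # number of unwanted factors
ctl = rep(FALSE, n)
ctl[1:nc] = TRUE
X = matrix(c(rep(0,floor(m/2)), rep(1,ceiling(m/2))), m, p)
beta = matrix(rnorm(p*n), p, n)
beta[,ctl] = 0    # negative controls are unaffected by X
W = matrix(rnorm(m*k),m,k)
alpha = matrix(rnorm(k*n),k,n)
epsilon = matrix(rnorm(m*n),m,n)
Y = X%*%beta + W%*%alpha + epsilon

## Run RUV-inv
fit = RUVinv(Y, X, ctl)

## Get adjusted variances and p-values
fit = variance_adjust(fit)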

See Also

RUV2, RUV4, RUVrinv, variance_adjust, invvar