jointMeanCovModSelCen function

Estimate Mean and Row-Row Correlation Matrix Using Model Selection

Estimate Mean and Row-Row Correlation Matrix Using Model Selection

This function implements Algorithm 2 from Hornstein, Fan, Shedden, and Zhou (2018), doi: 10.1080/01621459.2018.1429275. Given an n by m data matrix, with a vector of indices denoting group membership, this function estimates mean and covariance structure as follows. 1. Run Algorithm 1 (jointMeanCovGroupCen). 2. Use a threshold to select genes with the largest mean differences. 3. Group center the genes with mean differences above the threshold, and globally center the remaining genes. Use the centered data matrix to calculate a Gram matrix as input to Gemini. 4. Use Gemini to estimate the inverse row covariance matrix, and use the inverse row covariance matrix with GLS to estimate group means. 5. Calculate test statistics comparing group means for each column.

jointMeanCovModSelCen(X, group.one.indices, rowpen, B.inv = NULL, rowpen.ModSel = NULL, thresh = NULL)

Arguments

  • X: Data matrix.
  • group.one.indices: Vector of indices denoting rows in group one.
  • rowpen: Glasso penalty for estimating B, the row-row correlation matrix.
  • B.inv: Optional row-row covariance matrix to be used in GLS in Algorithm 1 prior to model selection centering. If this argument is passed, then it is used instead of estimating the inverse row-row covariance.
  • rowpen.ModSel: Optional Glasso penalty for estimating B in the second step.
  • thresh: Threshold for model selection centering. If group means for a column differ by less than the threshold, the column is globally centered rather than group centered. If thresh is NULL, then the theoretically guided threshold is used.

Returns

  • B.hat.inv: Estimated row-row inverse covariance matrix. For identifiability, A and B are scaled so that A has trace m, where m is the number of columns of X.

  • corr.B.hat.inv: Estimated row-row inverse correlation matrix.

  • gls.group.means: Matrix with two rows and m columns, where m is the number of columns of X. Entry (i, j) contains the estimated mean of the jth column for an individual in group i, with i = 1,2, and j = 1, ..., m.

  • gamma.hat: Estimated group mean differences.

  • test.stats: Vector of test statistics of length m.

  • p.vals: Vector of two-sided p-values, calculated using the standard normal distribution.

  • p.vals.adjusted: Vector of p-values, adjusted using the Benjamini-Hochberg fdr adjustment.

Examples

# Define sample sizes n1 <- 5 n2 <- 5 n <- n1 + n2 m <- 200 # Generate data with row and column covariance # matrices each autorogressive of order 1 with # parameter 0.2. The mean is defined so the first # three columns have true differences in group means # equal to four. Z <- matrix(rnorm(m * n), nrow=n, ncol=m) A <- outer(1:m, 1:m, function(i, j) 0.2^abs(i - j)) B <- outer(1:n, 1:n, function(i, j) 0.2^abs(i - j)) M <- matrix(0, nrow=nrow(Z), ncol=ncol(Z)) group.one.indices <- 1:5 group.two.indices <- 6:10 M[group.one.indices, 1:3] <- 2 M[group.two.indices, 1:3] <- -2 X <- t(chol(B)) %*% Z %*% chol(A) + M # Apply Algorithm 2 (jointMeanCovModSelCen) and # plot the test statistics. rowpen <- sqrt(log(m) / n) out <- jointMeanCovModSelCen(X, group.one.indices, rowpen) plot(out) summary(out)
  • Maintainer: Michael Hornstein
  • License: GPL-2
  • Last published: 2019-05-04

Useful links