p_linreg_yerrors function

Linear Regression of Y vs. Covariates with Y Measured in Pools and (Potentially) Subject to Additive Normal Errors

Linear Regression of Y vs. Covariates with Y Measured in Pools and (Potentially) Subject to Additive Normal Errors

Assumes outcome given covariates is a normal-errors linear regression. Pooled outcome measurements can be assumed precise or subject to additive normal processing error and/or measurement error. Replicates are supported.

p_linreg_yerrors(g, ytilde, x = NULL, errors = "processing", estimate_var = TRUE, start_nonvar_var = c(0.01, 1), lower_nonvar_var = c(-Inf, 1e-04), upper_nonvar_var = c(Inf, Inf), nlminb_list = list(control = list(trace = 1, eval.max = 500, iter.max = 500)), hessian_list = list(method.args = list(r = 4)))

Arguments

  • g: Numeric vector with pool sizes, i.e. number of members in each pool.

  • ytilde: Numeric vector (or list of numeric vectors, if some pools have replicates) with poolwise sum Ytilde values.

  • x: Numeric matrix with poolwise ‘X’ values (if any), with one row for each pool. Can be a vector if there is only 1 covariate.

  • errors: Character string specifying the errors that Y is subject to. Choices are "neither", "processing" for processing error only, "measurement" for measurement error only, and "both".

  • estimate_var: Logical value for whether to return variance-covariance matrix for parameter estimates.

  • start_nonvar_var: Numeric vector of length 2 specifying starting value for non-variance terms and variance terms, respectively.

  • lower_nonvar_var: Numeric vector of length 2 specifying lower bound for non-variance terms and variance terms, respectively.

  • upper_nonvar_var: Numeric vector of length 2 specifying upper bound for non-variance terms and variance terms, respectively.

  • nlminb_list: List of arguments to pass to nlminb

    for log-likelihood maximization.

  • hessian_list: List of arguments to pass to hessian.

Returns

List containing:

  1. Numeric vector of parameter estimates.
  2. Variance-covariance matrix (if estimate_var = TRUE).
  3. Returned nlminb object from maximizing the log-likelihood function.
  4. Akaike information criterion (AIC).

Details

The individual-level model of interest for Y|X is:

Y = beta_0 + beta_x ^T X + e, e ~ N(0, sigsq)

The implied model for summed Y*|X* in a pool with g members is:

Y* = g beta_0 + beta_x ^T X* + e*, e* ~ N(0, g sigsq)

The assay targets Ybar, the mean Y value for each pool, from which the sum Y* can be calculated as Y* = g Ybar. But the Ybar's may be subject to processing error and/or measurement error. Suppose Ybartilde is the imprecise version of Ybar from the assay. If both errors are present, the assumed error structure is:

Ybartilde = Ybar + e_p I(g > 1) + e_m, e_p ~ N(0, sigsq_p), e_m ~ N(0, sigsq_m)

with the processing error e_p and measurement error e_m assumed independent of each other. This motivates a maximum likelihood analysis for estimating theta = (beta_0, beta_x ^T)^T based on observed (Ytilde*, X ) values, where Ytilde = g Ytildebar.

Examples

# Load dataset containing data frame with (g, X1*, X2*, Y*, Ytilde*) values # for 500 pools each of size 1, 2, and 3, and list of Ytilde values where 20 # of the single-specimen pools have replicates. Ytilde values are affected by # processing error and measurement error; true parameter values are # beta_0 = 0.25, beta_x1 = 0.5, beta_x2 = 0.25, sigsq = 1. data(dat_p_linreg_yerrors) dat <- dat_p_linreg_yerrors$dat reps <- dat_p_linreg_yerrors$reps # Fit Ytilde* vs. (X1*, X2*) ignoring errors in Ytilde (leads to loss of # precision and overestimated sigsq, but no bias). fit.naive <- p_linreg_yerrors( g = dat$g, y = dat$y, x = dat[, c("x1", "x2")], errors = "neither" ) fit.naive$theta.hat # Account for errors in Ytilde*, without using replicates fit.corrected.noreps <- p_linreg_yerrors( g = dat$g, y = dat$ytilde, x = dat[, c("x1", "x2")], errors = "both" ) fit.corrected.noreps$theta.hat # Account for errors in Ytilde*, incorporating the 20 replicates fit.corrected.reps <- p_linreg_yerrors( g = dat$g, y = reps, x = dat[, c("x1", "x2")], errors = "both" ) fit.corrected.reps$theta.hat # In this trial, incorporating replicates resulted in much better estimates # of sigsq (truly 1), sigsq_p (truly 0.4), and sigsq_m (truly = 0.2) but very # similar regression coefficient estimates. fit.corrected.noreps$theta.hat fit.corrected.reps$theta.hat

References

Schisterman, E.F., Vexler, A., Mumford, S.L. and Perkins, N.J. (2010) "Hybrid pooled-unpooled design for cost-efficient measurement of biomarkers." Stat. Med. 29 (5): 597--613.

  • Maintainer: Dane R. Van Domelen
  • License: GPL-3
  • Last published: 2020-02-13

Useful links