marker_h2 function

Compute a marker-based estimate of heritability, given phenotypic observations at individual plant or plot level.

Compute a marker-based estimate of heritability, given phenotypic observations at individual plant or plot level.

Given a genetic relatedness matrix and phenotypic observations at individual plant or plot level, this function computes REML-estimates of the genetic and residual variance and their standard errors, using the AI-algorithm (Gilmour et al. 1995). Based on this, heritability estimates and confidence intervals are given (the estimator hr2h_r^2 in Kruijer et al.).

marker_h2(data.vector, geno.vector, covariates = NULL, K, alpha = 0.05, eps = 1e-06, max.iter = 100, fix.h2 = FALSE, h2 = 0.5)

Arguments

  • data.vector: A vector of phenotypic observations. Needs to be of type numeric. May contain missing values.
  • geno.vector: A vector of genotype labels, either a factor or character. This vector should correspond to data.vector, and hence needs to be of the same length.
  • covariates: A data-frame or matrix with optional covariates, the rows corresponding to the phenotypic observations in data.vector and geno.vector. May contain missing values. Factors are not allowed, and need to be encoded by columns of type numeric or integer. The data-frame or matrix should not contain an intercept, which is included by default.
  • K: A genetic relatedness or kinship matrix, typically marker-based. Must have row- and column-names corresponding to the levels of geno.vector
  • alpha: Confidence level, for the 1-alpha confidence intervals.
  • eps: Numerical precision, used as convergence criterion in the AI-algorithm.
  • max.iter: Maximal number of iterations in the AI-algorithm.
  • fix.h2: Compute the log-likelihood and inverse AI-matrix for a fixed heritability value. Default is FALSE.
  • h2: When fix.h2 is TRUE, the value of the heritability. Must be of type numeric, between 0 and 1.

Details

  • Given phenotypic observations YijY_{ij} for genotypes i=1,...,ni=1,...,n and replicates j=1,...,nij = 1,...,n_i, the mixed model Yij=μ+Gi+EijY_{ij} = \mu + G_i + E_{ij}

    is assumed. The vector of additive genetic effects (G1,...,Gn)(G_1,...,G_n)' follows a multivariate normal distribution with mean zero and covariance σA2K\sigma_A^2 K, where σA2\sigma_A^2 is the additive genetic variance, and KK is a genetic relatedness matrix derived from a dense set of markers. The errors EijE_{ij} are independent and normally distributed with variance σE2\sigma_E^2. Under certain assumptions (see Speed et al. 2012) the marker- or chip-heritability h2=σA2/(σA2+σE2)h^2 = \sigma_A^2 / (\sigma_A^2 + \sigma_E^2)

    equals the narrow-sense heritability.

  • It is assumed that the genetic relatedness matrix KK is scaled such that trace(PKP)=n1trace(P K P) = n - 1, where PP is the projection matrix In1n1n/nI_n - 1_n 1_n' / n, for the identity matrix InI_n and 1n1_n being a column vector of ones. If this is not the case, KK is automatically scaled prior to fitting the mixed model.

  • The model can optionally include a term XijβX_{ij} \beta, where XijX_{ij}

    is the row vector with observations on kk extra covariates and the vector β\beta contains their effects. In this case the argument covariates should be the (N x k) matrix or data-frame with rows XijX_{ij} (N being the total number of observations). Observations where either YijY_{ij} or any of the covariates is missing are discarded.

  • Confidence intervals for heritability are constructed using the delta-method and the inverse AI-matrix. The delta-method can be applied either directly to the function (σA2,σE2)σA2/(σA2+σE2)(\sigma_A^2,\sigma_E^2) -\> \sigma_A^2 / (\sigma_A^2 + \sigma_E^2) or to the function (σA2,σE2)log(σA2/σE2)(\sigma_A^2,\sigma_E^2) -\> log(\sigma_A^2 / \sigma_E^2). In the latter case, a confidence interval for log(σA2/σE2)log(\sigma_A^2 / \sigma_E^2) is obtained, which is back-transformed to a confidence interval for heritability. This approach (proposed in Kruijer et al.) has the advantage that intervals are always contained in the unit interval.

  • The AI-algorithm is run for max.iter iterations. If by then there is no convergence a warning is printed and the current estimates are returned.

Returns

A list with the following components:

  • va: REML-estimate of the (additive) genetic variance.
  • ve: REML-estimate of the residual variance.
  • h2: Plug-in estimate of heritability: va/(va+ve)va / (va + ve).
  • conf.int1: 1-alpha confidence interval for heritability.
  • conf.int2: 1-alpha confidence interval for heritability, obtained by application of the delta method on a logarithmic scale.
  • inv.ai: The inverse of the average information (AI) matrix.
  • loglik: The log-likelihood.

References

  • Gilmour et al. Gilmour, A.R., R. Thompson and B.R. Cullis (1995) Average Information REML: An Efficient Algorithm for Variance Parameter Estimation in Linear Mixed Models. Biometrics, volume 51, number 4, 1440-1450.
  • Kruijer, W. et al. (2015) Marker-based estimation of heritability in immortal populations. Genetics, Vol. 199(2), p. 1-20.
  • Speed, D., G. Hemani, M. R. Johnson, and D.J. Balding (2012) Improved heritability estimation from genome-wide snps. the American journal of human genetics 91: 1011-1021.

Author(s)

Willem Kruijer.

See Also

For marker-based estimation of heritability using genotypic means, see marker_h2_means.

Examples

data(LD) data(K_atwell) # Heritability estimation for all observations: #out <- marker_h2(data.vector=LD$LD,geno.vector=LD$genotype, # covariates=LD[,4:8],K=K_atwell) # Heritability estimation for a randomly chosen subset of 20 accessions: set.seed(123) sub.set <- which(LD$genotype %in% sample(levels(LD$genotype),20)) out <- marker_h2(data.vector=LD$LD[sub.set],geno.vector=LD$genotype[sub.set], covariates=LD[sub.set,4:8],K=K_atwell)
  • Maintainer: Willem Kruijer
  • License: GPL-3
  • Last published: 2023-08-24

Useful links