Profile likelihood calculation using regression models
Profile likelihood calculation using regression models
This is the function that calculates profileLikelihood for a single SNP. The main function evian calls this function repeatedly to obtain results for multiple SNPs.
snp: a string specifying the SNP of interests to be calculated.
formula_tofit: a formula object of the genetic model. The model should be formatted as y~nuisance parameters. The parameter of interest should not be included here.
model: a string specifying the mode of inheritance parameterization:
additive, dominant, recessive, or overdominance. See details.
data: data frame; read from the argument data in the main function evian. It should contain the SNP ID specified in the snp argument as a column name.
bim: data frame; read from from the argument bim in the main function evian. Provides allele information (base pair, effect/reference alleles) for the SNP of interest.
lolim: numeric; the lower limit for the grid or the minimum value of the regression parameter β used to calculate the likelihood function.
hilim: numeric; the upper limit for the grid or the maximum value of the regression parameter β used to calculate the likelihood funciton.
m: numeric; the density of the grid at which to compute the standardized likelihood function. A beta grid is defined as the grid of values for the SNP parameter used to evaluate the likelihood function.
bse: numeric; the number of beta standard errors to utilize in constraining the beta grid limits. Beta grid is evaluated at β +/- bse*s.e.
k: numeric or numeric vector; The strength of evidence criterion k. Reads from the input of kcutoff from the main evian function
robust: logical; if TRUE, then a robust adjustment is applied to the likelihood function to account for the cluster nature in the data. See robust_forCluster .
family: the link function for glm.
plinkCC: A boolean type that specifies how case/control are coded. case/control were coded 1/0 if it is FALSE, and were coded 2/1 if TRUE.
Details
calculateEvianMLE calculates the profile likelihood for a single SNP. A proper grid range is first established for β then the standardized profile likelihood is evaluated at each of the m cuts uniformly spread across the grid. Based on the standardized profile likelihood, the MLE for β is computed as well as the likelihood intervals for each value of k provided.
For different genetic models, their coding schemes are shown as below:
Additive
AA 0
AB 1
BB 2
Dominant
AA 0
AB 1
BB 1
Recessive
AA 0
AB 0
BB 1
Overdominance model
A D
AA 0 0
AB 1 1
BB 2 0
Specifically for the overdominance model, the column of interest is the D column.
Returns
This function outputs a list containg 4 elements that can be directly accessed using '$' operator. - theta: numeric vector; It stores all mβ values that used to estimate the standardized profile likelihood.
profile.lik.norm: numeric vector; the corresponding m standardized profile likelihood value at each of the β values in theta. If robust=TRUE, then the values will be adjusted by the robust factor.
k_cutoff: numeric vector; It specifies which k-cutoff had been used in the calculation, ordered from the smallest k to the largest k.
SummaryStats: data frame; contains the summary statistics of the profile likelihood calculation. It contains the following columns:
mle: the estimates for SNP effect with respect to the effective allele
maxlr: maximum likelihood ratio in the beta grid defined by lolim and hilim
AF: allele frequency for the effective allele
SNP: SNP ID
bp: base pair position from the bim input
effect, ref: the effective allele and the other allele from the bim input
robustFactor: robust factor calculated, set to 1 if robust=FALSE.
lo_1, hi_1, lo_2, hi_2...: the lower and upper bound of the likelihood intervals for the kth cut-off in k_cutoff.
References
Strug, L. J., Hodge, S. E., Chiang, T., Pal, D. K., Corey, P. N., & Rohde, C. (2010). A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and frequentist analysis. Eur J Hum Genet, 18(8), 933-941. doi:10.1038/ejhg.2010.47
Strug, L. J., & Hodge, S. E. (2006). An alternative foundation for the planning and evaluation of linkage analysis. I. Decoupling "error probabilities" from "measures of evidence". Hum Hered, 61(3), 166-188. doi:10.1159/000094709
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. London, Chapman and Hall.
Note
When lolim or hilim are NOT defined, then the boundaries of the beta grid will be determined by the default bse=5, or a bse defined by the user. Otherwise, the user can define the exact beta grid boundaries using lolim and hilim.
In some cases the beta grid (using bse or lolim,hilim) may need to be increased substantially (bse as large as 15) if covariates are present in the formula. This is automatically dealt by the current function, but contributes to longer computation time to find the appropriate ranges. Estimation may become inaccurate with large number of correlated covariates, which is a known limitation of profile likelihoods.