Estimate prevalence with confidence interval accounting for sensitivity and specificity
Estimate prevalence with confidence interval accounting for sensitivity and specificity
Using the method of Lang and Reiczigel (2014), estimate prevalence and get a confidence interval adjusting for the sensitivity and specificity (including accounting for the variability of the sensitivity and specificity estimates).
prevSeSp(AP, nP, Se, nSe, Sp, nSp, conf.level =0.95, neg.to.zero=TRUE)
Arguments
AP: apparent prevalence (proportion positive by test)
nP: number tested for AP
Se: estimated sensitivity (true positive rate)
nSe: number of positive controls used to estimate sensitivity
nSp: number of negative controls used to estimate specificity
conf.level: confidence level
neg.to.zero: logical, should negative prevalence estimates and lower confidence limits be set to zero?
Details
When measuring the prevalence of some disease in a population, it is useful to adjust for the fact that the test for the disease may not be perfect. We adjust the apparent prevalence (the proportion of people tested positive) for the sensitivity (true positive rate: proportion of the population that has the disease that tests positive) and the specificity (1-false positive rate: proportion of the population that do not have the disease that tests negative). So if the true prevalence is θ and the true sensitivity and specificity are Se and Sp, then the expected value of the apparent prevalence is the sum of the expected proportion of true positive results and the expected proportion of false positive results:
AP=θSe+(1−Sp)(1−θ).
Plugging in the estimates (and using the same notation for the estimates as the true values) and solving for θ we get the estimate of prevalence of
θ=Se−(1−Sp)AP−(1−Sp).
Lang and Reiczigel (2014) developed an approximate confidence interval for the prevalence that not only adjusts for the sensitivity and specificity, but also adjusts for the fact that the sensitivity is estimated from a sample of true positive individuals (nSe) and the specificity is estimate from a sample of true negative individuals (nSp).
If the estimated false positive rate (1-specificity) is larger than the apparent prevalence, the prevalence estimate will be negative. This occurs because we observe a smaller proportion of positive results than we would expect from a population known not to have the disease. The lower confidence limit can also be negative because of the variability in the specificity estimate. The default with neg.to.zero=TRUE sets those negative estimates and lower confidence limits to zero.
The Lang-Reiczigel method uses an idea discussed in Agresti and Coull (1998) to get approximate confidence intervals. For 95% confidence intervals, the idea is similar to adding 2 positive and 2 negative individuals to the apparent prevalence results, and adding 1 positive and 1 negative individual to the sensitivity and specificity test results, then using asymptotic normality. Simulations in Lang and Reiczigel (2014) show the method works well for true sensitivity and specificity each in ranges from 70% to over 90%.
Returns
A list with class "htest" containing the following components:
estimate: the adjusted prevalence estimate, adjusted for sensitivity and specificity
statistic: the estimated sensitivity given by Se
parameter: the estimated specificity given by Sp
conf.int: a confidence interval for the prevalence.
method: the character string describing the output.
data.name: a character string giving the unadjusted prevalence value and the sample size used to estimate it (nP).
References
Agresti, A., Coull, B.A., 1998. Approximate is better than 'exact'for interval estimation of binomial proportions. Am. Stat. 52,119-126.
Lang, Z. and Reiczigel, J., 2014. Confidence limits for prevalence of disease adjusted for estimated sensitivity and specificity. Preventive veterinary medicine, 113(1), pp.13-22.
Author(s)
Michael P. Fay
Note
There is a typo in equation 4 of Lang and Reiczigel (2014), the (1+P^)2 should be (1−P^)2.
See Also
truePrev in package list("prevalence") for Bayesian methods for this problem (but this requires JAGS (Just Another Gibbs Sampler), a separate software that can be called from R if it is installed on the user's system.)
Examples
# Example 1 of Lang and Reiczigel, 2014# 95% CI should be 0.349, 0.372prevSeSp(AP=4060/11284,nP=11284,Se=178/179,nSe=179,Sp=358/359, nSp=359)# Example 2 of Lang and Reiczigel, 2014# 95% CI should be 0, 0.053prevSeSp(AP=51/2971,nP=2971,Se=32/33,nSe=33,Sp=20/20, nSp=20)# Example 3 of Lang and Reiczigel, 2014# 95% CI should be 0 and 0.147prevSeSp(AP=0.06,nP=11862,Se=0.80,nSe=10,Sp=1, nSp=12)# Example 4 of Lang and Reiczigel, 2014# 95% CI should be 0.58 to 0.87prevSeSp(AP=259/509,nP=509,Se=84/127,nSe=127,Sp=96/109, nSp=109)# 95% CI should be 0.037 to 0.195prevSeSp(AP=51/509,nP=509,Se=23/41,nSe=41,Sp=187/195, nSp=195)