wschisq function

Weighted sums of non-central chi squared random variables

Weighted sums of non-central chi squared random variables

Approximated density, distribution, and quantile functions for weighted sums of non-central chi squared random variables: [REMOVE_ME]QK=i=1Kwiχdi2(λi),[REMOVEME2] Q_K = \sum_{i = 1}^K w_i \chi^2_{d_i}(\lambda_i), [REMOVE_ME_2]

where w1,,wnw_1, \ldots, w_n are positive weights, d1,,dnd_1, \ldots, d_n

are positive degrees of freedom, and λ1,,λn\lambda_1, \ldots, \lambda_n

are non-negative non-centrality parameters. Also, simulation of QKQ_K.

d_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE")[1], exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06, imhof_limit = 10000, grad_method = "simple", grad_method.args = list(eps = 1e-07)) p_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1], exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06, imhof_limit = 10000, M = 10000, MC_sample = NULL) q_wschisq(u, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1], exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06, imhof_limit = 10000, nlm_gradtol = 1e-06, nlm_iterlim = 1000, M = 10000, MC_sample = NULL) r_wschisq(n, weights, dfs, ncps = 0) cutoff_wschisq(thre = 1e-04, weights, dfs, ncps = 0, log = FALSE, x_tail = NULL)

Arguments

  • x: vector of quantiles.

  • weights: vector with the positive weights of the sum. Must have the same length as dfs.

  • dfs: vector with the positive degrees of freedom of the chi squared random variables. Must have the same length as weights.

  • ncps: non-centrality parameters. Either 0 (default) or a vector with the same length as weights.

  • method: method for approximating the density, distribution, or quantile function. Must be "I" (Imhof), "SW"

    (Satterthwaite--Welch), "HBE" (Hall--Buckley--Eagleson), or "MC" (Monte Carlo; only for distribution or quantile functions). Defaults to "I".

  • exact_chisq: if weights and dfs have length one, shall the Chisquare functions be called? Otherwise, the approximations are computed for this exact case. Defaults to TRUE.

  • imhof_epsabs, imhof_epsrel, imhof_limit: precision parameters passed to imhof's epsabs, epsrel, and limit, respectively. They default to 1e-6, 1e-6, and 1e4.

  • grad_method, grad_method.args: numerical differentiation parameters passed to grad's method and method.args, respectively. They default to "simple", and list(eps = 1e-7) (better precision than imhof_epsabs to avoid numerical artifacts).

  • M: number of Monte Carlo samples for approximating the distribution if method = "MC". Defaults to 1e4.

  • MC_sample: if provided, it is employed when method = "MC". If not, it is computed internally.

  • u: vector of probabilities.

  • nlm_gradtol, nlm_iterlim: convergence control parameters passed to nlm's gradtol and iterlim, respectively. They default to 1e-6 and 1e3.

  • n: sample size.

  • thre: vector with the error thresholds of the tail probability and mean/variance explained by the first terms of the series. Defaults to 1e-4. See details.

  • log: are weights and dfs given in log-scale? Defaults to FALSE.

  • x_tail: scalar evaluation point for determining the upper tail probability. If NULL, set to the 0.90 quantile of the whole series, computed by the "HBE" approximation.

Returns

  • d_wschisq: density function evaluated at x, a vector.
  • p_wschisq: distribution function evaluated at x, a vector.
  • q_wschisq: quantile function evaluated at u, a vector.
  • r_wschisq: a vector of size n containing a random sample.
  • cutoff_wschisq: a data frame with the indexes up to which the truncated series explains the tail probability with absolute error thre, or the proportion of the mean/variance of the whole series that is not explained by the truncated series.

Description

Approximated density, distribution, and quantile functions for weighted sums of non-central chi squared random variables:

QK=i=1Kwiχdi2(λi), Q_K = \sum_{i = 1}^K w_i \chi^2_{d_i}(\lambda_i),

where w1,,wnw_1, \ldots, w_n are positive weights, d1,,dnd_1, \ldots, d_n

are positive degrees of freedom, and λ1,,λn\lambda_1, \ldots, \lambda_n

are non-negative non-centrality parameters. Also, simulation of QKQ_K.

Details

Four methods are implemented for approximating the distribution of a weighted sum of chi squared random variables:

  • "I": Imhof's approximation (Imhof, 1961) for the evaluation of the distribution function. If this method is selected, the function is simply a wrapper to imhof from the CompQuadForm package (Duchesne and Lafaye De Micheaux, 2010).
  • "SW": Satterthwaite--Welch (Satterthwaite, 1946; Welch, 1938) approximation, consisting in matching the first two moments of QKQ_K with a gamma distribution.
  • "HBE": Hall--Buckley--Eagleson (Hall, 1983; Buckley and Eagleson, 1988) approximation, consisting in matching the first three moments of QKQ_K with a gamma distribution.
  • "MC": Monte Carlo approximation using the empirical cumulative distribution function with M simulated samples.

The Imhof method is exact up to the prescribed numerical accuracy. It is also the most time-consuming method. The density and quantile functions for this approximation are obtained by numerical differentiation and inversion, respectively, of the approximated distribution.

For the methods based on gamma matching, the GammaDist

density, distribution, and quantile functions are invoked. The Hall--Buckley--Eagleson approximation tends to overperform the Satterthwaite--Welch approximation.

The Monte Carlo method is relatively inaccurate and slow, but serves as an unbiased reference of the true distribution function. The inversion of the empirical cumulative distribution is done by quantile.

An empirical comparison of these and other approximation methods is given in Bodenham and Adams (2016).

cutoff_wschisq removes NAs/NaNs in weights or dfs with a message. The threshold thre ensures that the tail probability of the truncated and whole series differ less than thre at x_tail, or that thre is the proportion of the mean/variance of the whole series that is not retained. The (upper) tail probabilities for evaluating truncation are computed using the Hall--Buckley--Eagleson approximation at x_tail.

Examples

# Plotting functions for the examples add_approx_dens <- function(x, dfs, weights, ncps) { lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "SW", exact_chisq = FALSE), col = 3) lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "HBE", exact_chisq = FALSE), col = 4) lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "I", exact_chisq = TRUE), col = 2) legend("topright", legend = c("True", "SW", "HBE", "I"), lwd = 2, col = c(1, 3:4, 2)) } add_approx_distr <- function(x, dfs, weights, ncps, ...) { lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "SW", exact_chisq = FALSE), col = 3) lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "HBE", exact_chisq = FALSE), col = 4) lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "MC", exact_chisq = FALSE), col = 5, type = "s") lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps, method = "I", exact_chisq = TRUE), col = 2) legend("bottomright", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2, col = c(1, 3:5, 2)) } add_approx_quant <- function(u, dfs, weights, ncps, ...) { lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps, method = "SW", exact_chisq = FALSE), col = 3) lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps, method = "HBE", exact_chisq = FALSE), col = 4) lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps, method = "MC", exact_chisq = FALSE), col = 5, type = "s") lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps, method = "I", exact_chisq = TRUE), col = 2) legend("topleft", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2, col = c(1, 3:5, 2)) } # Validation plots for density, distribution, and quantile functions u <- seq(0.01, 0.99, l = 100) old_par <- par(mfrow = c(1, 3)) # Case 1: 1 * ChiSq_3(0) + 1 * ChiSq_3(0) = ChiSq_6(0) weights <- c(1, 1) dfs <- c(3, 3) ncps <- 0 x <- seq(-1, 30, l = 100) main <- expression(1 * chi[3]^2 * (0) + 1 * chi[3]^2 * (0)) plot(x, dchisq(x, df = 6), type = "l", main = main, ylab = "Density") add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(x, pchisq(x, df = 6), type = "l", main = main, ylab = "Distribution") add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(u, qchisq(u, df = 6), type = "l", main = main, ylab = "Quantile") add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps) # Case 2: 2 * ChiSq_3(1) + 1 * ChiSq_6(0.5) + 0.5 * ChiSq_12(0.25) weights <- c(2, 1, 0.5) dfs <- c(3, 6, 12) ncps <- c(1, 0.5, 0.25) x <- seq(0, 70, l = 100) main <- expression(2 * chi[3]^2 * (1)+ 1 * chi[6]^2 * (0.5) + 0.5 * chi[12]^2 * (0.25)) samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps) hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density", xlim = range(x), xlab = "x"); box() add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s") add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(u, quantile(samp, probs = u), type = "s", main = main, ylab = "Quantile") add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps) # Case 3: \sum_{k = 1}^K k^(-3) * ChiSq_{5k}(1 / k^2) K <- 1e2 weights<- 1 / (1:K)^3 dfs <- 5 * 1:K ncps <- 1 / (1:K)^2 x <- seq(0, 25, l = 100) main <- substitute(sum(k^(-3) * chi[5 * k]^2 * (1 / k^2), k == 1, K), list(K = K)) samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps) hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density", xlim = range(x), xlab = "x"); box() add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s") add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps) plot(u, quantile(samp, probs = u), type = "s", main = main, ylab = "Quantile") add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps) par(old_par) # Cutoffs for infinite series of the last example K <- 1e7 log_weights<- -3 * log(1:K) log_dfs <- log(5) + log(1:K) (cutoff <- cutoff_wschisq(thre = 10^(-(1:4)), weights = log_weights, dfs = log_dfs, log = TRUE)) # Approximation x <- seq(0, 25, l = 100) l <- length(cutoff$mean) main <- expression(sum(k^(-3) * chi[5 * k]^2, k == 1, K)) col <- viridisLite::viridis(l) plot(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[l]]), dfs = exp(log_dfs[1:cutoff$mean[l]])), type = "l", ylab = "Density", col = col[l], lwd = 3) for(i in rev(seq_along(cutoff$mean)[-l])) { lines(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[i]]), dfs = exp(log_dfs[1:cutoff$mean[i]])), col = col[i]) } legend("topright", legend = paste0(rownames(cutoff), " (", cutoff$mean, ")"), lwd = 2, col = col)

References

Bodenham, D. A. and Adams, N. M. (2016). A comparison of efficient approximations for a weighted sum of chi-squared random variables. Statistics and Computing, 26(4):917--928. tools:::Rd_expr_doi("10.1007/s11222-015-9583-4")

Buckley, M. J. and Eagleson, G. K. (1988). An approximation to the distribution of quadratic forms in normal random variables. Australian Journal of Statistics, 30(1):150--159. tools:::Rd_expr_doi("10.1111/j.1467-842X.1988.tb00471.x")

Duchesne, P. and Lafaye De Micheaux, P. (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu--Tang--Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4):858--862. tools:::Rd_expr_doi("10.1016/j.csda.2009.11.025")

Hall, P. (1983). Chi squared approximations to the distribution of a sum of independent random variables. Annals of Probability, 11(4):1028--1036. tools:::Rd_expr_doi("10.1214/aop/1176993451")

Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika, 48(3/4):419--426. tools:::Rd_expr_doi("10.2307/2332763")

Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110--114. tools:::Rd_expr_doi("10.2307/3002019")

Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29(3/4):350--362. tools:::Rd_expr_doi("10.2307/2332010")

Author(s)

Eduardo García-Portugués and Paula Navarro-Esteban.

  • Maintainer: Eduardo García-Portugués
  • License: GPL-3
  • Last published: 2024-05-24