weighted_quantile function

Weighted sample quantiles

A variation of quantile() that can be applied to weighted samples.

weighted_quantile(
  x,
  probs = seq(0, 1, 0.25),
  weights = NULL,
  n = NULL,
  na.rm = FALSE,
  names = TRUE,
  type = 7,
  digits = 7
)

weighted_quantile_fun(x, weights = NULL, n = NULL, na.rm = FALSE, type = 7)

Arguments

  • x: numeric vector: sample values

  • probs: numeric vector: probabilities in $[0, 1]$

  • weights: Weights for the sample. One of:

    • numeric vector of same length as x: weights for corresponding values in x, which will be normalized to sum to 1.
    • NULL: indicates no weights are provided, so unweighted sample quantiles (equivalent to quantile()) are returned.
  • n: Presumed effective sample size. If this is greater than 1 and continuous quantiles (type >= 4) are requested, flat regions may be added to the approximation to the inverse CDF in areas where the normalized weight exceeds 1/n (i.e., regions of high density). This can be used to ensure that if a sample of size n with duplicate x values is summarized into a weighted sample without duplicates, the result of weighted_quantile(..., n = n) on the weighted sample is equal to the result of quantile() on the original sample. One of:

    • NULL: do not make a sample size adjustment.

    • numeric: presumed effective sample size.

    • function or name of function (as a string): A function applied to weights (prior to normalization) to determine the sample size. Some useful values may be:

      • "length": i.e. use the number of elements in weights (equivalently in x) as the effective sample size.
      • "sum": i.e. use the sum of the unnormalized weights as the sample size. Useful if the provided weights are unnormalized and their sum represents the true sample size.
  • na.rm: logical: if TRUE, corresponding entries in x and weights are removed if either is NA.

  • names: logical: If TRUE, add names to the output giving the input probs formatted as a percentage.

  • type: integer between 1 and 9: determines the type of quantile estimator to be used. Types 1 to 3 are for discontinuous quantiles, types 4 to 9 are for continuous quantiles. See Details.

  • digits: numeric: the number of digits to use to format percentages when names is TRUE.
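The role of n can be illustrated with a small sketch. It assumes the package providing weighted_quantile() is ggdist and that it is installed; the variable names (x_full, x_unique, w) are hypothetical. A sample with duplicates is collapsed into unique values with counts as unnormalized weights, and n = "sum" asks for the sum of those counts to be used as the sample size.

```r
library(ggdist)  # assumption: weighted_quantile() is provided by ggdist

# Original sample containing duplicated values
x_full <- c(1, 1, 1, 2, 3, 3, 5)

# Collapse duplicates: unique values, with counts as unnormalized weights
counts <- table(x_full)
x_unique <- as.numeric(names(counts))
w <- as.numeric(counts)

probs <- c(0.1, 0.5, 0.9)

# Unweighted quantiles of the original sample
q_full <- quantile(x_full, probs)

# Weighted quantiles of the summarized sample; n = "sum" uses sum(w),
# which here equals length(x_full), as the presumed sample size
q_weighted <- weighted_quantile(x_unique, probs, weights = w, n = "sum")

all.equal(unname(q_weighted), unname(q_full))
```

Going by the description of n above, the two results should agree; without n, the weighted result is not expected to match in general.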

Returns

weighted_quantile() returns a numeric vector of length(probs) with the estimate of the corresponding quantile from probs.

weighted_quantile_fun() returns a function that takes a single argument, a vector of probabilities, which itself returns the corresponding quantile estimates. It may be useful when weighted_quantile() needs to be called repeatedly for the same sample, re-using some pre-computation.
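A minimal usage sketch of both functions follows. It assumes the functions come from the ggdist package; the sample values and weights are made up for illustration.

```r
library(ggdist)  # assumption: both functions are provided by ggdist

x <- c(2.1, 3.4, 5.0, 7.7)   # sample values (illustrative)
w <- c(0.1, 0.4, 0.3, 0.2)   # weights for the corresponding values
probs <- c(0.25, 0.5, 0.75)

# One-off call: returns one estimate per entry in probs
q_direct <- weighted_quantile(x, probs, weights = w, names = FALSE)

# Build the quantile function once, then evaluate it as often as needed
q_fun <- weighted_quantile_fun(x, weights = w)
q_reused <- q_fun(probs)

all.equal(unname(q_reused), unname(q_direct))
```

The second form is the one to prefer inside loops or optimizers where the same sample is queried at many different probabilities.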

Details

Calculates weighted quantiles using a generalization of the quantile types defined by quantile().

Types 1 to 3 (discontinuous) quantiles are directly a function of the inverse CDF as a step function, and so can be directly translated to the weighted case using the natural definition of the weighted ECDF as the cumulative sum of the normalized weights.
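The step-function idea can be sketched in base R for Type 1. The helper wq_type1() below is hypothetical and is not the package's implementation; it assumes non-negative weights with a positive sum and simply returns the smallest value whose weighted ECDF reaches the requested probability.

```r
# Hypothetical helper: Type 1 weighted quantile via the weighted ECDF
wq_type1 <- function(x, probs, weights) {
  ord <- order(x)
  x <- x[ord]
  f_k <- weights[ord] / sum(weights)  # normalized weights
  F_k <- cumsum(f_k)                  # weighted ECDF at each sorted value
  # Smallest x_k with F(x_k) >= p; the tolerance guards against
  # floating-point error in the cumulative sum
  vapply(probs, function(p) x[which(F_k >= p - 1e-12)[1]], numeric(1))
}

# With equal weights this should match the unweighted Type 1 estimator
x <- c(3, 1, 2, 4)
probs <- c(0.25, 0.5, 1)
all.equal(wq_type1(x, probs, rep(1, 4)),
          unname(quantile(x, probs, type = 1)))

# With unequal weights, more mass on 3 pulls the upper quantiles toward it
wq_type1(c(1, 2, 3), 0.6, c(1, 1, 2))
```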

Types 4 to 9 (continuous) quantiles require some translation from the definitions in quantile(). quantile() defines continuous estimators in terms of $x_k$, which is the $k$th order statistic, and $p_k$, which is a function of $k$ and $n$ (the sample size). In the weighted case, we instead take $x_k$ as the $k$th smallest value of $x$ in the weighted sample (not necessarily an order statistic, because of the weights). Then we can re-write the formulas for $p_k$ in terms of $F(x_k)$ (the empirical CDF at $x_k$, i.e. the cumulative sum of normalized weights) and $f(x_k)$ (the normalized weight at $x_k$), by using the fact that, in the unweighted case, $k = F(x_k) \cdot n$ and $1/n = f(x_k)$:

  • Type 4: $p_k = \frac{k}{n} = F(x_k)$
  • Type 5: $p_k = \frac{k - 0.5}{n} = F(x_k) - \frac{f(x_k)}{2}$
  • Type 6: $p_k = \frac{k}{n + 1} = \frac{F(x_k)}{1 + f(x_k)}$
  • Type 7: $p_k = \frac{k - 1}{n - 1} = \frac{F(x_k) - f(x_k)}{1 - f(x_k)}$
  • Type 8: $p_k = \frac{k - 1/3}{n + 1/3} = \frac{F(x_k) - f(x_k)/3}{1 + f(x_k)/3}$
  • Type 9: $p_k = \frac{k - 3/8}{n + 1/4} = \frac{F(x_k) - f(x_k) \cdot 3/8}{1 + f(x_k)/4}$
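As a quick sanity check of these identities, the base R sketch below evaluates both forms of each formula for an unweighted sample (every normalized weight equal to 1/n) and compares them. The names p_classic and p_weighted are illustrative only.

```r
n <- 5
k <- seq_len(n)
f_k <- rep(1 / n, n)   # normalized weight at each x_k in the unweighted case
F_k <- cumsum(f_k)     # empirical CDF at each x_k, equal to k / n

# Positions p_k written in terms of k and n, as in quantile()
p_classic <- list(
  type4 = k / n,
  type5 = (k - 0.5) / n,
  type6 = k / (n + 1),
  type7 = (k - 1) / (n - 1),
  type8 = (k - 1/3) / (n + 1/3),
  type9 = (k - 3/8) / (n + 1/4)
)

# The same positions written in terms of F(x_k) and f(x_k)
p_weighted <- list(
  type4 = F_k,
  type5 = F_k - f_k / 2,
  type6 = F_k / (1 + f_k),
  type7 = (F_k - f_k) / (1 - f_k),
  type8 = (F_k - f_k / 3) / (1 + f_k / 3),
  type9 = (F_k - f_k * 3/8) / (1 + f_k / 4)
)

# Both forms agree up to floating-point rounding
all.equal(p_classic, p_weighted)
```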

Then the quantile function (inverse CDF) is the piece-wise linear function defined by the points $(p_k, x_k)$.
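The construction can be sketched in base R for Type 7, using approx() for the piece-wise linear interpolation. The helper wq_type7() is hypothetical and is not the package's implementation: it assumes at least two distinct values, strictly positive weights, and no single normalized weight equal to 1, and it ignores the n adjustment described under Arguments.

```r
# Hypothetical helper: Type 7 weighted quantile by linear interpolation
wq_type7 <- function(x, probs, weights) {
  ord <- order(x)
  x <- x[ord]
  f_k <- weights[ord] / sum(weights)  # normalized weights f(x_k)
  F_k <- cumsum(f_k)                  # weighted ECDF F(x_k)
  p_k <- (F_k - f_k) / (1 - f_k)      # Type 7 positions
  # Interpolate through the points (p_k, x_k); rule = 2 clamps
  # probabilities outside the range of p_k to the extreme values
  approx(p_k, x, xout = probs, rule = 2, ties = "ordered")$y
}

# With equal weights this should reproduce quantile()'s default (Type 7)
x <- c(10, 20, 30, 40, 50)
probs <- c(0.1, 0.5, 0.9)
all.equal(wq_type7(x, probs, rep(1, 5)), unname(quantile(x, probs)))

# Unequal weights: positions p_k are 0, 0.5, 1 for these inputs
wq_type7(c(1, 2, 3), c(0.25, 0.5, 0.75), c(1, 2, 1))
```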

See Also

weighted_ecdf()