pit_histogram_sample function

Probability integral transformation for counts

Probability integral transformation for counts

Uses a Probability integral transformation (PIT) (or a randomised PIT for integer forecasts) to assess the calibration of predictive Monte Carlo samples.

pit_histogram_sample( observed, predicted, quantiles, integers = c("nonrandom", "random", "ignore"), n_replicates = NULL )

Arguments

  • observed: A vector with observed values of size n
  • predicted: nxN matrix of predictive samples, n (number of rows) being the number of data points and N (number of columns) the number of Monte Carlo samples. Alternatively, predicted can just be a vector of size n.
  • quantiles: A vector of quantiles between which to calculate the PIT.
  • integers: How to handle integer forecasts (count data). This is based on methods described Czado et al. (2007). If "nonrandom" (default) the function will use the non-randomised PIT method. If "random", will use the randomised PIT method. If "ignore", will treat integer forecasts as if they were continuous.
  • n_replicates: The number of draws for the randomised PIT for discrete predictions. Will be ignored if forecasts are continuous or integers is not set to random.

Returns

A vector with PIT histogram densities for the bins corresponding to the given quantiles.

Details

Calibration or reliability of forecasts is the ability of a model to correctly identify its own uncertainty in making predictions. In a model with perfect calibration, the observed data at each time point look as if they came from the predictive probability distribution at that time.

Equivalently, one can inspect the probability integral transform of the predictive distribution at time t,

ut=Ft(xt) u_t = F_t (x_t)

where xtx_t is the observed data point at time tint1,,tnt in t_1, …, t_n, n being the number of forecasts, and FtF_t is the (continuous) predictive cumulative probability distribution at time t. If the true probability distribution of outcomes at time t is GtG_t then the forecasts FtF_t are said to be ideal if Ft=GtF_t = G_t at all times t. In that case, the probabilities utu_t are distributed uniformly.

In the case of discrete nonnegative outcomes such as incidence counts, the PIT is no longer uniform even when forecasts are ideal. In that case two methods are available ase described by Czado et al. (2007).

By default, a nonrandomised PIT is calculated using the conditional cumulative distribution function

F(u)={0if v<Pt(kt1)(vPt(kt1))/(Pt(kt)Pt(kt1))if Pt(kt1)v<Pt(kt)1if vPt(kt) F(u) =\begin{cases}0 & \text{if } v < P_t(k_t - 1) \\(v - P_t(k_t - 1)) / (P_t(k_t) - P_t(k_t - 1)) & \text{if } P_t(k_t - 1) \leq v < P_t(k_t) \\1 & \text{if } v \geq P_t(k_t)\end{cases}

where ktk_t is the observed count, Pt(x)P_t(x) is the predictive cumulative probability of observing incidence kk at time tt and Pt(1)=0P_t (-1) = 0 by definition. Values of the PIT histogram are then created by averaging over the nn

predictions,

Fˉ(u)=i=1ni=1nF(i)(u) \bar{F}(u) = \frac{i = 1}{n} \sum_{i=1}^{n} F^{(i)}(u)

And calculating the value at each bin between quantile qiq_i and quantile qi+1q_{i + 1} as

Fˉ(qi)Fˉ(qi+1) \bar{F}(q_i) - \bar{F}(q_{i + 1})

Alternatively, a randomised PIT can be used instead. In this case, the PIT is

ut=Pt(kt)+v(Pt(kt)Pt(kt1)) u_t = P_t(k_t) + v * (P_t(k_t) - P_t(k_t - 1))

where vv is standard uniform and independent of kk. The values of the PIT histogram are then calculated by binning the utu_t values as above.

Examples

## continuous predictions observed <- rnorm(20, mean = 1:20) predicted <- replicate(100, rnorm(n = 20, mean = 1:20)) pit <- pit_histogram_sample(observed, predicted, quantiles = seq(0, 1, 0.1)) ## integer predictions observed <- rpois(20, lambda = 1:20) predicted <- replicate(100, rpois(n = 20, lambda = 1:20)) pit <- pit_histogram_sample(observed, predicted, quantiles = seq(0, 1, 0.1)) ## integer predictions, randomised PIT observed <- rpois(20, lambda = 1:20) predicted <- replicate(100, rpois(n = 20, lambda = 1:20)) pit <- pit_histogram_sample( observed, predicted, quantiles = seq(0, 1, 0.1), integers = "random", n_replicates = 30 )

References

Claudia Czado, Tilmann Gneiting Leonhard Held (2009) Predictive model assessment for count data. Biometrika, 96(4), 633-648. Sebastian Funk, Anton Camacho, Adam J. Kucharski, Rachel Lowe, Rosalind M. Eggo, W. John Edmunds (2019) Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15, tools:::Rd_expr_doi("10.1371/journal.pcbi.1006785")

See Also

get_pit_histogram()