Density, cumulative distribution function, quantile function and random number generation for the generalised Pareto distribution, either as a conditional on being above the threshold u or unconditional.
dgpd(x, u =0, sigmau =1, xi =0, phiu =1, log =FALSE)pgpd(q, u =0, sigmau =1, xi =0, phiu =1, lower.tail =TRUE)qgpd(p, u =0, sigmau =1, xi =0, phiu =1, lower.tail =TRUE)rgpd(n =1, u =0, sigmau =1, xi =0, phiu =1)
Arguments
x: quantiles
u: threshold
sigmau: scale parameter (positive)
xi: shape parameter
phiu: probability of being above threshold [0,1]
log: logical, if TRUE then log density
q: quantiles
lower.tail: logical, if FALSE then upper tail probabilities
p: cumulative probabilities
n: sample size (positive integer)
Returns
dgpd gives the density, pgpd gives the cumulative distribution function, qgpd gives the quantile function and rgpd gives a random sample.
Details
The GPD with parameters scale σu and shape ξ has conditional density of being above the threshold u given by
f(x∣X>u)=1/σu[1+ξ(x−u)/σu]−1/ξ−1
for non-zero ξ, x>u and σu>0. Further, [1+ξ(x−u)/σu]>0 which for ξ<0 implies u<x≤u−σu/ξ. In the special case of ξ=0
considered in the limit ξ→0, which is treated here as ∣ξ∣<1e−6, it reduces to the exponential:
f(x∣X>u)=1/σuexp(−(x−u)/σu).
The unconditional density is obtained by mutltiplying this by the survival probability (or tail fraction) ϕu=P(X>u)
giving f(x)=ϕuf(x∣X>u).
The syntax of these functions are similar to those of the evd package, so most code using these functions can be reused. The key difference is the introduction of phiu to permit output of unconditional quantities.
Note
All inputs are vectorised except log and lower.tail. The main inputs (x, p or q) and parameters must be either a scalar or a vector. If vectors are provided they must all be of the same length, and the function will be evaluated for each element of vector. In the case of rgpd any input vector must be of length n.
Default values are provided for all inputs, except for the fundamentals x, q and p. The default threshold u=0 and tail fraction phiu=1 which essentially assumes the user provide excesses above u by default, rather than exceedances. The default sample size for rgpd is 1.
Missing (NA) and Not-a-Number (NaN) values in x, p and q are passed through as is and infinite values are set to NA. None of these are not permitted for the parameters.
Some key differences arise for phiu=1 and phiu<1 (see examples below):
For phiu=1 the dgpd evaluates as zero for quantiles below the threshold u and pgpd
evaluates over [0,1].
For phiu=1 then pgpd evaluates as zero below the threshold u. For phiu\<1 it evaluates as 1−ϕu at the threshold and NA below the threshold.
For phiu=1 the quantiles from qgpd are above threshold and equal to threshold for phiu=0. For phiu\<1 then within upper tail, p \> 1 - phiu, it will give conditional quantiles above threshold, but when below the threshold, p \<= 1 - phiu, these are set to NA.
When simulating GPD variates using rgpd if phiu=1 then all values are above the threshold. For phiu\<1 then a standard uniform U is simulated and the variate will be classified as above the threshold if u\<ϕ, and below the threshold otherwise. This is equivalent to a binomial random variable for simulated number of exceedances. Those above the threshold are then simulated from the conditional GPD and those below the threshold and set to NA.
These conditions are intuitive and consistent with evd, which assumes missing data are below threshold.
Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give warning message as appropriate.
Acknowledgments
Based on the gpd functions in the evd package for which their author's contributions are gratefully acknowledged. They are designed to have similar syntax and functionality to simplify the transition for users of these packages.
Examples
set.seed(1)par(mfrow = c(2,2))x = rgpd(1000)# simulate sample from GPDxx = seq(-1,10,0.01)hist(x, breaks =100, freq =FALSE, xlim = c(-1,10))lines(xx, dgpd(xx))# three tail behavioursplot(xx, pgpd(xx), type ="l")lines(xx, pgpd(xx, xi =0.3), col ="red")lines(xx, pgpd(xx, xi =-0.3), col ="blue")legend("bottomright", paste("xi =",c(0,0.3,-0.3)), col=c("black","red","blue"), lty =1)# GPD when xi=0 is exponential, and demonstrating phiux = rexp(1000)hist(x, breaks =100, freq =FALSE, xlim = c(-1,10))lines(xx, dgpd(xx, u =0, sigmau =1, xi =0), lwd =2)lines(xx, dgpd(xx, u =0.5, phiu =1- pexp(0.5)), col ="red", lwd =2)lines(xx, dgpd(xx, u =1.5, phiu =1- pexp(1.5)), col ="blue", lwd =2)legend("topright", paste("u =",c(0,0.5,1.5)), col=c("black","red","blue"), lty =1, lwd =2)# Quantile function and phiup = pgpd(xx)plot(qgpd(p), p, type ="l")lines(xx, pgpd(xx, u =2), col ="red")lines(xx, pgpd(xx, u =5, phiu =0.2), col ="blue")legend("bottomright", c("u = 0 phiu = 1","u = 2 phiu = 1","u = 5 phiu = 0.2"), col=c("black","red","blue"), lty =1)
Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.
Coles, S.G. (2001). An Introduction to Statistical Modelling of Extreme Values. Springer Series in Statistics. Springer-Verlag: London.