game function

Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation

Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation

gamGPDfit() fits the parameters of a generalized Pareto distribution (GPD) depending on covariates in a non- or semiparametric way.

gamGPDboot() fits and bootstraps the parameters of a GPD distribution depending on covariates in a non- or semiparametric way. Applies the post-blackend bootstrap of Chavez-Demoulin and Davison (2005).

gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05, progress=TRUE, verbose=FALSE, ...) gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5, boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)

Arguments

  • x: data.frame containing the losses (in some component; can be specified with the argument datvar; the other components contain the covariates).
  • B: number of bootstrap replications.
  • threshold: threshold of the peaks-over-threshold (POT) method.
  • nexc: number of excesses. This can be used to determine
  • datvar: name of the data column in x which contains the the data to be modeled.
  • xiFrhs: right-hand side of the formula for xixi in the gam() call for fitting xixi.
  • nuFrhs: right-hand side of the formula for nunu in the gam() call for fitting nunu.
  • init: bivariate vector containing initial values for (xi,beta)(xi, beta).
  • niter: maximal number of iterations in the backfitting algorithm.
  • include.updates: logical indicating whether updates for xi and nu are returned as well (note: this might lead to objects of large size).
  • epsxi: epsilon for stop criterion for xixi.
  • epsnu: epsilon for stop criterion for nunu.
  • boot.progress: logical indicating whether progress information about gamGPDboot() is displayed.
  • progress: logical indicating whether progress information about gamGPDfit() is displayed. For gamGPDboot(), progress is only passed to gamGPDfit() in the case that boot.progress==TRUE.
  • verbose: logical indicating whether additional information (in case of undesired behavior) is printed. For gamGPDboot(), progress is only passed to gamGPDfit() if boot.progress==TRUE.
  • debug: logical indicating whether initial fit (before the bootstrap is initiated) is saved.
  • ...: additional arguments passed to gam() (which is called internally; see the source code of gamGPDfitUp()).

Details

gamGPDfit() fits the parameters xixi and betabeta of the generalized Pareto distribution GPD(xi,beta)GPD(xi,beta) depending on covariates in a non- or semiparametric way. The distribution function is given by

Gξ,β(x)=1(1+ξx/β)1/ξ,\quadx0,G[xi,beta](x)=1(1+xix/beta)(1/xi),x>=0, G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi},\quadx\ge0,G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,

for xi>0xi>0 (which is what we assume) and beta>0beta>0. Note that β\beta is also denoted by σ\sigma in this package. Estimation of xixi

and betabeta by gamGPDfit() is done via penalized maximum likelihood estimation, where the estimators are computed with a backfitting algorithm. In order to guarantee convergence of this algorithm, a reparameterization of betabeta in terms of the parameter nunu is done via

β=exp(ν)/(1+ξ).beta=exp(nu)/(1+xi). \beta=\exp(\nu)/(1+\xi).beta=exp(nu)/(1+xi).

The parameters xixi and nunu (and thus betabeta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:

ξ=ξ(x,t)=xαξ+hξ(t),xi=xi(x,t)=xTalpha[xi]+h[xi](t), \xi=\xi(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\xi}+h_{\xi}(t),xi=xi(x,t)=x^Talpha[xi]+h[xi](t), ν=ν(x,t)=xαν+hν(t),nu=nu(x,t)=xTalpha[nu]+h[nu](t), \nu=\nu(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\nu}+h_{\nu}(t),nu=nu(x,t)=x^Talpha[nu]+h[nu](t),

where xx denotes the vector of covariates, alpha[xi]alpha[xi], alpha[nu]alpha[nu]

are parameter vectors and h[xi]h[xi], h[nu]h[nu] are regression splines. For more details, see the references and the source code.

gamGPDboot() first fits the GPD parameters via gamGPDfit(). It then conducts the post-blackend bootstrap of Chavez-Demoulin and Davison (2005). To this end, it computes the residuals, resamples them (B times), reconstructs the corresponding excesses, and refits the GPD parameters via gamGPDfit() again.

Returns

gamGPDfit() returns a list with the components

  • xi:: estimated parameters xixi;

  • beta:: estimated parameters betabeta;

  • nu:: estimated parameters nunu;

  • se.xi:: standard error for xixi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xixi) multiplied by -1;

  • se.nu:: standard error for nunu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nunu) multiplied by -1;

  • xi.covar:: (unique) covariates for xixi;

  • nu.covar:: (unique) covariates for nunu;

  • covar:: available covariate combinations used for fitting betabeta(xixi, nunu);

  • y:: vector of excesses (exceedances minus threshold);

  • res:: residuals;

  • MRD:: mean relative distances between for all iterations, calculated between old parameters c("(xi,\n(xi,\n", "\tnu)\tnu)") (from the last iteration) and new parameters (currently estimated ones);

  • logL:: log-likelihood at the estimated parameters;

  • xiObj:: object of type gamObject for estimated xixi (returned by mgcv::gam());

  • nuObj:: object of type gamObject for estimated nunu (returned by mgcv::gam());

  • xiUpdates:: if include.updates is TRUE, updates for xixi for each iteration. This is a list of objects of type gamObject

     which contains `xiObj` as last element;
    
  • nuUpdates:: if include.updates is TRUE, updates for nunu for each iteration. This is a list of objects of type gamObject

     which contains `nuObj` as last element;
    

gamGPDboot() returns a list of length B+1 where the first component contains the results of the initial fit via gamGPDfit() and the other B

components contain the results for each replication of the post-blackend bootstrap.

Author(s)

Marius Hofert, Valerie Chavez-Demoulin.

References

Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54 (1), 207--222.

Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.

Examples

### Example 1: fitting capability ############################################## ## generate an example data set years <- 2003:2012 # years nyears <- length(years) n <- 250 # sample size for each (different) xi u <- 200 # threshold rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD set.seed(17) # setting seed xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A" ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) xi.true.B <- xi.true.A^2 # true xi for group "B" ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame time <- rep(rep(years, each=n), 2) # "2" stands for the two groups covar <- rep(c("A","B"), each=n*nyears) value <- c(lossA, lossB) x <- data.frame(covar=covar, time=time, value=value) ## fit eps <- 1e-3 # to decrease the run time for this example fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1, nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps) ## note: choosing s(..., bs="cr") will fit cubic splines ## grab the fitted values per group and year xi.fit <- fitted(fit$xiObj) xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year ## plot the fitted values of xi and the true ones we simulated from par(mfrow=c(1,2)) plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A), main="Group A", xlab="Year", ylab=expression(xi)) points(years, xi.fit.A, type="l", col="red") legend("topleft", inset=0.04, lty=1, col=c("black", "red"), legend=c("true", "fitted"), bty="n") plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B), main="Group B", xlab="Year", ylab=expression(xi)) points(years, xi.fit.B, type="l", col="blue") legend("topleft", inset=0.04, lty=1, col=c("black", "blue"), legend=c("true", "fitted"), bty="n") ## Not run: ### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ######## set.seed(17) # setting seed xi.true.A <- rep(0.4, length=nyears) xi.true.B <- rep(0.8, length=nyears) ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame x <- data.frame(covar=covar, time=time, value=c(lossA, lossB)) ## fit with gpd.fit fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x) xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4] xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4] ## fit with gamGPDfit() fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar, epsxi=eps, epsnu=eps) xi.fit <- fitted(fit$xiObj) xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A" xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B" ## comparison xi.fit.A-xi.fit.coles.A xi.fit.B-xi.fit.coles.B ## End(Not run) # dontrun