CV function

Cross-validation bandwidths for spatial kernel density estimates

Cross-validation bandwidths for spatial kernel density estimates

Isotropic fixed or global (for adaptive) bandwidth selection for standalone 2D density/intensity based on either unbiased least squares cross-validation (LSCV) or likelihood (LIK) cross-validation.

LIK.density( pp, hlim = NULL, hseq = NULL, resolution = 64, edge = TRUE, auto.optim = TRUE, type = c("fixed", "adaptive"), seqres = 30, parallelise = NULL, zero.action = 0, verbose = TRUE, ... ) LSCV.density( pp, hlim = NULL, hseq = NULL, resolution = 64, edge = TRUE, auto.optim = TRUE, type = c("fixed", "adaptive"), seqres = 30, parallelise = NULL, zero.action = 0, verbose = TRUE, ... )

Arguments

  • pp: An object of class ppp giving the observed 2D data to be smoothed.
  • hlim: An optional vector of length 2 giving the limits of the optimisation routine with respect to the bandwidth. If unspecified, the function attempts to choose this automatically.
  • hseq: An optional increasing sequence of bandwidth values at which to manually evaluate the optimisation criterion. Used only in the case (!auto.optim && is.null(hlim)).
  • resolution: Spatial grid size; the optimisation will be based on a [resolution xx resolution] density estimate.
  • edge: Logical value indicating whether to edge-correct the density estimates used.
  • auto.optim: Logical value indicating whether to automate the numerical optimisation using optimise. If FALSE, the optimisation criterion is evaluated over hseq (if supplied), or over a seqence of values controlled by hlim and seqres.
  • type: A character string; "fixed" (default) performs classical leave-one-out cross-validation for the fixed-bandwidth estimator. Alternatively, "adaptive" utilises multiscale adaptive kernel estimation (Davies & Baddeley, 2018) to run the cross-validation in an effort to find a suitable global bandwidth for the adaptive estimator. Note that data points are not left out' of the pilot density estimate when using this option (this capability is currently in development). See also the entry for ...`.
  • seqres: Optional resolution of an increasing sequence of bandwidth values. Only used if (!auto.optim && is.null(hseq)).
  • parallelise: Numeric argument to invoke parallel processing, giving the number of CPU cores to use when !auto.optim and type = "fixed". Experimental. Test your system first using parallel::detectCores() to identify the number of cores available to you.
  • zero.action: A numeric integer, either -1, 0 (default), 1 or 2 controlling how the function should behave in response to numerical errors at very small bandwidths, when such a bandwidth results in one or more zero or negative density values during the leave-one-out computations. See `Details'.
  • verbose: Logical value indicating whether to provide function progress commentary.
  • ...: Additional arguments controlling pilot density estimation and multi-scale bandwidth-axis resolution when type = "adaptive". Relevant arguments are hp, pilot.density, gamma.scale, and trim from bivariate.density; and dimz from multiscale.density. If hp is missing and required, the function makes a (possibly recursive) call to LSCV.density to set this using fixed-bandwidth LSCV. The remaining defaults are pilot.density = pp, gamma.scale = "geometric", trim = 5, and dimz = resolution.

Returns

A single numeric value of the estimated bandwidth (if auto.optim = TRUE). Otherwise, a [[seqres xx 2]] matrix giving the bandwidth sequence and corresponding CV function value.

Details

This function implements the bivariate, edge-corrected versions of fixed-bandwidth least squares cross-validation and likelihood cross-validation as outlined in Sections 3.4.3 and 3.4.4 of Silverman (1986) in order to select an optimal fixed smoothing bandwidth. With type = "adaptive" it may also be used to select the global bandwidth for adaptive kernel density estimates, making use of multi-scale estimation (Davies and Baddeley, 2018) via multiscale.density. Note that for computational reasons, the leave-one-out procedure is not performed on the pilot density in the adaptive setting; it is only performed on the final stage estimate. Current development efforts include extending this functionality, see SLIK.adapt. See also `Warning' below.

Where LSCV.density is based on minimisation of an unbiased estimate of the mean integrated squared error (MISE) of the density, LIK.density is based on maximisation of the cross-validated leave-one-out average of the log-likelihood of the density estimate with respect to hh.

In both functions, the argument zero.action can be used to control the level of severity in response to small bandwidths that result (due to numerical error) in at least one density value being zero or less. When zero.action = -1, the function strictly forbids bandwidths that would result in one or more pixel values of a kernel estimate of the original data (i.e. anything over the whole region) being zero or less---this is the most restrictive truncation. With zero.action = 0 (default), the function automatically forbids bandwidths that yield erroneous values at the leave-one-out data point locations only. With zero.action = 1, the minimum machine value (see .Machine$double.xmin at the prompt) is used to replace these individual leave-one-out values. When zero.action = 2, the minimum value of the valid (greater than zero) leave-one-out values is used to replace any erroneous leave-one-out values.

Warning

Leave-one-out CV for bandwidth selection in kernel density estimation is notoriously unstable in practice and has a tendency to produce rather small bandwidths, particularly for spatial data. Satisfactory bandwidths are not guaranteed for every application; zero.action can curb adverse numeric effects for very small bandwidths during the optimisation procedures. This method can also be computationally expensive for large data sets and fine evaluation grid resolutions. The user may also need to experiment with adjusting hlim to find a suitable minimum.

Examples

data(pbc) pbccas <- split(pbc)$case LIK.density(pbccas) LSCV.density(pbccas) #* FIXED # custom limits LIK.density(pbccas,hlim=c(0.01,4)) LSCV.density(pbccas,hlim=c(0.01,4)) # disable edge correction LIK.density(pbccas,hlim=c(0.01,4),edge=FALSE) LSCV.density(pbccas,hlim=c(0.01,4),edge=FALSE) # obtain objective function hcv <- LIK.density(pbccas,hlim=c(0.01,4),auto.optim=FALSE) plot(hcv);abline(v=hcv[which.max(hcv[,2]),1],lty=2,col=2) #* ADAPTIVE LIK.density(pbccas,type="adaptive") LSCV.density(pbccas,type="adaptive") # change pilot bandwidth used LIK.density(pbccas,type="adaptive",hp=2) LSCV.density(pbccas,type="adaptive",hp=2)

References

Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28 (4), 937-956.

Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.

Wand, M.P. and Jones, C.M., 1995. Kernel Smoothing, Chapman & Hall, London.

See Also

SLIK.adapt and functions for bandwidth selection in package spatstat: bw.diggle; bw.ppl; bw.scott; bw.frac.

Author(s)

T. M. Davies