besf function

Spatial regression with RE-ESF for very large samples

Spatial regression with RE-ESF for very large samples

Parallel and memory-free implementation of RE-ESF-based spatial regression for very large samples. This model estimates residual spatial dependence, constant coefficients, and non-spatially varying coefficients (NVC; coefficients varying with respect to explanatory variable value).

besf( y, x = NULL, nvc = FALSE, nvc_sel = TRUE, coords, s_id = NULL, covmodel="exp", enum = 200, method = "reml", penalty = "bic", nvc_num = 5, maxiter = 30, bsize = 4000, ncores = NULL )

Arguments

  • y: Vector of explained variables (N x 1)
  • x: Matrix of explanatory variables (N x K)
  • nvc: If TRUE, NVCs are assumed on x. Otherwise, constant coefficients are assumed. Default is FALSE
  • nvc_sel: If TRUE, type of coefficients (NVC or constant) is selected through a BIC (default) or AIC minimization. If FALSE, NVCs are assumed across x. Alternatively, nvc_sel can be given by column number(s) of x. For example, if nvc_sel = 2, the coefficient on the second explanatory variable in x is NVC and the other coefficients are constants. The Default is TRUE
  • coords: Matrix of spatial point coordinates (N x 2)
  • s_id: Optional. ID specifying groups modeling spatially dependent process (N x 1). If it is specified, group-level spatial process is estimated. It is useful. e.g., for multilevel modeling (s_id is given by the group ID) and panel data modeling (s_id is given by individual location id). Default is NULL
  • covmodel: Type of kernel to model spatial dependence. The currently available options are "exp" for the exponential kernel, "gau" for the Gaussian kernel, and "sph" for the spherical kernel
  • enum: Number of Moran eigenvectors to be used for spatial process modeling (scalar). Default is 200
  • method: Estimation method. Restricted maximum likelihood method ("reml") and maximum likelihood method ("ml") are available. Default is "reml"
  • penalty: Penalty to select type of coefficients (NVC or constant) to stablize the estimates. The current options are "bic" for the Baysian information criterion-type penalty (N x log(K)) and "aic" for the Akaike information criterion (2K) (see Muller et al., 2013). Default is "bic"
  • nvc_num: Number of basis functions used to model NVC. An intercept and nvc_num natural spline basis functions are used to model each NVC. Default is 5
  • maxiter: Maximum number of iterations. Default is 30
  • bsize: Block/badge size. bsize x bsize elements are iteratively processed during the parallelized computation. Default is 4000
  • ncores: Number of cores used for the parallel computation. If ncores = NULL, the number of available cores - 2 is detected and used. Default is NULL

Returns

  • b: Matrix with columns for the estimated coefficients on x, their standard errors, z-values, and p-values (K x 4). Effective if nvc =FALSE

  • c_vc: Matrix of estimated NVCs on x (N x K). Effective if nvc =TRUE

  • cse_vc: Matrix of standard errors for the NVCs on x (N x K). Effective if nvc =TRUE

  • ct_vc: Matrix of t-values for the NVCs on x (N x K). Effective if nvc =TRUE

  • cp_vc: Matrix of p-values for the NVCs on x (N x K). Effective if nvc =TRUE

  • s: Vector of estimated variance parameters (2 x 1). The first and the second elements denote the standard deviation and the Moran's I value of the estimated spatially dependent component, respectively. The Moran's I value is scaled to take a value between 0 (no spatial dependence) and 1 (the maximum possible spatial dependence). Based on Griffith (2003), the scaled Moran'I value is interpretable as follows: 0.25-0.50:weak; 0.50-0.70:moderate; 0.70-0.90:strong; 0.90-1.00:marked

  • e: Vector whose elements are residual standard error (resid_SE), adjusted conditional R2 (adjR2(cond)), restricted log-likelihood (rlogLik), Akaike information criterion (AIC), and Bayesian information criterion (BIC). When method = "ml", restricted log-likelihood (rlogLik) is replaced with log-likelihood (logLik)

  • vc: List indicating whether NVC are removed or not during the BIC/AIC minimization. 1 indicates not removed whreas 0 indicates removed

  • r: Vector of estimated random coefficients on Moran's eigenvectors (L x 1)

  • sf: Vector of estimated spatial dependent component (N x 1)

  • pred: Vector of predicted values (N x 1)

  • resid: Vector of residuals (N x 1)

  • other: List of other outputs, which are internally used

References

Griffith, D. A. (2003). Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization. Springer Science & Business Media.

Murakami, D. and Griffith, D.A. (2015) Random effects specifications in eigenvector spatial filtering: a simulation study. Journal of Geographical Systems, 17 (4), 311-331.

Murakami, D. and Griffith, D.A. (2019) A memory-free spatial additive mixed modeling for big spatial data. Japan Journal of Statistics and Data Science. DOI:10.1007/s42081-019-00063-x.

Author(s)

Daisuke Murakami

See Also

resf

Examples

require(spdep) data(boston) y <- boston.c[, "CMEDV" ] x <- boston.c[,c("CRIM","ZN","INDUS", "CHAS", "NOX","RM", "AGE", "DIS" ,"RAD", "TAX", "PTRATIO", "B", "LSTAT")] xgroup <- boston.c[,"TOWN"] coords <- boston.c[,c("LON", "LAT")] ######## Regression considering spatially dependent residuals #res <- besf(y = y, x = x, coords=coords) #res ######## Regression considering spatially dependent residuals and NVC ######## (coefficients or NVC is selected) #res2 <- besf(y = y, x = x, coords=coords, nvc = TRUE) ######## Regression considering spatially dependent residuals and NVC ######## (all the coefficients are NVCs) #res3 <- besf(y = y, x = x, coords=coords, nvc = TRUE, nvc_sel=FALSE)