jSDM_binomial_probit function

Binomial probit regression

Binomial probit regression

The jSDM_binomial_probit function performs a Binomial probit regression in a Bayesian framework. The function calls a Gibbs sampler written in 'C++' code which uses conjugate priors to estimate the conditional posterior distribution of model's parameters.

jSDM_binomial_probit( burnin = 5000, mcmc = 10000, thin = 10, presence_data, site_formula, trait_data = NULL, trait_formula = NULL, site_data, n_latent = 0, site_effect = "none", lambda_start = 0, W_start = 0, beta_start = 0, alpha_start = 0, gamma_start = 0, V_alpha = 1, mu_beta = 0, V_beta = 10, mu_lambda = 0, V_lambda = 10, mu_gamma = 0, V_gamma = 10, shape_Valpha = 0.5, rate_Valpha = 5e-04, seed = 1234, verbose = 1 )

Arguments

  • burnin: The number of burnin iterations for the sampler.

  • mcmc: The number of Gibbs iterations for the sampler. Total number of Gibbs iterations is equal to burnin+mcmc. burnin+mcmc must be divisible by 10 and superior or equal to 100 so that the progress bar can be displayed.

  • thin: The thinning interval used in the simulation. The number of mcmc iterations must be divisible by this value.

  • presence_data: A matrix nsitexnspeciesn_site x n_species indicating the presence by a 1 (or the absence by a 0) of each species on each site.

  • site_formula: A one-sided formula of the form '~x1+...+xp' specifying the explanatory variables for the suitability process of the model, used to form the design matrix XX of size nsitexnpn_site x np.

  • trait_data: A data frame containing the species traits which can be included as part of the model. Default to NULL to fit a model without species traits.

  • trait_formula: A one-sided formula of the form '~ t1 + ... + tk + x1:t1 + ... + xp:tk' specifying the interactions between the environmental variables and the species traits to be considered in the model, used to form the trait design matrix TrTr of size nspeciesxntn_species x nt and to set to 00 the γ\gamma parameters corresponding to interactions not taken into account according to the formula. Default to NULL to fit a model with all possible interactions between species traits found in trait_data and environmental variables defined by site_formula.

  • site_data: A data frame containing the model's explanatory variables by site.

  • n_latent: An integer which specifies the number of latent variables to generate. Defaults to 00.

  • site_effect: A string indicating whether row effects are included as fixed effects ("fixed"), as random effects ("random"), or not included ("none") in the model. If fixed effects, then for parameter identifiability the first row effect is set to zero, which analogous to acting as a reference level when dummy variables are used. If random effects, they are drawn from a normal distribution with mean zero and unknown variance, analogous to a random intercept in mixed models. Defaults to "none".

  • lambda_start: Starting values for λ\lambda parameters corresponding to the latent variables for each species must be either a scalar or a nlatentxnspeciesn_latent x n_species upper triangular matrix with strictly positive values on the diagonal, ignored if n_latent=0. If lambda_start takes a scalar value, then that value will serve for all of the λ\lambda parameters except those concerned by the constraints explained above.

  • W_start: Starting values for latent variables must be either a scalar or a nsitexnlatentn_site x n_latent matrix, ignored if n_latent=0. If W_start takes a scalar value, then that value will serve for all of the WilW_il with i=1,...,nsitei=1,...,n_site and l=1,...,nlatentl=1,...,n_latent.

  • beta_start: Starting values for β\beta parameters of the suitability process for each species must be either a scalar or a npxnspeciesnp x n_species matrix. If beta_start takes a scalar value, then that value will serve for all of the β\beta parameters.

  • alpha_start: Starting values for random site effect parameters must be either a scalar or a nsiten_site-length vector, ignored if site_effect="none". If alpha_start takes a scalar value, then that value will serve for all of the α\alpha parameters.

  • gamma_start: Starting values for γ\gamma parameters that represent the influence of species-specific traits on species' responses β\beta, gamma_start must be either a scalar, a vector of length ntnt, npnp or nt.npnt.np or a ntxnpnt x np matrix. If gamma_start takes a scalar value, then that value will serve for all of the γ\gamma parameters. If gamma_start is a vector of length ntnt or nt.npnt.np the resulting ntxnpnt x np matrix is filled by column with specified values, if a npnp-length vector is given, the matrix is filled by row.

  • V_alpha: Starting value for variance of random site effect if site_effect="random" or constant variance of the Gaussian prior distribution for the fixed site effect if site_effect="fixed". Must be a strictly positive scalar, ignored if site_effect="none".

  • mu_beta: Means of the Normal priors for the β\beta parameters of the suitability process. mu_beta must be either a scalar or a npnp-length vector. If mu_beta takes a scalar value, then that value will serve as the prior mean for all of the β\beta parameters. The default value is set to 0 for an uninformative prior, ignored if trait_data is specified.

  • V_beta: Variances of the Normal priors for the β\beta parameters of the suitability process. V_beta must be either a scalar or a npxnpnp x np symmetric positive semi-definite square matrix. If V_beta takes a scalar value, then that value will serve as the prior variance for all of the β\beta parameters, so the variance covariance matrix used in this case is diagonal with the specified value on the diagonal. The default variance is large and set to 10 for an uninformative flat prior.

  • mu_lambda: Means of the Normal priors for the λ\lambda parameters corresponding to the latent variables. mu_lambda must be either a scalar or a nlatentn_latent-length vector. If mu_lambda takes a scalar value, then that value will serve as the prior mean for all of the λ\lambda parameters. The default value is set to 0 for an uninformative prior.

  • V_lambda: Variances of the Normal priors for the λ\lambda parameters corresponding to the latent variables. V_lambda must be either a scalar or a nlatentxnlatentn_latent x n_latent symmetric positive semi-definite square matrix. If V_lambda takes a scalar value, then that value will serve as the prior variance for all of λ\lambda parameters, so the variance covariance matrix used in this case is diagonal with the specified value on the diagonal. The default variance is large and set to 10 for an uninformative flat prior.

  • mu_gamma: Means of the Normal priors for the γ\gamma parameters. mu_gamma must be either a scalar, a vector of length ntnt, npnp or nt.npnt.np or a ntxnpnt x np matrix. If mu_gamma takes a scalar value, then that value will serve as the prior mean for all of the γ\gamma parameters. If mu_gamma is a vector of length ntnt or nt.npnt.np the resulting ntxnpnt x np matrix is filled by column with specified values, if a npnp-length vector is given, the matrix is filled by row. The default value is set to 0 for an uninformative prior, ignored if trait_data=NULL.

  • V_gamma: Variances of the Normal priors for the γ\gamma parameters. V_gamma must be either a scalar, a vector of length ntnt, npnp or nt.npnt.np or a ntxnpnt x np positive matrix. If V_gamma takes a scalar value, then that value will serve as the prior variance for all of the γ\gamma parameters. If V_gamma is a vector of length ntnt or nt.npnt.np the resulting ntxnpnt x np matrix is filled by column with specified values, if a npnp-length vector is given, the matrix is filled by row. The default variance is large and set to 10 for an uninformative flat prior, ignored if trait_data=NULL.

  • shape_Valpha: Shape parameter of the Inverse-Gamma prior for the random site effect variance V_alpha, ignored if site_effect="none" or site_effect="fixed". Must be a strictly positive scalar. Default to 0.5 for weak informative prior.

  • rate_Valpha: Rate parameter of the Inverse-Gamma prior for the random site effect variance V_alpha, ignored if site_effect="none" or site_effect="fixed"

    Must be a strictly positive scalar. Default to 0.0005 for weak informative prior.

  • seed: The seed for the random number generator. Default to 1234.

  • verbose: A switch (0,1) which determines whether or not the progress of the sampler is printed to the screen. Default is 1: a progress bar is printed, indicating the step (in %) reached by the Gibbs sampler.

Returns

An object of class "jSDM" acting like a list including : - mcmc.alpha: An mcmc object that contains the posterior samples for site effects α\alpha, not returned if site_effect="none".

  • mcmc.V_alpha: An mcmc object that contains the posterior samples for variance of random site effect, not returned if site_effect="none" or site_effect="fixed".

  • mcmc.latent: A list by latent variable of mcmc objects that contains the posterior samples for latent variables WlW_l with l=1,...,nlatentl=1,...,n_latent, not returned if n_latent=0.

  • mcmc.sp: A list by species of mcmc objects that contains the posterior samples for species effects βj\beta_j and λj\lambda_j if n_latent>0 with j=1,...,nspeciesj=1,...,n_species.

  • mcmc.gamma: A list by covariates of mcmc objects that contains the posterior samples for γp\gamma_p parameters with p=1,...,npp=1,...,np if trait_data is specified.

  • mcmc.Deviance: The posterior sample of the deviance (DD) is also provided, with DD defined as : D=2log(ijP(yijβj,λj,αi,Wi))D=-2log(\prod_ij P(y_ij|\beta_j,\lambda_j, \alpha_i, W_i)).

  • Z_latent: Predictive posterior mean of the latent variable Z.

  • probit_theta_latent: Predictive posterior mean of the probability to each species to be present on each site, transformed by probit link function.

  • theta_latent: Predictive posterior mean of the probability to each species to be present on each site.

  • model_spec: Various attributes of the model fitted, including the response and model matrix used, distributional assumptions as link function, family and number of latent variables, hyperparameters used in the Bayesian estimation and mcmc, burnin and thin.

The mcmc. objects can be summarized by functions provided by the coda package.

Details

We model an ecological process where the presence or absence of species jj on site ii is explained by habitat suitability.

Ecological process:

yijBernoulli(θij)yij Bernoulli(θij), y_{ij} \sim \mathcal{B}ernoulli(\theta_{ij})y_ij ~ Bernoulli(\theta_ij),

where

if n_latent=0 and site_effect="none"probit (θij)=Xiβj(\theta_ij) = X_i \beta_j
if n_latent>0 and site_effect="none"probit (θij)=Xiβj+Wiλj(\theta_ij) = X_i \beta_j + W_i \lambda_j
if n_latent=0 and site_effect="random"probit (θij)=Xiβj+αi(\theta_ij) = X_i \beta_j + \alpha_i and αi N(0,Vα)\alpha_i ~ N(0,V_\alpha)
if n_latent>0 and site_effect="fixed"probit (θij)=Xiβj+Wiλj+αi(\theta_ij) = X_i \beta_j + W_i \lambda_j + \alpha_i
if n_latent=0 and site_effect="fixed"probit (θij)=Xiβj+αi(\theta_ij) = X_i \beta_j + \alpha_i
if n_latent>0 and site_effect="random"probit (θij)=Xiβj+Wiλj+αi(\theta_ij) = X_i \beta_j + W_i \lambda_j + \alpha_i and αi N(0,Vα)\alpha_i ~ N(0,V_\alpha)

In the absence of data on species traits (trait_data=NULL), the effect of species jj: βj\beta_j; follows the same a priori Gaussian distribution such that βj Nnp(μβ,Vβ)\beta_j ~ N_np(\mu_\beta,V_\beta), for each species.

If species traits data are provided, the effect of species jj: βj\beta_j; follows an a priori Gaussian distribution such that βj Nnp(μ\betaj,Vβ)\beta_j ~ N_np(\mu_\betaj,V_\beta), where muβ.jp=Σ(k=1,...,nt)tjk.γkpmu_\beta.jp=\Sigma_(k=1,...,nt) t_jk.\gamma_kp, takes different values for each species.

We assume that γ.kp N(μγ.kp,Vγ.kp)\gamma.kp ~ N(\mu_\gamma.kp,V_\gamma.kp) as prior distribution.

We define the matrix γ=(γkp)fork=1,...,ntandp=1,...,np\gamma=(\gamma_kp) for k=1,...,nt and p=1,...,np such as :

x_0x_1...x_p...x_np
__________________________________________________
t_0 |gamma_(0,0)γ(0,1)\gamma_(0,1)...γ(0,p)\gamma_(0,p)...γ(0,np)\gamma_(0,np){ effect of
|interceptenvironmental
|variables
t_1 |γ(1,0)\gamma_(1,0)γ(1,1)\gamma_(1,1)...γ(1,p)\gamma_(1,p)...γ(1,np)\gamma_(1,np)
... |..................
t_k |γ(k,0)\gamma_(k,0)γ(k,1)\gamma_(k,1)...γ(k,p)\gamma_(k,p)...γ(k,np)\gamma_(k,np)
... |..................
t_nt |γ(nt,0)\gamma_(nt,0)γ(nt,1)\gamma_(nt,1)...γ(nt,p)\gamma_(nt,p)...γ(nt,np)\gamma_(nt,np)
average
trait effectinteractiontraitsenvironment

Examples

#====================================== # jSDM_binomial_probit() # Example with simulated data #==================================== #================= #== Load libraries library(jSDM) #================== #' #== Data simulation #= Number of sites nsite <- 150 #= Set seed for repeatability seed <- 1234 set.seed(seed) #= Number of species nsp<- 20 #= Number of latent variables n_latent <- 2 #= Ecological process (suitability) x1 <- rnorm(nsite,0,1) x2 <- rnorm(nsite,0,1) X <- cbind(rep(1,nsite),x1,x2) np <- ncol(X) #= Latent variables W W <- matrix(rnorm(nsite*n_latent,0,1), nsite, n_latent) #= Fixed species effect beta beta.target <- t(matrix(runif(nsp*np,-2,2), byrow=TRUE, nrow=nsp)) #= Factor loading lambda lambda.target <- matrix(0, n_latent, nsp) mat <- t(matrix(runif(nsp*n_latent, -2, 2), byrow=TRUE, nrow=nsp)) lambda.target[upper.tri(mat, diag=TRUE)] <- mat[upper.tri(mat, diag=TRUE)] diag(lambda.target) <- runif(n_latent, 0, 2) #= Variance of random site effect V_alpha.target <- 0.5 #= Random site effect alpha alpha.target <- rnorm(nsite,0 , sqrt(V_alpha.target)) # Simulation of response data with probit link probit_theta <- X%*%beta.target + W%*%lambda.target + alpha.target theta <- pnorm(probit_theta) e <- matrix(rnorm(nsp*nsite,0,1),nsite,nsp) # Latent variable Z Z_true <- probit_theta + e # Presence-absence matrix Y Y <- matrix (NA, nsite,nsp) for (i in 1:nsite){ for (j in 1:nsp){ if ( Z_true[i,j] > 0) {Y[i,j] <- 1} else {Y[i,j] <- 0} } } #================================== #== Site-occupancy model # Increase number of iterations (burnin and mcmc) to get convergence mod<-jSDM_binomial_probit(# Iteration burnin=200, mcmc=200, thin=1, # Response variable presence_data=Y, # Explanatory variables site_formula=~x1+x2, site_data = X, n_latent=2, site_effect="random", # Starting values alpha_start=0, beta_start=0, lambda_start=0, W_start=0, V_alpha=1, # Priors shape_Valpha=0.5, rate_Valpha=0.0005, mu_beta=0, V_beta=1, mu_lambda=0, V_lambda=1, seed=1234, verbose=1) # =================================================== # Result analysis # =================================================== #========== #== Outputs oldpar <- par(no.readonly = TRUE) #= Parameter estimates ## beta_j mean_beta <- matrix(0,nsp,ncol(X)) pdf(file=file.path(tempdir(), "Posteriors_beta_jSDM_probit.pdf")) par(mfrow=c(ncol(X),2)) for (j in 1:nsp) { mean_beta[j,] <- apply(mod$mcmc.sp[[j]] [,1:ncol(X)], 2, mean) for (p in 1:ncol(X)){ coda::traceplot(mod$mcmc.sp[[j]][,p]) coda::densplot(mod$mcmc.sp[[j]][,p], main = paste(colnames(mod$mcmc.sp[[j]])[p],", species : ",j)) abline(v=beta.target[p,j],col='red') } } dev.off() ## lambda_j mean_lambda <- matrix(0,nsp,n_latent) pdf(file=file.path(tempdir(), "Posteriors_lambda_jSDM_probit.pdf")) par(mfrow=c(n_latent*2,2)) for (j in 1:nsp) { mean_lambda[j,] <- apply(mod$mcmc.sp[[j]] [,(ncol(X)+1):(ncol(X)+n_latent)], 2, mean) for (l in 1:n_latent) { coda::traceplot(mod$mcmc.sp[[j]][,ncol(X)+l]) coda::densplot(mod$mcmc.sp[[j]][,ncol(X)+l], main=paste(colnames(mod$mcmc.sp[[j]]) [ncol(X)+l],", species : ",j)) abline(v=lambda.target[l,j],col='red') } } dev.off() # Species effects beta and factor loadings lambda par(mfrow=c(1,2)) plot(t(beta.target), mean_beta, main="species effect beta", xlab ="obs", ylab ="fitted") abline(a=0,b=1,col='red') plot(t(lambda.target), mean_lambda, main="factor loadings lambda", xlab ="obs", ylab ="fitted") abline(a=0,b=1,col='red') ## W latent variables par(mfrow=c(1,2)) for (l in 1:n_latent) { plot(W[,l], summary(mod$mcmc.latent[[paste0("lv_",l)]])[[1]][,"Mean"], main = paste0("Latent variable W_", l), xlab ="obs", ylab ="fitted") abline(a=0,b=1,col='red') } ## alpha par(mfrow=c(1,3)) plot(alpha.target, summary(mod$mcmc.alpha)[[1]][,"Mean"], xlab ="obs", ylab ="fitted", main="site effect alpha") abline(a=0,b=1,col='red') ## Valpha coda::traceplot(mod$mcmc.V_alpha) coda::densplot(mod$mcmc.V_alpha) abline(v=V_alpha.target,col='red') ## Deviance summary(mod$mcmc.Deviance) plot(mod$mcmc.Deviance) #= Predictions ## Z par(mfrow=c(1,2)) plot(Z_true,mod$Z_latent, main="Z_latent", xlab="obs", ylab="fitted") abline(a=0,b=1,col='red') ## probit_theta plot(probit_theta,mod$probit_theta_latent, main="probit(theta)",xlab="obs",ylab="fitted") abline(a=0,b=1,col='red') ## probabilities theta par(mfrow=c(1,1)) plot(theta,mod$theta_latent, main="theta",xlab="obs",ylab="fitted") abline(a=0,b=1,col='red') par(oldpar)

References

Chib, S. and Greenberg, E. (1998) Analysis of multivariate probit models. Biometrika, 85, 347-361.

Warton, D. I.; Blanchet, F. G.; O'Hara, R. B.; O'Hara, R. B.; Ovaskainen, O.; Taskinen, S.; Walker, S. C. and Hui, F. K. C. (2015) So Many Variables: Joint Modeling in Community Ecology. Trends in Ecology & Evolution, 30, 766-779.

Ovaskainen, O., Tikhonov, G., Norberg, A., Blanchet, F. G., Duan, L., Dunson, D., Roslin, T. and Abrego, N. (2017) How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters, 20, 561-576.

See Also

plot.mcmc, summary.mcmc jSDM_binomial_logit jSDM_poisson_log

jSDM_binomial_probit_sp_constrained

Author(s)

Ghislain Vieilledent ghislain.vieilledent@cirad.fr

Jeanne Clément jeanne.clement16@laposte.net