Simulates as many data sets as requested, using the parameters in the list generated by assempar. The function returns an object of class list that includes all the simulated data, to be used by datquality and sampsd.
simdata(Par, cases, N, sites)
Arguments
Par: A list of parameters estimated by assempar
cases: Number of data sets to be simulated
N: Total number of samples to be simulated in each site
sites: Total number of sites to be simulated in each data set
Details
The presence/absence (P/A) of each species at each site is simulated with Bernoulli trials, with a probability of success equal to the empirical frequency of occurrence of that species among sites in the pilot data. For sites where a particular species is present, a second set of Bernoulli trials (with a probability of success equal to the estimated empirical frequency within the sites where the species appears) is used to simulate the distribution of the species within that site. Once created, the P/A matrices are converted to abundance matrices by replacing presences with random values drawn from an appropriate statistical distribution, with parameters equal to those estimated from the pilot data. Counts of individuals are generated using Poisson or negative binomial distributions, depending on the degree of aggregation of each species in the pilot data (McArdle & Anderson 2004; Anderson & Walsh 2013). Continuous variables (e.g. coverage, biomass) are generated using the log-normal distribution. The simulation procedure is repeated to generate as many simulated data matrices as requested.
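The procedure described above can be illustrated with a minimal base-R sketch for count data. This is not the internal code of simdata: the parameter vectors (p_site, p_sample, mu, size), the helper simulate_one, and the layout of the output (species columns followed by N and sites columns) are hypothetical and chosen only for illustration. In particular, using size = Inf to flag a Poisson species is a convention of this sketch.

```r
set.seed(42)

# Hypothetical pilot-data estimates for three species (illustrative values only).
species  <- c("sp1", "sp2", "sp3")
p_site   <- c(0.9, 0.5, 0.2)   # frequency of occurrence among sites
p_sample <- c(0.8, 0.6, 0.3)   # frequency within sites where the species occurs
mu       <- c(12, 4, 1.5)      # mean count where the species is present
size     <- c(Inf, 1.2, 0.5)   # NB dispersion; Inf means "use Poisson" in this sketch

# Simulate one data set with N samples in each of `sites` sites.
simulate_one <- function(N, sites) {
  out <- matrix(0, nrow = N * sites, ncol = length(species),
                dimnames = list(NULL, species))
  for (j in seq_along(species)) {
    # Stage 1: one Bernoulli trial per site decides whether the species occurs there.
    at_site <- rbinom(sites, 1, p_site[j])
    # Stage 2: one Bernoulli trial per sample, kept only in occupied sites.
    present <- rbinom(N * sites, 1, p_sample[j]) * rep(at_site, each = N)
    # Stage 3: replace presences with counts (Poisson or negative binomial).
    # Note: in this simple sketch a drawn count can itself be zero.
    n_pres <- sum(present)
    counts <- if (is.infinite(size[j])) {
      rpois(n_pres, lambda = mu[j])
    } else {
      rnbinom(n_pres, size = size[j], mu = mu[j])
    }
    out[present == 1, j] <- counts
  }
  data.frame(out,
             N     = rep(seq_len(N), times = sites),
             sites = rep(seq_len(sites), each = N))
}

# Repeat the procedure to obtain `cases` simulated data sets in a list.
cases <- 3
sim <- lapply(seq_len(cases), function(i) simulate_one(N = 10, sites = 3))
```

Each element of sim is a data frame with N x sites rows. For continuous variables, the count draw in stage 3 could be replaced by rlnorm with a meanlog and sdlog estimated from the pilot data.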
Returns
simulated.data: An object of class list that includes all simulated data sets. This object is used by sampsd and datquality.
References
Anderson, M. J., & Walsh, D. C. I. (2013). PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecological Monographs, 83(4), 557-574.
Anderson, M. J., de Valpine, P., Punnett, A., & Miller, A. E. (2019). A pathway for multivariate analysis of ecological communities using copulas. Ecology and Evolution, 9, 3276-3294.
Guerra-Castro, E. J., Cajas, J. C., Dias Marques Simoes, F. N., Cruz-Motta, J. J., & Mascaro, M. (2020). SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv, 2020.03.19.996991.
McArdle, B. H., & Anderson, M. J. (2004). Variance heterogeneity, transformations, and models of species abundance: a cautionary tale. Canadian Journal of Fisheries and Aquatic Sciences, 61, 1294-1302.
Note
This approach is not free from assumptions. The simulations consider neither environmental constraints nor the co-occurrence structure of species. It is assumed that potential differences in species composition/abundance among samples and sites are mainly due to the spatial aggregation of species, as estimated from the pilot data. Hence, any ecological property of the assemblage that was not captured by the pilot data will not be reflected in the simulated data. Associations among species can be modeled using copulas, as suggested by Anderson et al. (2019), which could be included in an upcoming version of SSP.
See Also
sampsd, datquality
Examples
## To speed up the simulation of these examples, cases, sites and N were set small.

## Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)

# Estimation of parameters of pilot data
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")

# Simulation of 3 data sets, each one with 10 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 10, sites = 1)

## Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)

# Estimation of parameters of pilot data
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")

# Simulation of 3 data sets, each one with 10 potential sampling units in 3 sites
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)