Sampling Simulated Data and Estimates of Multivariate Standard Errors
Each simulated data set is sampled many times at each level of sampling effort, from 2 replicates up to the maximum given as an argument to the function. Distance-based multivariate standard errors (MultSE) are then estimated using the pseudo-variance (for single-site evaluation) or mean squares estimates from a linear model (for multi-site evaluation).
Usage
sampsd(dat.sim, Par, transformation, method, n, m, k)
Arguments
dat.sim: A list of data sets generated by simdata
Par: A list of parameters estimated by assempar
transformation: Mathematical function to reduce the weight of very dominant species: 'square root', 'fourth root', 'Log (X+1)', 'P/A', 'none'
method: The appropriate distance/dissimilarity metric (e.g. Gower, Bray–Curtis, Jaccard, etc.). The function vegdist is called for that purpose.
n: Maximum number of samples to take at each site. Can be equal to or less than N
m: Maximum number of sites to sample in each data set. Can be equal to or less than sites
k: Number of repetitions of each sampling effort (samples and sites) for each data set
Details
If several virtual sites have been generated, subsets of sites of size 2 to m are sampled, followed by the selection of sampling units (from 2 to n) using inclusion probabilities and self-weighted two-stage sampling (Tillé, 2006). Each combination of sampling effort (number of sampling units and sites) is repeated several times (e.g. k = 100) for all simulated matrices. If the simulated data correspond to a single site, sampling without replacement is performed several times (e.g. k = 100) for each sample size (from 2 to n) within each simulated matrix.
This approach is computationally intensive, especially when k is high (> 10). Keep this in mind, as it will affect the time needed to get results.
For each sample, suitable pre-treatments are applied and distance/similarity matrices are constructed using the appropriate coefficient. When simulations are done for a single site, the MultSE is estimated as sqrt(V/n), where V is the pseudo-variance measured in each sample of size n (Anderson & Santana-Garcon, 2015). When several sites have been generated, the MultSE is estimated using the residual mean squares and the sites mean squares from a PERMANOVA model (Anderson & Santana-Garcon, 2015).
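The single-site case described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by sampsd: the helpers bray_curtis and mult_se and the Poisson matrix sim are hypothetical names introduced here, and Bray–Curtis is computed by hand so the sketch runs without vegan (sampsd itself calls vegdist). The pseudo-variance follows Anderson & Santana-Garcon (2015): V = SS/(n - 1), where SS is the sum of squared pairwise dissimilarities divided by n.

```r
# Bray-Curtis dissimilarity between all pairs of rows (base R only).
# Assumption: no two rows are both entirely zero, otherwise 0/0 would occur.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# MultSE for one sample: sqrt(V / n), with V = SS / (n - 1) and
# SS = (sum of squared pairwise dissimilarities) / n.
mult_se <- function(d) {
  n <- attr(d, "Size")
  ss <- sum(d^2) / n
  v <- ss / (n - 1)
  sqrt(v / n)
}

set.seed(42)
# Hypothetical simulated site: 20 sampling units x 8 species (Poisson counts)
sim <- matrix(rpois(20 * 8, lambda = 5), nrow = 20)

k <- 5        # repetitions per sample size (kept small on purpose)
n_max <- 10   # largest sample size to evaluate
res <- expand.grid(rep = seq_len(k), n = 2:n_max)
res$MultSE <- NA_real_

for (r in seq_len(nrow(res))) {
  # Sampling without replacement, as in the single-site case
  rows <- sample(nrow(sim), size = res$n[r], replace = FALSE)
  y <- sqrt(sim[rows, , drop = FALSE])   # "square root" pre-treatment
  res$MultSE[r] <- mult_se(bray_curtis(y))
}

# Mean MultSE for each sample size
aggregate(MultSE ~ n, data = res, FUN = mean)
```

The loop runs k times for each sample size from 2 to n_max, so the cost grows with both values. This is why small k is convenient while exploring.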
Returns
mse.results: A matrix with all MultSE values estimated for each simulated data set and each combination of sample replicates and sites, for each of the k repetitions. This matrix is used by summary_ssp
References
Anderson, M.J. & Santana-Garcon, J. (2015) Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18, 66-73
Guerra-Castro, E. J., J. C. Cajas, F. N. Dias Marques Simoes, J. J. Cruz-Motta, and M. Mascaro. (2020). SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv:2020.2003.2019.996991.
Tillé, Y. (2006). Sampling algorithms. Springer, New York, NY.
Note
For quick exploratory analyses, keep the number of repetitions small. Once you have explored the behavior of the MultSE, you can repeat the process with a large value of k (e.g. 100). This will take some time, depending on the power of your computer.
See Also
assempar, simdata, summary_ssp, vegdist
Examples
library(SSP)

## To speed up the simulation of these examples, the cases, sites and n were set small.

## Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)

# Estimation of parameters of pilot data
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")

# Simulation of 3 data sets, each one with 20 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)

# Sampling and estimation of MultSE for each sample size
# (few repetitions to speed up the example)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
                  method = "jaccard", n = 10, m = 1, k = 3)

## Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico).
data(sponges)

# Estimation of parameters of pilot data
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")

# Simulation of 3 data sets, each one with 20 potential sampling units in 3 sites.
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)

# Sampling and estimation of MultSE for each sampling design
# (few repetitions to speed up the example)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
                  method = "bray", n = 10, m = 3, k = 3)