sampsd function

Sampling Simulated Data and Estimates of Multivariate Standard Errors

Each set of simulated data is sampled many times at each sampling effort, from 2 sampling units (and 2 sites) up to the maxima set by the arguments n (and m). Distance-based multivariate standard errors (MultSE) are then estimated using the pseudo-variance (single-site evaluation) or the mean squares of a PERMANOVA linear model (multisite evaluation).
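The single-site procedure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the SSP implementation: multse_single() and resample_multse() are hypothetical helpers, the random count matrix stands in for one simulated site, and stats::dist(method = "binary") (the Jaccard distance on presence/absence) replaces vegan::vegdist so the example needs no extra packages.

```r
# Minimal sketch, NOT the SSP source: multse_single() and resample_multse()
# are hypothetical helpers written for illustration.

# MultSE from a distance object (Anderson & Santana-Garcon, 2015):
# SS = (1/n) * sum_{i<j} d_ij^2, V = SS / (n - 1), MultSE = sqrt(V / n)
multse_single <- function(d) {
  n <- attr(d, "Size")
  ss <- sum(as.vector(d)^2) / n
  v <- ss / (n - 1)
  sqrt(v / n)
}

# Draw k subsamples without replacement for every n in 2..n_max and
# return one MultSE per (n, repetition) combination
resample_multse <- function(x, n_max, k, dist_fun = dist) {
  out <- expand.grid(rep = seq_len(k), n = 2:n_max)
  out$mse <- NA_real_
  for (i in seq_len(nrow(out))) {
    rows <- sample.int(nrow(x), out$n[i])  # sampling without replacement
    out$mse[i] <- multse_single(dist_fun(x[rows, , drop = FALSE]))
  }
  out
}

set.seed(1)
# Stand-in for one simulated site: 20 sampling units x 15 species
site <- matrix(rpois(20 * 15, lambda = 3), nrow = 20)

# Jaccard distance on presence/absence via stats::dist(method = "binary")
res <- resample_multse(site, n_max = 10, k = 3,
                       dist_fun = function(s) dist(s, method = "binary"))

# Mean MultSE over the k repetitions for each sample size
aggregate(mse ~ n, data = res, FUN = mean)
```

For one-dimensional Euclidean data this MultSE reduces to the ordinary standard error of the mean, which is a convenient sanity check for the formula.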

sampsd(dat.sim, Par, transformation, method, n, m, k)

Arguments

  • dat.sim: A list of data sets generated by simdata
  • Par: A list of parameters estimated by assempar
  • transformation: Mathematical function to reduce the weight of very dominant species: 'square root', 'fourth root', 'Log (X+1)', 'P/A', 'none'
  • method: Distance/dissimilarity coefficient (e.g. Gower, Bray–Curtis, Jaccard), given as a method name accepted by vegdist (e.g. "gower", "bray", "jaccard"), which is called to compute the matrix.
  • n: Maximum number of sampling units to take at each site. Must be less than or equal to N, the number of sampling units simulated per site.
  • m: Maximum number of sites to sample in each data set. Must be less than or equal to sites, the number of sites simulated.
  • k: Number of repetitions of each sampling effort (samples and sites) for each data set
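The transformation labels accepted above can be read as the element-wise operations below. This is a hypothetical helper (transform_abund() is not part of SSP), shown only to make the options concrete; using the natural logarithm for 'Log (X+1)' is an assumption.

```r
# Hypothetical helper (NOT exported by SSP): applies one of the
# transformation labels listed for sampsd to a units-by-species matrix.
# ASSUMPTION: 'Log (X+1)' uses the natural logarithm.
transform_abund <- function(x, transformation) {
  switch(transformation,
    "square root" = sqrt(x),
    "fourth root" = x^0.25,
    "Log (X+1)"   = log1p(x),
    "P/A"         = 1 * (x > 0),   # presence/absence as 0/1
    "none"        = x,
    stop("unknown transformation: ", transformation)
  )
}

abund <- matrix(c(0, 1, 16, 81), nrow = 2)
transform_abund(abund, "fourth root")
transform_abund(abund, "P/A")
```

Stronger transformations (fourth root, P/A) progressively reduce the influence of the most abundant species on the distances.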

Details

If several virtual sites have been generated, subsets of 2 to m sites are sampled, and within them 2 to n sampling units are selected using inclusion probabilities and self-weighted two-stage sampling (Tillé, 2006). Each combination of sampling effort (number of sampling units and number of sites) is repeated k times (e.g. k = 100) for every simulated matrix. If the simulated data correspond to a single site, sampling without replacement is repeated k times for each sample size (from 2 to n) within each simulated matrix.

This approach is computationally intensive, especially when k is large (> 10), so expect long run times.

For each sample, the chosen transformation is applied and a distance/dissimilarity matrix is built with the chosen coefficient. For a single site, the MultSE is estimated as $\sqrt{V/n}$, where V is the pseudo-variance of each sample of size n, $V = SS/(n-1)$ with $SS = \frac{1}{n}\sum_{i<j} d_{ij}^2$ (Anderson & Santana-Garcon, 2015). When several sites were generated, the MultSE is estimated from the residual mean squares and the sites mean squares of a PERMANOVA model (Anderson & Santana-Garcon, 2015).
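The multisite case can be sketched in base R as below: equal-probability two-stage sampling (which is self-weighting when every site has the same N), followed by one-way PERMANOVA mean squares computed directly from the distances. This is a minimal sketch, not the SSP source. two_stage_sample() and permanova_ms() are hypothetical helpers, Euclidean distance on square-root data replaces vegdist, and the final conversion of mean squares to MultSE (residual MS divided by n, sites MS divided by n times m) is an assumption, since the exact formula is not given here.

```r
# Minimal sketch, NOT the SSP source. two_stage_sample() and permanova_ms()
# are hypothetical helpers; the MultSE conversion at the end is ASSUMED.

# Stage 1: draw m sites; stage 2: draw n units within each chosen site.
# Equal-probability draws at both stages, with equal N per site, give every
# unit the same inclusion probability (m / M) * (n / N): self-weighting.
two_stage_sample <- function(site_labels, m, n) {
  sites <- unique(site_labels)
  chosen <- sites[sample.int(length(sites), m)]
  unlist(lapply(chosen, function(s) {
    idx <- which(site_labels == s)
    idx[sample.int(length(idx), n)]
  }))
}

# One-way PERMANOVA mean squares from a distance object:
# SS_T = (1/N) sum_{i<j} d_ij^2, SS_W = sum_g (1/n_g) sum_{i<j in g} d_ij^2
permanova_ms <- function(d, group) {
  d2 <- as.matrix(d)^2
  group <- factor(group)
  N <- nrow(d2)
  a <- nlevels(group)
  ss_t <- sum(d2[upper.tri(d2)]) / N
  ss_w <- 0
  for (g in levels(group)) {
    idx <- which(group == g)
    dg <- d2[idx, idx, drop = FALSE]
    ss_w <- ss_w + sum(dg[upper.tri(dg)]) / length(idx)
  }
  c(MS_sites = (ss_t - ss_w) / (a - 1), MS_res = ss_w / (N - a))
}

set.seed(2)
# Stand-in for simulated data: 3 sites x 20 units, 12 species,
# with site-specific Poisson means
site_labels <- rep(1:3, each = 20)
x <- matrix(rpois(60 * 12, lambda = rep(c(2, 4, 6), each = 20)), nrow = 60)

m <- 2; n <- 5
rows <- two_stage_sample(site_labels, m = m, n = n)
ms <- permanova_ms(dist(sqrt(x[rows, ])), site_labels[rows])

# ASSUMED conversion to MultSE (check the package source before relying on it)
c(residual = sqrt(ms[["MS_res"]] / n),
  sites = sqrt(ms[["MS_sites"]] / (n * m)))
```

With one-dimensional Euclidean data these mean squares coincide with those of a classical one-way ANOVA, which makes the partitioning easy to verify by hand.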

Returns

  • mse.results: A matrix containing all estimated MultSE values for each simulated data set, each combination of sampling units and sites, and each of the k repetitions. This matrix is used by summary_ssp.

References

Anderson, M.J. & Santana-Garcon, J. (2015) Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18, 66-73

Guerra-Castro, E.J., Cajas, J.C., Dias Marques Simoes, F.N., Cruz-Motta, J.J. & Mascaro, M. (2020) SSP: an R package to estimate sampling effort in studies of ecological communities. bioRxiv, 2020.03.19.996991.

Tillé, Y. (2006) Sampling Algorithms. Springer, New York, NY.

Author(s)

Edlin Guerra-Castro (edlinguerra@gmail.com), Juan Carlos Cajas, Juan Jose Cruz-Motta, Nuno Simoes and Maite Mascaro (mmm@ciencias.unam.mx).

Note

For quick exploratory analyses, keep the number of repetitions (k) small. Once you have explored the behavior of the MultSE, repeat the process with a large k (e.g. 100). This run will take considerably longer, and the time required depends on the computing power available.

See Also

assempar, simdata, summary_ssp, vegdist

Examples

### To speed up these examples, cases, sites, n and k were set small.
library(SSP)

## Single site: micromollusks from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
# Estimation of parameters from the pilot data
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
# Simulation of 3 data sets, each with 20 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
# Sampling and estimation of MultSE for each sample size (few repetitions to speed up the example)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
                  method = "jaccard", n = 10, m = 1, k = 3)

## Multiple sites: sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)
# Estimation of parameters from the pilot data
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
# Simulation of 3 data sets, each with 20 potential sampling units in 3 sites
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)
# Sampling and estimation of MultSE for each sampling design (few repetitions to speed up the example)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
                  method = "bray", n = 10, m = 3, k = 3)
  • Maintainer: Edlin Guerra-Castro
  • License: GPL-2
  • Last published: 2020-03-28