sampsd() R function from [SSP]

Sampling Simulated Data and Estimates of Multivariate Standard Errors

Each simulated data set is sampled repeatedly at every sampling effort, from 2 replicates up to the maximum set by the function arguments. Distance-based multivariate standard errors (MultSE) are then estimated from the pseudo-variance (single-site evaluation) or from the mean squares of a linear model (multi-site evaluation).


sampsd(dat.sim, Par, transformation, method, n, m, k)

Arguments

dat.sim: A list of data sets generated by simdata
Par: A list of parameters estimated by assempar
transformation: Mathematical function to reduce the weight of very dominant species. One of 'square root', 'fourth root', 'Log (X+1)', 'P/A', 'none'
method: The distance/dissimilarity metric to use (e.g. Gower, Bray–Curtis, Jaccard). The function vegdist is called for that purpose.
n: Maximum number of samples to take at each site. Can be less than or equal to N
m: Maximum number of sites to sample in each data set. Can be less than or equal to sites
k: Number of repetitions of each sampling effort (samples and sites) for each data set
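
The transformation options listed above can be illustrated with a short base R sketch. The helper apply_transformation is hypothetical and is not part of SSP; it only shows what each option does to a vector of abundances, assuming that 'Log (X+1)' means the natural logarithm.

```r
# Hypothetical helper illustrating the transformation options; not SSP's internal code.
apply_transformation <- function(x, transformation = "none") {
  switch(transformation,
    "square root" = sqrt(x),
    "fourth root" = x^(1 / 4),
    "Log (X+1)"   = log(x + 1),    # natural logarithm assumed
    "P/A"         = 1 * (x > 0),   # presence/absence coded as 1/0
    "none"        = x,
    stop("unknown transformation: ", transformation)
  )
}

counts <- c(0, 1, 16, 81)                     # toy abundances of one species
apply_transformation(counts, "fourth root")   # shrinks the dominant values most
apply_transformation(counts, "P/A")           # keeps only presence/absence
```

The stronger the transformation, the less a few very abundant species dominate the resulting dissimilarities.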

Details

If several virtual sites have been generated, subsets of sites of size 2 to m are sampled first, and sampling units (from 2 to n) are then selected within them using inclusion probabilities and self-weighted two-stage sampling (Tillé, 2006). Each combination of sampling effort (number of sampling units and sites) is repeated k times (e.g. k = 100) for every simulated matrix. If the simulated data correspond to a single site, sampling without replacement is performed k times (e.g. k = 100) for each sample size (from 2 to n) within each simulated matrix.

This approach is computationally intensive, especially when k is high (> 10), so expect longer run times as k grows.

For each sample, the chosen transformation is applied and a distance/dissimilarity matrix is built with the chosen coefficient. For a single site, the MultSE is estimated as $\sqrt{V/n}$, where V is the pseudo-variance of a sample of size n (Anderson & Santana-Garcon, 2015). For several sites, the MultSE values are estimated from the residual mean squares and the site mean squares of a PERMANOVA model (Anderson & Santana-Garcon, 2015).
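
The single-site calculation described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helpers multse_single and resample_multse are hypothetical, the toy matrix stands in for one data set from simdata, and Euclidean distance from stats::dist replaces the vegdist coefficients so that the sketch runs without extra packages.

```r
# Hypothetical helper: MultSE for one sample of n units (rows of x).
# The pseudo-variance V is the sum of squared pairwise distances divided
# by n (the sum of squares), then divided by (n - 1).
multse_single <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- stats::dist(x)        # Euclidean here; sampsd calls vegdist instead
  ss <- sum(d^2) / n
  v <- ss / (n - 1)
  sqrt(v / n)
}

# Hypothetical helper: for each sample size from 2 to n, draw k samples
# without replacement from one simulated matrix and record the MultSE.
resample_multse <- function(sim, n, k) {
  out <- expand.grid(repetition = seq_len(k), size = 2:n)
  out$multse <- vapply(out$size, function(size) {
    rows <- sample(nrow(sim), size, replace = FALSE)
    multse_single(sim[rows, , drop = FALSE])
  }, numeric(1))
  out
}

set.seed(1)
sim <- matrix(rpois(20 * 5, lambda = 3), nrow = 20)  # toy stand-in: 20 units, 5 species
res <- resample_multse(sim, n = 10, k = 3)
head(res)
```

With a single variable and Euclidean distance, this quantity reduces to the classical standard error sd(x)/sqrt(n), which makes a convenient sanity check for the formula.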

Returns

mse.results: A matrix with all estimated MultSE values for each simulated data set, each combination of sample replicates and sites, and each of the k repetitions. This matrix is the input to summary_ssp

References

Anderson, M.J. & Santana-Garcon, J. (2015) Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18, 66-73

Guerra-Castro, E.J., Cajas, J.C., Dias Marques Simoes, F.N., Cruz-Motta, J.J. & Mascaro, M. (2020) SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv, 2020.03.19.996991

Tillé, Y. (2006). Sampling algorithms. Springer, New York, NY.

Author(s)

Edlin Guerra-Castro (edlinguerra@gmail.com), Juan Carlos Cajas, Juan Jose Cruz-Motta, Nuno Simoes and Maite Mascaro (mmm@ciencias.unam.mx).

Note

For quick exploratory analyses, keep the number of repetitions small. Once you have explored the behavior of the MultSE, repeat the process with a large k (e.g. 100). That run will take some time, depending on the power of your computer.
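
To get a feeling for how the workload grows with k, the number of MultSE evaluations can be counted from the description in Details. The helper n_evaluations below is hypothetical and assumes one evaluation per simulated data set, sample size (2 to n), site subset size (2 to m, or a single level when m = 1) and repetition; the function's actual cost per evaluation is not modeled.

```r
# Hypothetical helper: number of MultSE evaluations implied by the description
# (sample sizes 2..n, site subsets 2..m when m > 1, k repetitions, per data set).
n_evaluations <- function(cases, n, m, k) {
  sizes <- n - 1                         # sample sizes 2, 3, ..., n
  site_sets <- if (m > 1) m - 1 else 1   # site subsets 2, ..., m; one level for a single site
  cases * sizes * site_sets * k
}

n_evaluations(cases = 3, n = 10, m = 1, k = 3)    # single-site example settings: 81
n_evaluations(cases = 3, n = 10, m = 3, k = 3)    # multi-site example settings: 162
n_evaluations(cases = 3, n = 10, m = 3, k = 100)  # same design with k = 100: 5400
```

The count scales linearly with k, so moving from k = 3 to k = 100 multiplies the work by roughly 33 under these assumptions.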

Examples


###To speed up the simulation of these examples, the cases, sites and n were set small.

##Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)

#Estimation of parameters of pilot data
par.mic<-assempar(data = micromollusk,
                  type= "P/A",
                  Sest.method = "average")

#Simulation of 3 data sets, each one with 20 potential sampling units from a single site
sim.mic<-simdata(par.mic, cases = 3, N = 20, sites = 1)

#Sampling and estimation of MultSE for each sample size (few repetitions to speed up the example)
sam.mic<-sampsd(dat.sim = sim.mic,
               Par = par.mic,
               transformation = "P/A",
               method = "jaccard",
               n = 10,
               m = 1,
               k = 3)

##Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico).
data(sponges)

#Estimation of parameters of pilot data
par.spo<-assempar(data = sponges,
                  type= "counts",
                  Sest.method = "average")

#Simulation of 3 data sets, each one with 20 potential sampling units in 3 sites.
sim.spo<-simdata(par.spo, cases = 3, N = 20, sites = 3)

#Sampling and estimation of MultSE for each sampling design (few
#repetitions to speed up the example)

sam.spo<-sampsd(dat.sim = sim.spo,
                Par = par.spo,
                transformation = "square root",
                method = "bray",
                n = 10,
                m = 3,
                k = 3)


Maintainer: Edlin Guerra-Castro
License: GPL-2
Last published: 2020-03-28
