Global Envelopes
The GET package provides an implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).
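To illustrate what simultaneous control means, the following base R sketch compares a pointwise band with a band built from extreme ranks. It is a simplified illustration under stated assumptions, not the implementation used in the package: the data are simulated independent normal values, and the names d, n, vecs and coverage are hypothetical.

```r
set.seed(1)
d <- 20        # number of vector elements (hypothetical)
n <- 2000      # number of vectors (hypothetical)
alpha <- 0.05

# Hypothetical data: n vectors of length d with independent N(0, 1) elements,
# stored as a d x n matrix (one vector per column)
vecs <- matrix(rnorm(d * n), nrow = d)

# Share of vectors that lie inside the band [lo, hi] at all d elements
coverage <- function(x, lo, hi) mean(apply(x >= lo & x <= hi, 2, all))

# Pointwise band: element-wise alpha/2 and 1 - alpha/2 quantiles
pw_lo <- apply(vecs, 1, quantile, probs = alpha / 2)
pw_hi <- apply(vecs, 1, quantile, probs = 1 - alpha / 2)

# Extreme rank of each vector: its smallest two-sided element-wise rank
r_low <- apply(vecs, 1, rank)            # n x d, ranks counted from below
r_two <- pmin(r_low, n + 1 - r_low)      # two-sided ranks
ext_rank <- apply(r_two, 1, min)         # one value per vector

# Drop at most a share alpha of the vectors, starting from the most extreme
k <- sort(ext_rank)[floor(alpha * n) + 1]
keep <- ext_rank >= k

# Global band: element-wise range of the retained vectors (2 x d matrix)
gl <- apply(vecs[, keep, drop = FALSE], 1, range)

pw_cov <- coverage(vecs, pw_lo, pw_hi)
gl_cov <- coverage(vecs, gl[1, ], gl[2, ])
cat("pointwise band:", pw_cov, " global band:", gl_cov, "\n")
```

In this sketch the pointwise band contains each element with probability about 1-alpha, but far fewer whole vectors, whereas the band built from extreme ranks contains at least a share 1-alpha of the whole vectors by construction.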
The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which demand simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).
To get an overview of the package, start R and type library("GET") and vignette("GET").
To get examples of point pattern analysis, start R and type library("GET") and vignette("pointpatterns").
To get examples of the FDR envelopes of Mrkvička and Myllymäki (2023), start R and type library("GET") and vignette("FDRenvelopes").
GET
Central regions or global envelopes or confidence bands: central_region. E.g. a 50% central region of the growth curves of girls in the growth data.
First create a curve_set of the growth curves, e.g.
cset <- curve_set(r = as.numeric(row.names(growth$hgtf)), obs = growth$hgtf)
Then calculate the 50% central region (see central_region for further arguments)
cr <- central_region(cset, coverage = 0.5)
Plot the result (see plot.global_envelope for plotting options)
plot(cr)
It is also possible to do combined central regions for several sets of curves provided in a list for the function, see examples in central_region.
Global envelope tests: global_envelope_test is the main function. E.g. a test of complete spatial randomness (CSR) for a point pattern X:
X <- spruces # an example pattern from spatstat
Use the function envelope of spatstat to create nsim simulations under CSR and to calculate the functions you want (below K-functions by Kest). Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim.
env <- envelope(X, nsim = 999, savefuns = TRUE, fun = Kest, simulate = expression(runifpoint(ex = X)))
Perform the test (see global_envelope_test for further arguments)
res <- global_envelope_test(env)
Plot the result (see plot.global_envelope for plotting options)
plot(res)
It is also possible to do combined global envelope tests for several sets of curves provided in a list for the function, see examples in global_envelope_test. To obtain the false discovery rate envelopes of Mrkvička and Myllymäki (2023), use the argument typeone = "fdr".
Functional ordering: central_region and global_envelope_test are based on different measures for ordering the functions (or vectors) from the most extreme to the least extreme ones. The core functionality of calculating the measures is in the function forder, which can be used to obtain different measures for sets of curves. Usually there is no need to call forder directly.
Functional boxplots: fBoxplot
Adjusted global envelope tests for composite null hypotheses: GET.composite, see a detailed example in saplings
One-way functional ANOVA:
graph.fanova
frank.fanova
Functional general linear model (GLM):
graph.flm
frank.flm
partial_forder
Functional clustering: fclustering
Global quantile regression: global_rq
Functions for performing global envelopes for other specific purposes:
GET.distrequal
GET.distrindep
GET.spatialF
GET.localcor
GET.variogram
Deviation tests (for simple hypothesis): deviation_test (no graphical interpretation)
Most functions accept the curves provided in a curve_set object. Use curve_set to create a curve_set object from the functions. Other formats to provide the curves to the above functions are also accepted, see the information on the help pages.
See the help files of the functions for examples.
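As an illustration of the functional ordering described above, the following sketch calls forder on a small simulated curve set. It assumes that the GET package is installed and that smaller values of the "erl" measure indicate more extreme curves; the simulated sine curves and the shifted curve 7 are hypothetical data made up for the example.

```r
if (requireNamespace("GET", quietly = TRUE)) {
  library(GET)
  set.seed(2)
  r <- seq(0, 1, length.out = 50)

  # Hypothetical data: 30 noisy sine curves, one curve per column
  curves <- sapply(1:30, function(i) sin(2 * pi * r) + rnorm(length(r), sd = 0.1))
  curves[, 7] <- curves[, 7] + 5   # shift curve 7 to make it an obvious outlier

  # Build the curve_set object and compute the extreme rank length measure
  cset <- curve_set(r = r, obs = curves)
  erl <- forder(cset, measure = "erl")

  # Index of the curve with the smallest measure, i.e. the most extreme one
  most_extreme <- which.min(erl)
  print(most_extreme)
}
```

The same curve_set object could be passed to central_region, which relies on this kind of ordering internally.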
To perform a test you always first need to obtain the test function T(r) for your data (T_1(r)) and for each simulation (T_2(r), ..., T_(nsim+1)(r)) in one way or another. Given the set of the functions T_i(r), i = 1, ..., nsim+1, you can perform a test by global_envelope_test.
(Fit the model and) Create nsim simulations from the (fitted) null model.
Calculate the functions T_1(r), T_2(r), ..., T_(nsim+1)(r).
Use curve_set to create a curve_set object from the functions T_i(r), i = 1, ..., nsim+1.
Perform the test
res <- global_envelope_test(curve_set)
where curve_set is the 'curve_set'-object you created, and plot the result
plot(res)
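The steps above can be sketched with a toy example that uses no spatial data. The following is a minimal sketch, assuming that the GET package is installed: the null model (a standard normal sample), the test function (the empirical distribution function on a grid r) and the names test_fun, x_obs and sim_m are hypothetical choices made for illustration only.

```r
if (requireNamespace("GET", quietly = TRUE)) {
  library(GET)
  set.seed(3)
  n <- 50                                # sample size (hypothetical)
  nsim <- 999                            # number of simulations (hypothetical)
  r <- seq(-2.5, 2.5, length.out = 40)   # argument values of the test function

  # Test function T(r): the empirical distribution function evaluated at r
  test_fun <- function(x) ecdf(x)(r)

  # "Data": here simply drawn from the null model N(0, 1)
  x_obs <- rnorm(n)
  obs <- test_fun(x_obs)

  # Simulated test functions: one column per simulation from the null model
  sim_m <- replicate(nsim, test_fun(rnorm(n)))

  # Collect the functions into a curve_set object and perform the test
  cset <- curve_set(r = r, obs = obs, sim_m = sim_m)
  res <- global_envelope_test(cset, type = "erl")
  p_value <- attr(res, "p")
  print(p_value)
}
```

Calling plot(res) afterwards shows the observed function together with the global envelope.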
Workflow with spatstat: start R, type library("GET") and vignette("pointpatterns"), which explains the workflow and gives many examples of point pattern analysis.
It is possible to modify the curve set for the test.
crop_curves: choose the interval of r.
residual: take the residuals T(r) - T_0(r). Here T_0(r) is the expectation of T(r) under the null hypothesis.
abide_9002_23: see help page
adult_trees: a point pattern of adult trees
cgec: centred government expenditure centralization (GEC) ratios (see graph.fanova)
fallen_trees: a point pattern of fallen trees
GDPtax: GDP per capita with country groups and other covariates
imageset3: a simulated set of images
rimov: water temperature curves in 365 days of the 36 years
saplings: a point pattern of saplings (see GET.composite)
The data sets are used to show examples of the functions of the library.
If the number of functions is low, the choice of the measure (or type or depth) plays a role, as explained in vignette("GET") (Section 2.4).
Note that the recommended minimum number of simulations for the rank envelope test (Myllymäki et al., 2017) based on a single function in spatial statistics is nsim=2499. When the number of argument values is large, a larger number of simulations is also needed in order to have a narrow p-interval. The "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests can be used with a lower number of simulations, although the Monte Carlo error is obviously larger with a lower number of simulations. As the number of simulations increases, all the global rank envelopes approach the same curves.
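The dependence of the p-interval on the number of simulations can be illustrated with a base R sketch. It is a simplified illustration rather than the package's code: it assumes simulated independent normal values as the test functions, computes the extreme ranks directly, and takes the p-interval to run from the share of functions with a strictly smaller extreme rank than the data to the share with a smaller or equal extreme rank. The names d, p_interval and width are hypothetical.

```r
set.seed(4)
d <- 50   # number of argument values (hypothetical)

# p-interval of the extreme rank test for curve 1 among the columns of `curves`
p_interval <- function(curves) {
  n <- ncol(curves)
  r_low <- apply(curves, 1, rank)                         # n x d, ranks from below
  ext_rank <- apply(pmin(r_low, n + 1 - r_low), 1, min)   # extreme ranks
  c(lower = mean(ext_rank < ext_rank[1]),
    upper = mean(ext_rank <= ext_rank[1]))
}

width <- sapply(c(99, 2499), function(nsim) {
  # Column 1 plays the role of the data, the other columns are simulations
  curves <- matrix(rnorm(d * (nsim + 1)), nrow = d)
  unname(diff(p_interval(curves)))
})
names(width) <- c("nsim = 99", "nsim = 2499")
print(width)
```

Without ties in the values, at most 2d functions can share the same extreme rank, so the width of the p-interval in this sketch is at most 2d/(nsim+1). The bound is uninformative for nsim=99 and d=50, but equals 0.04 for nsim=2499.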
Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.
Myllymäki and Mrkvička (2024) provide a description of the package. The material can also be found in the corresponding vignette, which is available by starting R and typing library("GET") and vignette("GET").
In the special case of spatial processes (spatial point processes, random sets), the functions are typically estimators of summary functions. The package supports the use of the R package spatstat for generating simulations and calculating estimators of the chosen summary function, but alternatively these can be done in any other way, thus allowing for any user-specified models/functions. To see examples of global envelopes for analysing point pattern data, start R, type library("GET") and vignette("pointpatterns").
Mrkvička and Myllymäki (2023) developed false discovery rate (FDR) envelopes. Examples can be found in the associated vignette: start R and type library("GET") and vignette("FDRenvelopes").
Mrkvička et al. (2023a) proposed global quantile regression. An example of global quantile regression is given in vignette("QuantileRegression").
vignette("HotSpots") illustrates the methodology proposed by Mrkvička et al. (2023b) for detecting hotspots on a linear network.
Type citation("GET") to get a full list of references.
Mikko Kuronen has made substantial contributions of code. Additional contributions and suggestions from Jiří Dvořák, Pavel Grabarnik, Ute Hahn, Michael Rost and Henri Seijo.
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. doi: 10.5772/intechopen.100124
Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671--699.
Konstantinou, K., Mrkvička, T. and Myllymäki, M. (2024) The power of visualizing distributional differences: formal graphical n-sample tests. Computational Statistics. doi: 10.1007/s00180-024-01569-z
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746
Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b) Hotspots detection on a linear network with presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4598454
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics & Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička, T., Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y
Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A, 40-53. doi: 10.1016/j.spasta.2016.04.005
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan, D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2024). GET: Global envelopes in R. Journal of Statistical Software 111(3), 1-40. doi: 10.18637/jss.v111.i03
Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436
Mari Myllymäki (mari.myllymaki@luke.fi, mari.j.myllymaki@gmail.com) and Tomáš Mrkvička (mrkvicka.toma@gmail.com)