Implementation of several uniformity tests on the (hyper)sphere c("\n", "Sp−1:=xinRp:∣∣x∣∣=1"), p≥2, with calibration either in terms of their asymptotic/exact distributions, if available, or Monte Carlo.
unif_test receives a sample of directions X1,…,Xn∈Sp−1 in Cartesian coordinates, except for the circular case (p=2) in which the sample can be represented in terms of angles
Θ1,…,Θn∈[0,2π).
unif_test allows to perform several tests within a single call, facilitating thus the exploration of a dataset by applying several tests.
data: sample to perform the test. A matrix of size c(n, p)
containing a sample of size n of directions (in Cartesian coordinates) on Sp−1. Alternatively if p = 2, a matrix of size c(n, 1) containing the n angles on [0,2π) of the circular sample on S1. Other objects accepted are an array of size c(n, p, 1) with directions (in Cartesian coordinates), or a vector of size n or an array of size c(n, 1, 1) with angular data. Must not contain NA's.
type: type of test to be applied. A character vector containing any of the following types of tests, depending on the dimension p:
Circular data: any of the names available at object avail_cir_tests.
(Hyper)spherical data: any of the names available at object avail_sph_tests.
If type = "all" (default), then type is set as avail_cir_tests or avail_sph_tests, depending on the value of p.
p_value: type of p-value computation. Either "MC" for employing the approximation by Monte Carlo of the exact null distribution, "asymp" (default) for the use of the asymptotic/exact null distribution (if available), or "crit_val" for approximation by means of the table of critical values crit_val.
alpha: vector with significance levels. Defaults to c(0.10, 0.05, 0.01).
M: number of Monte Carlo replications for approximating the null distribution when approx = "MC". Also, number of Monte Carlo samples for approximating the asymptotic distributions based on weighted sums of chi squared random variables. Defaults to 1e4.
stats_MC: a data frame of size c(M, length(type)), with column names containing the character vector type, that results from extracting $stats_MC from a call to unif_stat_MC. If provided, the computation of Monte Carlo statistics when approx = "MC"
is skipped. stats_MC is checked internally to see if it is sorted. Internally computed if NULL (default).
crit_val: table with critical values for the tests, to be used if p_value = "crit_val". A data frame, with column names containing the character vector type and rows corresponding to the significance levels alpha, that results from extracting $crit_val_MC from a call to unif_stat_MC. Internally computed if NULL (default).
data_sorted: is the circular data sorted? If TRUE, certain statistics are faster to compute. Defaults to FALSE.
K_max: integer giving the truncation of the series that compute the asymptotic p-value of a Sobolev test. Defaults to 1e4.
method: method for approximating the density, distribution, or quantile function of the weighted sum of chi squared random variables. Must be "I" (Imhof), "SW" (Satterthwaite--Welch), "HBE"
(Hall--Buckley--Eagleson), or "MC" (Monte Carlo; only for distribution or quantile functions). Defaults to "I".
CCF09_dirs: a matrix of size c(n_proj, p) containing n_proj random directions (in Cartesian coordinates) on Sp−1
to perform the CCF09 test. If NULL (default), a sample of size n_proj = 50 directions is computed internally.
CJ12_beta: β parameter in the exponential regime of CJ12 test, a positive real.
CJ12_reg: type of asymptotic regime for CJ12 test, either 1
(sub-exponential regime), 2 (exponential), or 3
(super-exponential; default).
cov_a: an=a/n parameter used in the length of the arcs of the coverage-based tests. Must be positive. Defaults to 2 * pi.
Cressie_t: t parameter for the Cressie test, a real in (0,1). Defaults to 1 / 3.
K_CCF09: integer giving the truncation of the series present in the asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to 25.
Poisson_rho: ρ parameter for the Poisson test, a real in [0,1). Defaults to 0.5.
Pycke_q: q parameter for the Pycke "q-test", a real in (0,1). Defaults to 1 / 2.
Rayleigh_m: integer m for the m-modal Rayleigh test. Defaults to m = 1 (the standard Rayleigh test).
Riesz_s: s parameter for the s-Riesz test, a real in (0,2). Defaults to 1.
Rothman_t: t parameter for the Rothman test, a real in (0,1). Defaults to 1 / 3.
Sobolev_vk2: weights for the finite Sobolev test. A non-negative vector or matrix. Defaults to c(0, 0, 1).
Softmax_kappa: κ parameter for the Softmax test, a non-negative real. Defaults to 1.
Stereo_a: a parameter for the Stereo test, a real in [−1,1]. Defaults to 0.
...: If p_value = "MC" or p_value = "crit_val", optional performance parameters to be passed to unif_stat_MC: chunks, cores, and seed. If p_value = "MC", additional parameters to unif_stat_distr.
Returns
If only a single test is performed, a list with class htest containing the following components:
statistic: the value of the test statistic.
p.value: the p-value of the test. If p_value = "crit_val", an NA.
alternative: a character string describing the alternative hypothesis.
method: a character string indicating what type of test was performed.
data.name: a character string giving the name of the data.
reject: the rejection decision for the levels of significance alpha.
crit_val: a vector with the critical values for the significance levels alpha used with p_value = "MC" or p_value = "asymp".
param: parameter(s) used in the test (if any).
If several tests are performed, a type-named list with entries for each test given by the above list.
Details
All the tests reject for large values of the test statistic, so the critical values for the significance levels alpha correspond to the alpha-upper quantiles of the null distribution of the test statistic.
When p_value = "asymp", tests that do not have an implemented or known asymptotic are omitted, and a warning is generated.
When p_value = "MC", it is possible to have a progress bar indicating the Monte Carlo simulation progress if unif_test is wrapped with progressr::with_progress or if progressr::handlers(global = TRUE) is invoked (once) by the user. See the examples below. The progress bar is updated with the number of finished chunks.
All the statistics are continuous random variables except the Hodges--Ajne statistic ("Hodges_Ajne"), the Cressie statistic ("Cressie"), and the number of (different) uncovered spacings ("Num_uncover"). These three statistics are discrete random variables.
The Monte Carlo calibration for the CCF09 test is made conditionally on the choice of
CCF09_dirs. That is, all the Monte Carlo statistics share the same random directions.
Except for CCF09_dirs, K_CCF09, and CJ12_reg, all the test-specific parameters are vectorized.
Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).
Examples
## Asymptotic distribution# Circular datan <-10samp_cir <- r_unif_cir(n = n)# Matrixunif_test(data = samp_cir, type ="Ajne", p_value ="asymp")# Vectorunif_test(data = samp_cir[,1], type ="Ajne", p_value ="asymp")# Arrayunif_test(data = array(samp_cir, dim = c(n,1,1)), type ="Ajne", p_value ="asymp")# Several testsunif_test(data = samp_cir, type = avail_cir_tests, p_value ="asymp")# Spherical datan <-10samp_sph <- r_unif_sph(n = n, p =3)# Arrayunif_test(data = samp_sph, type ="Bingham", p_value ="asymp")# Matrixunif_test(data = samp_sph[,,1], type ="Bingham", p_value ="asymp")# Several testsunif_test(data = samp_sph, type = avail_sph_tests, p_value ="asymp")## Monte Carlo# Circular dataunif_test(data = samp_cir, type ="Ajne", p_value ="MC")unif_test(data = samp_cir, type = avail_cir_tests, p_value ="MC")# Spherical dataunif_test(data = samp_sph, type ="Bingham", p_value ="MC")unif_test(data = samp_sph, type = avail_sph_tests, p_value ="MC")# Caching stats_MCstats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p =2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p =3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests, p_value ="MC", stats_MC = stats_MC_cir)unif_test(data = samp_sph, type = avail_sph_tests, p_value ="MC", stats_MC = stats_MC_sph)## Critical values# Circular dataunif_test(data = samp_cir, type = avail_cir_tests, p_value ="crit_val")# Spherical dataunif_test(data = samp_sph, type = avail_sph_tests, p_value ="crit_val")# Caching crit_valcrit_val_cir <- unif_stat_MC(n = n, p =2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p =3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests, p_value ="crit_val", crit_val = crit_val_cir)unif_test(data = samp_sph, type = avail_sph_tests, p_value ="crit_val", crit_val = crit_val_sph)## Specific arguments# Rothmanunif_test(data = samp_cir, type ="Rothman", Rothman_t =0.5)# CCF09unif_test(data = samp_sph, type ="CCF09", p_value ="MC", CCF09_dirs = samp_sph[1:2,,1])unif_test(data = samp_sph, type ="CCF09", p_value ="MC", CCF09_dirs = samp_sph[3:4,,1])## Using a progress bar when p_value = "MC"# Define a progress barrequire(progress)require(progressr)handlers(handler_progress( format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"), clear =FALSE))# Call unif_test() within with_progress()with_progress( unif_test(data = samp_sph, type = avail_sph_tests, p_value ="MC", chunks =10, M =1e3))# With several coreswith_progress( unif_test(data = samp_sph, type = avail_sph_tests, p_value ="MC", cores =2, chunks =10, M =1e3))# Instead of using with_progress() each time, it is more practical to run# handlers(global = TRUE)# once to activate progress bars in your R session
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. tools:::Rd_expr_doi("10.48550/arXiv.1804.00286") .