This function assesses the evidence for bounding effects in a scatter plot of x and y, based on the clustering of points at the upper edge of the scatter plot (Miti et al., 2024). It tests the hypothesis of greater clustering at the upper bounds of the scatter plot against the null hypothesis of a bivariate normal distribution with no bounding effect (random scatter at the upper edges). It returns the probability (p-value) of the observed clustering given that the data are a realization of an unbounded bivariate normal distribution.
expl_boundary(x, y, shells = 10, simulations = 1000, plot = TRUE, ...)
Arguments
x: A numeric vector of values for the independent variable.
y: A numeric vector of values for the response variable.
shells: A numeric value indicating the number of boundary peels (default is 10).
simulations: The number of simulated null (bivariate normal) data sets used to test the hypothesis (default is 1000).
plot: If TRUE, a plot is included in the output; if FALSE, no plot is produced (default is TRUE).
...: Additional graphical parameters, as for the par() function.
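The shells argument counts boundary peels, which the references tie to convex hull peeling (Eddy, 1982). The following is a minimal base-R sketch of that idea, not the internal code of expl_boundary(): peel_hulls() is a hypothetical helper written for illustration, and the simulated x and y are placeholder data.

```r
# Hypothetical helper: repeatedly remove the convex hull ("peel") of a point
# cloud and record which observations sit on each successive hull.
peel_hulls <- function(x, y, shells = 10) {
  remaining <- seq_along(x)            # indices of points not yet peeled
  peels <- vector("list", shells)
  for (i in seq_len(shells)) {
    if (length(remaining) < 3) {       # too few points left to form a hull
      peels <- peels[seq_len(i - 1)]
      break
    }
    hull <- chull(x[remaining], y[remaining])  # positions within 'remaining'
    peels[[i]] <- remaining[hull]              # map back to original indices
    remaining <- remaining[-hull]              # drop the peeled points
  }
  peels
}

# Placeholder data for illustration only
set.seed(42)
x <- rnorm(200)
y <- rnorm(200)
peels <- peel_hulls(x, y, shells = 10)
lengths(peels)   # number of vertices on each peel
```

Each element of peels holds the row indices of one shell, outermost first, so the vertices in the upper part of each shell can then be picked out by comparing their y values with the center of the data.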
Returns
A data frame with the p-values of obtaining the observed standard deviation of the Euclidean distances from the vertices in the upper peels to the center of the data set, reported separately for the left and right sections of the data set.
Details
It is recommended that any outlying observations, as identified by the bagplot() function of the aplpack package, be removed from the data before calling expl_boundary(). The same outlier removal is also applied in the simulation step within expl_boundary().
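One way to carry out this outlier removal is sketched below, assuming the aplpack package is installed. It uses compute.bagplot(), the non-plotting companion of bagplot(), and keeps only the points in the bag and the loop. The simulated data, the planted outlier, and the names kept, x_clean and y_clean are illustrative assumptions.

```r
library(aplpack)

# Placeholder data with one deliberately extreme observation (assumption)
set.seed(1)
x <- c(rnorm(200, mean = 50, sd = 5), 120)
y <- c(rnorm(200, mean = 10, sd = 2), 40)

# compute.bagplot() returns the bagplot components without drawing anything
bag <- compute.bagplot(x, y, factor = 3)

# Keep points inside the bag and the loop; points flagged as outliers
# (bag$pxy.outlier) are left out
kept <- rbind(bag$pxy.bag, bag$pxy.outer)
x_clean <- kept[, 1]
y_clean <- kept[, 2]

# The cleaned vectors can then be passed on, e.g.
# expl_boundary(x_clean, y_clean)
```

The points are returned in the order used by the bagplot, not the original row order, which does not matter here because x_clean and y_clean stay paired.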
Examples
x <- evapotranspiration$`ET(mm)`
y <- evapotranspiration$`yield(t/ha)`
# 100 simulations keeps this example fast; the recommendation is to set simulations to greater than 1000
expl_boundary(x, y, shells = 10, simulations = 100)
References
Eddy, W. F. (1982). Convex hull peeling. COMPSTAT 1982, Part I: Proceedings in Computational Statistics, 42-47. Physica-Verlag, Vienna.
Miti, C., Milne, A. E., Giller, K. E. and Lark, R. M. (2024). Exploration of data for analysis using boundary line methodology. Computers and Electronics in Agriculture, 219, 108794.