Computes a matrix of Hoeffding's (1948) D statistics for all possible pairs of columns of a matrix. D is a measure of the distance between F(x,y) and G(x)H(y), where F(x,y) is the joint CDF of X and Y, and G and H are the marginal CDFs. Missing values are deleted in pairs rather than deleting all rows of x having any missing values. The D statistic is robust against a wide variety of alternatives to independence, such as non-monotonic relationships. The larger the value of D, the more dependent are X and Y (for many types of dependencies). D as used here is 30 times Hoeffding's original D, and ranges from -0.5 to 1.0 if there are no ties in the data. hoeffd also computes the mean and maximum absolute values of the difference between the joint empirical CDF and the product of the marginal empirical CDFs. print.hoeffd prints the information derived by hoeffd.
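To make the "difference between the joint empirical CDF and the product of the marginal empirical CDFs" concrete, here is a minimal base R sketch. The helper name `ecdf_gap_sketch` is hypothetical and not part of any package, and it assumes the CDFs are evaluated at the observed data points; the points and tie handling that hoeffd actually uses may differ.

```r
# Hypothetical helper for illustration only; not hoeffd's implementation.
ecdf_gap_sketch <- function(x, y) {
  # Pairwise deletion: keep rows where both values are present
  keep <- !(is.na(x) | is.na(y))
  x <- x[keep]
  y <- y[keep]
  n <- length(x)

  # Marginal empirical CDFs evaluated at each observation
  G <- vapply(x, function(v) mean(x <= v), numeric(1))
  H <- vapply(y, function(v) mean(y <= v), numeric(1))

  # Joint empirical CDF evaluated at each observed (x, y) pair
  Fxy <- vapply(seq_len(n),
                function(i) mean(x <= x[i] & y <= y[i]),
                numeric(1))

  gap <- abs(Fxy - G * H)
  c(mean_abs = mean(gap), max_abs = max(gap))
}

ecdf_gap_sketch(1:5, 1:5)
```

Under independence the joint CDF factorizes, so both summaries should be close to zero; larger values suggest dependence.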
hoeffd(x, y)
## S3 method for class 'hoeffd'
print(x, ...)
Arguments
x: a numeric matrix with at least 5 rows and at least 2 columns (if y is absent), or an object created by hoeffd
y: a numeric vector or matrix which will be concatenated to x
...: ignored
Returns
A list with elements D, the matrix of D statistics; n, the matrix of the number of observations used in analyzing each pair of variables; and P, the asymptotic P-values. Pairs with fewer than 5 non-missing values have the D statistic set to NA. The diagonal entries of n are the number of non-NAs for the single variable corresponding to that row and column.
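The pairwise counts described for n can be sketched in base R as follows. This only illustrates the counting rule stated above and is not taken from hoeffd's source; the names `observed`, `n_pairs`, and `too_few` are made up for the example.

```r
x <- c(-2, -1, 0, 1, 2)
y <- c(4, 1, 0, 1, 4)
z <- c(1, 2, 3, 4, NA)
q <- c(1, 2, 3, 4, 5)
X <- cbind(x, y, z, q)

# TRUE where a value is present
observed <- !is.na(X)

# Entry [i, j] counts the rows where columns i and j are both observed;
# the diagonal counts the non-NA values of each single column
n_pairs <- crossprod(observed * 1)
n_pairs

# Pairs with fewer than 5 complete observations would get D = NA
too_few <- n_pairs < 5
too_few
```

Here every pair involving z has only 4 complete observations, so by the rule above its D statistic would be NA.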
Details
Uses midranks in case of ties, as described by Hollander and Wolfe. P-values are approximated by linear interpolation on the table in Hollander and Wolfe, which uses the asymptotically equivalent Blum-Kiefer-Rosenblatt statistic. For P < 0.0001 or P > 0.5, P-values are computed using a well-fitting linear regression function in log P vs. the test statistic. Ranks (but not bivariate ranks) are computed using efficient algorithms (see reference 3, Press et al.).
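The following base R sketch computes the scaled D statistic from the commonly cited textbook formula, using midranks for the marginal ranks and fractional credit (1/2 or 1/4) for ties in the bivariate ranks. It is an O(n^2) illustration under those assumptions, not hoeffd's own code, and the function name `hoeffding_d_sketch` is hypothetical. It does not attempt the P-value approximation.

```r
# Hypothetical helper for illustration only; not hoeffd's implementation.
hoeffding_d_sketch <- function(x, y) {
  # Pairwise deletion of missing values
  keep <- !(is.na(x) | is.na(y))
  x <- x[keep]
  y <- y[keep]
  n <- length(x)
  if (n < 5) return(NA_real_)

  # Marginal ranks; rank() uses midranks (ties.method = "average") by default
  R <- rank(x)
  S <- rank(y)

  # Bivariate rank: 1 + number of points below in both coordinates,
  # with 1/2 credit for a tie in one coordinate and 1/4 for a tie in both
  Q <- vapply(seq_len(n), function(i) {
    lx <- x < x[i]
    ex <- x == x[i]
    ly <- y < y[i]
    ey <- y == y[i]
    1 + sum(lx & ly) + 0.5 * sum(lx & ey) + 0.5 * sum(ex & ly) +
      0.25 * (sum(ex & ey) - 1)  # subtract 1 to exclude the point itself
  }, numeric(1))

  D1 <- sum((Q - 1) * (Q - 2))
  D2 <- sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
  D3 <- sum((R - 2) * (S - 2) * (Q - 1))

  # Scaled by 30 so that D is at most 1 when there are no ties
  30 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) /
    (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))
}

hoeffding_d_sketch(1:5, 1:5)   # perfectly increasing, no ties
hoeffding_d_sketch(1:5, 5:1)   # perfectly decreasing, no ties
```

Both calls reach the upper bound of the scaled statistic, which shows that D measures dependence without regard to its direction.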
References
Hoeffding W. (1948): A non-parametric test of independence. Ann Math Stat 19:546-557.
Hollander M. and Wolfe D.A. (1973): Nonparametric Statistical Methods, pp. 228-235, 423. New York: Wiley.
Press W.H., Flannery B.P., Teukolsky S.A., Vetterling W.T. (1988): Numerical Recipes in C. Cambridge: Cambridge University Press.
See Also
rcorr, varclus
Examples
library(Hmisc)

x <- c(-2, -1, 0, 1, 2)
y <- c(4, 1, 0, 1, 4)
z <- c(1, 2, 3, 4, NA)
q <- c(1, 2, 3, 4, 5)
hoeffd(cbind(x, y, z, q))

# Hoeffding's test can detect even one-to-many dependency
set.seed(1)
x <- seq(-10, 10, length.out = 200)
y <- x * sign(runif(200, -1, 1))
plot(x, y)
hoeffd(x, y)
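As a rough illustration of why the one-to-many example is a hard case for ordinary correlation measures, the base R sketch below regenerates the same data and computes Pearson and Spearman correlations. The variable names `pearson` and `spearman` are chosen for the example, and no particular numeric output is claimed.

```r
set.seed(1)
x <- seq(-10, 10, length.out = 200)
y <- x * sign(runif(200, -1, 1))

# y is x with a random sign, so there is no monotonic trend for
# these measures to pick up
pearson <- cor(x, y)
spearman <- cor(x, y, method = "spearman")
c(pearson = pearson, spearman = spearman)

# Yet y depends strongly on x: its magnitude is fully determined by x
all(abs(y) == abs(x))
```

Because the sign is random, both correlations are expected to be small even though y is constrained to one of two values for each x.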