Computes mutual information based on the distribution of nearest-neighbor distances. The available methods are KSG1 and KSG2, as described by Kraskov et al. (2004), and the Local Non-Uniformity Corrected (LNC) KSG, as described by Gao et al. (2015). The LNC method is based on KSG2 but applies PCA volume corrections to adjust for observed non-uniformity in the local neighborhood of each point in the sample.
knn_mi(data, splits, options)
Arguments
data: Matrix of sample observations; each row is an observation.
splits: A vector that describes which sets of columns in data to compute the mutual information between. For example, to compute mutual information between two variables, use splits = c(1,1). To compute redundancy among multiple random variables, use splits = rep(1,ncol(data)). To compute the mutual information between two random vectors, list the dimension of each vector (for example, splits = c(2,1) for a two-dimensional vector and a scalar).
options: A list that specifies the estimator and its necessary parameters (see details).
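As a rough illustration of how splits partitions the columns of data, the base R sketch below lists the column indices that each entry of splits would cover. It assumes the blocks are taken left to right over consecutive columns, which is inferred from the examples rather than stated by the package. The helper split_columns is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): list the column indices
# covered by each entry of `splits`, assuming consecutive left-to-right blocks.
split_columns <- function(splits) {
  # Label every column with the index of the block it belongs to
  block <- rep(seq_along(splits), times = splits)
  # Group the column indices by block and drop the list names
  unname(split(seq_len(sum(splits)), block))
}

split_columns(c(1, 1))     # two scalar variables: columns 1 and 2
split_columns(c(2, 1))     # a 2-d vector (columns 1-2) and a scalar (column 3)
split_columns(rep(1, 3))   # redundancy among three scalar variables
```

Under this reading, sum(splits) should equal ncol(data).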
Details
Currently available methods are LNC, KSG1 and KSG2.
For KSG1 use: options = list(method = "KSG1", k = 5)
For KSG2 use: options = list(method = "KSG2", k = 5)
For LNC use: options = list(method = "LNC", k = 10, alpha = 0.65). LNC requires k > ncol(data).
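To make the role of k more concrete, here is a minimal brute-force sketch of the first estimator from Kraskov et al. (2004) for two scalar variables, written in base R. It is not the package's implementation: the function name ksg1_sketch is hypothetical, it builds full pairwise distance matrices (so it only suits small samples), and it ignores ties.

```r
# Hypothetical illustration of the KSG1 formula, not the package's code:
#   I(X;Y) ~ psi(k) + psi(N) - mean(psi(n_x + 1) + psi(n_y + 1))
ksg1_sketch <- function(x, y, k = 5) {
  n <- length(x)
  dx <- abs(outer(x, x, "-"))   # pairwise distances in x
  dy <- abs(outer(y, y, "-"))   # pairwise distances in y
  dz <- pmax(dx, dy)            # max-norm distance in the joint space
  diag(dz) <- Inf               # a point is not its own neighbor
  # Distance from each point to its k-th nearest neighbor in the joint space
  eps <- apply(dz, 1, function(d) sort(d)[k])
  diag(dx) <- Inf
  diag(dy) <- Inf
  # Marginal neighbor counts strictly inside eps (eps recycles down rows)
  nx <- rowSums(dx < eps)
  ny <- rowSums(dy < eps)
  digamma(k) + digamma(n) - mean(digamma(nx + 1) + digamma(ny + 1))
}

set.seed(1)
x <- rnorm(500)
y <- x + rnorm(500)
ksg1_sketch(x, y, k = 5)   # compare with the Gaussian value 0.5 * log(2)
```

A smaller k tends to lower bias at the cost of higher variance, which is why k is exposed as a parameter in options.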
References
Gao, S., Ver Steeg, G., & Galstyan, A. (2015). Efficient estimation of mutual information for strongly dependent variables. Artificial Intelligence and Statistics: 277-286.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E 69(6): 066138.
Examples
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)
knn_mi(cbind(x, y), c(1, 1), options = list(method = "KSG2", k = 6))

set.seed(123)
x <- rnorm(1000)
y <- 100 * x + rnorm(1000)
knn_mi(cbind(x, y), c(1, 1), options = list(method = "LNC", alpha = 0.65, k = 10))
# approximate analytic value of mutual information
-0.5 * log(1 - cor(x, y)^2)

z <- rnorm(1000)
# redundancy I(x;y;z) is approximately the same as I(x;y)
knn_mi(cbind(x, y, z), c(1, 1, 1), options = list(method = "LNC", alpha = c(0.5, 0, 0, 0), k = 10))
# mutual information I((x,y);z) is approximately 0
knn_mi(cbind(x, y, z), c(2, 1), options = list(method = "LNC", alpha = c(0.5, 0.65, 0), k = 10))
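
For reference, the analytic expression used above comes from the bivariate Gaussian case, where I(X;Y) = -0.5 * log(1 - rho^2) in nats. The short base R sketch below evaluates it at the population correlations implied by the two simulated models, assuming unit-variance x and unit-variance independent noise; mi_gaussian is a hypothetical helper name, not a package function.

```r
# Hypothetical helper: mutual information (in nats) of a bivariate Gaussian
# with correlation rho.
mi_gaussian <- function(rho) -0.5 * log(1 - rho^2)

# y = x + noise: Var(y) = 2, so rho = 1 / sqrt(2)
rho_weak <- 1 / sqrt(2)
# y = 100 * x + noise: Var(y) = 100^2 + 1, so rho = 100 / sqrt(100^2 + 1)
rho_strong <- 100 / sqrt(100^2 + 1)

mi_gaussian(rho_weak)     # equals 0.5 * log(2)
mi_gaussian(rho_strong)   # equals 0.5 * log(10001)
```

The second model is the strongly dependent regime that motivates the LNC correction in Gao et al. (2015).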