GiniMd function

Gini's Mean Difference

GiniMd computes Gini's mean difference on a numeric vector. This index is defined as the mean absolute difference over all pairs of distinct elements of the vector. For a Bernoulli (binary) variable with proportion of ones equal to $p$ and sample size $n$, Gini's mean difference is $2np(1-p)/(n-1)$. For a trinomial variable (e.g., predicted values for a 3-level categorical predictor using two dummy variables) having (predicted) values $A, B, C$ with corresponding proportions $a, b, c$, Gini's mean difference is $2n[ab|A-B| + ac|A-C| + bc|B-C|]/(n-1)$.
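The Bernoulli formula can be recovered by counting pairs. The short derivation below is a sketch, not taken from the original documentation; the symbol $k = np$ (the number of ones in the sample) is introduced here purely for illustration.

```latex
\begin{align*}
\text{number of unordered pairs} &= \binom{n}{2} = \frac{n(n-1)}{2}, \\
\text{pairs with } |x_i - x_j| = 1 &= k(n-k), \qquad k = np, \\
\text{Gini's mean difference} &= \frac{k(n-k)}{n(n-1)/2}
  = \frac{2\,np \cdot n(1-p)}{n(n-1)}
  = \frac{2np(1-p)}{n-1}.
\end{align*}
```

Only pairs made of one zero and one one contribute, and each contributes an absolute difference of 1. The trinomial expression follows from the same argument, with each of the three kinds of mixed pair weighted by its absolute difference.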

GiniMd(x, na.rm=FALSE)

Arguments

  • x: a numeric vector (for GiniMd)
  • na.rm: set to TRUE if you suspect there may be NAs in x; these will then be removed. Otherwise an error will result.

Returns

a scalar numeric

References

David HA (1968): Gini's mean difference rediscovered. Biometrika 55:573–575.

Author(s)

Frank Harrell

Department of Biostatistics

Vanderbilt University

fh@fharrell.com

Examples

library(Hmisc)  # GiniMd is provided by the Hmisc package

set.seed(1)
x <- rnorm(40)
# Test GiniMd against a brute-force solution
gmd <- function(x) {
  n <- length(x)
  sum(outer(x, x, function(a, b) abs(a - b))) / n / (n - 1)
}
GiniMd(x)
gmd(x)

# Bernoulli case: compare with 2np(1-p)/(n-1)
z <- c(rep(0,17), rep(1,6))
n <- length(z)
GiniMd(z)
2*mean(z)*(1-mean(z))*n/(n-1)

# Trinomial case: a, b, c are counts here, so the formula divides by n(n-1)
a <- 12; b <- 13; c <- 7; n <- a + b + c
A <- -.123; B <- -.707; C <- 0.523
xx <- c(rep(A, a), rep(B, b), rep(C, c))
GiniMd(xx)
2*(a*b*abs(A-B) + a*c*abs(A-C) + b*c*abs(B-C))/n/(n-1)