somers2 function

Somers' Dxy Rank Correlation

Computes Somers' Dxy rank correlation between a variable x and a binary (0-1) variable y, and the corresponding receiver operating characteristic curve area c. Note that Dxy = 2(c-0.5). somers2 allows for a weights variable, which specifies frequencies to associate with each observation.
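The relation Dxy = 2(c-0.5) can be checked by brute-force pair counting. The sketch below uses base R only; pair_concordance is a hypothetical helper written for illustration, not a function from the package, and it is not meant to reflect how somers2 is implemented.

```r
# Hypothetical helper (not part of Hmisc): concordance by counting all
# (y = 1, y = 0) pairs. Ties in x count as half-concordant for C.
pair_concordance <- function(x, y) {
  pos <- x[y == 1]
  neg <- x[y == 0]
  d <- outer(pos, neg, "-")   # one entry per (y = 1, y = 0) pair
  conc  <- sum(d > 0)         # pairs where the y = 1 case has the larger x
  disc  <- sum(d < 0)         # pairs where it has the smaller x
  ties  <- sum(d == 0)
  total <- length(d)
  c(C = (conc + 0.5 * ties) / total,
    Dxy = (conc - disc) / total)
}

x <- 1:6
y <- c(0, 0, 1, 0, 1, 1)
res <- pair_concordance(x, y)
res

# Dxy and C satisfy Dxy = 2 * (C - 0.5)
stopifnot(isTRUE(all.equal(res[["Dxy"]], 2 * (res[["C"]] - 0.5))))
```

Pair counting is quadratic in the sample size, so it is only practical for small checks like this one.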

somers2(x, y, weights=NULL, normwt=FALSE, na.rm=TRUE)

Arguments

  • x: typically a predictor variable. NAs are allowed.
  • y: a numeric outcome variable coded 0-1. NAs are allowed.
  • weights: a numeric vector of observation weights (usually frequencies). Omit or specify a zero-length vector to do an unweighted analysis.
  • normwt: set to TRUE to make weights sum to the actual number of non-missing observations.
  • na.rm: set to FALSE to suppress checking for NAs.
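To illustrate what frequency weights mean here, the following base R sketch counts each (y = 1, y = 0) pair with the product of the two observation weights and compares the result with physically replicating the rows. weighted_c is a hypothetical helper for illustration, not the package's implementation.

```r
# Hypothetical helper (not part of Hmisc): weighted concordance in which
# a pair (i, j) contributes w[i] * w[j] to the count.
weighted_c <- function(x, y, w = rep(1, length(x))) {
  pos <- y == 1
  neg <- y == 0
  d  <- outer(x[pos], x[neg], "-")   # x differences for every pair
  ww <- outer(w[pos], w[neg])        # matching pair weights
  sum(ww * ((d > 0) + 0.5 * (d == 0))) / sum(ww)
}

x <- 1:6
y <- c(0, 0, 1, 0, 1, 1)
f <- c(3, 2, 2, 3, 2, 1)             # frequencies

c_weighted <- weighted_c(x, y, f)
c_expanded <- weighted_c(rep(x, f), rep(y, f))

# Rescale the weights so they sum to the number of observations,
# mirroring the description of normwt above.
f_norm <- f * length(f) / sum(f)
c_normed <- weighted_c(x, y, f_norm)

c(weighted = c_weighted, expanded = c_expanded, normalized = c_normed)
```

In this sketch the three values coincide, because multiplying all weights by a constant cancels in the ratio.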

Returns

A vector with the named elements C, Dxy, n (number of non-missing pairs), and Missing. C is computed with the formula C = (mean(rank(x)[y == 1]) - (n1 + 1)/2)/(n - n1), where n1 is the frequency of y=1.
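The rank formula above can be transcribed directly into base R for the unweighted case. rank_c is a hypothetical name chosen for this sketch; its NA handling (dropping any observation with a missing x or y) is an assumption made for illustration.

```r
# Hypothetical helper (not part of Hmisc): unweighted C and Dxy from the
# rank formula, returning the same element names as described above.
rank_c <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)      # assumed NA handling for this sketch
  x <- x[keep]
  y <- y[keep]
  n  <- length(x)
  n1 <- sum(y == 1)                  # frequency of y = 1
  c_index <- (mean(rank(x)[y == 1]) - (n1 + 1) / 2) / (n - n1)
  c(C = c_index, Dxy = 2 * (c_index - 0.5), n = n, Missing = sum(!keep))
}

r1 <- rank_c(1:6, c(0, 0, 1, 0, 1, 1))
r2 <- rank_c(c(1:6, NA), c(0, 0, 1, 0, 1, 1, 1))   # one incomplete pair
r1
r2
```

For x = 1:6 and y = c(0,0,1,0,1,1) the mean rank among y = 1 is 14/3, so C = (14/3 - 2)/3 = 8/9.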

Details

The rcorr.cens function, although slower than somers2 for large sample sizes, can also be used to obtain Dxy for non-censored binary y, and it has the advantage of computing the standard deviation of the correlation index.
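A side-by-side call might look like the sketch below. It assumes the Hmisc package is installed and that rcorr.cens accepts a plain 0-1 vector as its second argument, treating every observation as uncensored; the element names of its result are not relied on here, so inspect the printed output.

```r
# Sketch only: runs the comparison when Hmisc is available.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  set.seed(1)
  predicted <- runif(200)
  dead <- sample(0:1, 200, TRUE)

  fast <- Hmisc::somers2(predicted, dead)      # C, Dxy, n, Missing
  slow <- Hmisc::rcorr.cens(predicted, dead)   # also reports a standard deviation

  print(fast)
  print(slow)
}
```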

Author(s)

Frank Harrell

Department of Biostatistics

Vanderbilt University School of Medicine

fh@fharrell.com

See Also

concordance, rcorr.cens, rank, wtd.rank

Examples

library(Hmisc)   # somers2 is provided by the Hmisc package

set.seed(1)
predicted <- runif(200)
dead <- sample(0:1, 200, TRUE)
roc.area <- somers2(predicted, dead)["C"]

# Check weights
x <- 1:6
y <- c(0,0,1,0,1,1)
f <- c(3,2,2,3,2,1)
somers2(x, y)
somers2(rep(x, f), rep(y, f))
somers2(x, y, f)