polychor function

Polychoric Correlation

Computes the polychoric correlation (and its standard error) between two ordinal variables or from their contingency table, under the assumption that the ordinal variables dissect continuous latent variables that are bivariate normal. Either the maximum-likelihood estimator or a (possibly much) quicker "two-step" approximation is available. For the ML estimator, the estimates of the thresholds and the covariance matrix of the estimates are also available.

polychor(x, y, ML = FALSE, control = list(), std.err = FALSE, maxcor=.9999, start, thresholds=FALSE)

Arguments

  • x: a contingency table of counts or an ordered categorical variable; the latter can be numeric, logical, a factor, an ordered factor, or a character variable, but if a factor, its levels should be in proper order, and the values of a character variable are ordered alphabetically.
  • y: if x is a variable, a second ordered categorical variable.
  • ML: if TRUE, compute the maximum-likelihood estimate; if FALSE, the default, compute a quicker "two-step" approximation.
  • control: optional arguments to be passed to the optim function.
  • std.err: if TRUE, return the estimated variance of the correlation (for the two-step estimator) or the estimated covariance matrix (for the ML estimator) of the correlation and thresholds; the default is FALSE.
  • maxcor: maximum absolute correlation (to ensure numerical stability).
  • start: optional start value(s): if a single number, start value for the correlation; if a list with the elements rho, row.thresholds, and column.thresholds, start values for these parameters; start values are supplied automatically if omitted, and are only relevant when the ML estimator or standard errors are selected.
  • thresholds: if TRUE (the default is FALSE) return estimated thresholds along with the estimated correlation even if standard errors aren't computed.
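
As an illustration of the two input forms described above, the following sketch passes the same data once as a pair of ordered factors and once as a contingency table. The data are simulated purely for illustration, and the names z1, z2, r_vars, r_tab, and r_chr are hypothetical. The last call shows the pitfall noted for character variables: their values are ordered alphabetically, which here is not the intended order.

```r
library(polycor)

# Simulated latent variables with correlation 0.6 (illustrative data only)
set.seed(1)
z1 <- rnorm(500)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(500)

# Cut the latent variables into ordered categories; cut() returns factors
# whose levels follow the order of the break points
x <- cut(z1, c(-Inf, -0.5, 0.5, Inf), labels = c("low", "medium", "high"))
y <- cut(z2, c(-Inf, 0, Inf), labels = c("no", "yes"))

# Two ordinal variables, two-step estimate (the default)
r_vars <- polychor(x, y)

# The same data supplied as a contingency table of counts
r_tab <- polychor(table(x, y))

# Character input is ordered alphabetically ("high" < "low" < "medium"),
# so the categories are no longer in their natural order
r_chr <- polychor(as.character(x), y)
```

Because r_vars and r_tab are computed from the same cross-classification, they should agree; r_chr is based on a scrambled category order and should not be trusted.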

Returns

If std.err or thresholds is TRUE, returns an object of class "polycor" with the following components:

  • type: set to "polychoric".

  • rho: the polychoric correlation.

  • row.cuts: estimated thresholds for the row variable (x), for the ML estimate.

  • col.cuts: estimated thresholds for the column variable (y), for the ML estimate.

  • var: the estimated variance of the correlation, or, for the ML estimate, the estimated covariance matrix of the correlation and thresholds.

  • n: the number of observations on which the correlation is based.

  • chisq: chi-square test for bivariate normality.

  • df: degrees of freedom for the test of bivariate normality.

  • ML: TRUE for the ML estimate, FALSE for the two-step estimate.

Otherwise, returns the polychoric correlation.
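
The components listed above can be read off the returned object with `$`. The sketch below uses a small table of counts invented for illustration, and the names tab, fit, se_rho, and p_bvn are hypothetical. It assumes that, for the ML estimate, the correlation is the first parameter in the covariance matrix var.

```r
library(polycor)

# Hypothetical 3 x 3 table of counts, invented for illustration only
tab <- as.table(matrix(c(40, 15,  5,
                         20, 30, 15,
                          5, 20, 50), nrow = 3, byrow = TRUE))

# Requesting standard errors makes polychor() return a "polycor" object
fit <- polychor(tab, ML = TRUE, std.err = TRUE)

fit$rho       # estimated polychoric correlation
fit$row.cuts  # estimated thresholds for the row variable
fit$col.cuts  # estimated thresholds for the column variable
fit$n         # number of observations

# Standard error of the correlation, assuming rho is the first
# parameter in the estimated covariance matrix
se_rho <- sqrt(fit$var[1, 1])

# p-value for the chi-square test of bivariate normality
p_bvn <- pchisq(fit$chisq, df = fit$df, lower.tail = FALSE)
```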

Details

The ML estimator is computed by maximizing the bivariate-normal likelihood with respect to the thresholds for the two variables ($\tau^x[i], i = 1, \ldots, r - 1$; $\tau^y[j], j = 1, \ldots, c - 1$) and the population correlation ($\rho$). Here, $r$ and $c$ are respectively the number of levels of $x$ and $y$. The likelihood is maximized numerically using the optim function, and the covariance matrix of the estimated parameters is based on the numerical Hessian computed by optim.
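
For reference, the log-likelihood being maximized can be written in its standard form (as in Olsson, 1979). The symbols $n_{ij}$ (the observed count in cell $(i, j)$ of the contingency table), $\pi_{ij}$ (the corresponding cell probability), and $\Phi_2$ (the standard bivariate-normal distribution function with correlation $\rho$) are notation introduced here and do not appear elsewhere on this page. By convention, $\tau^x[0] = \tau^y[0] = -\infty$ and $\tau^x[r] = \tau^y[c] = +\infty$.

```latex
\ell(\rho, \tau^x, \tau^y) = \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \log \pi_{ij},
\qquad
\pi_{ij} = \Phi_2(\tau^x[i], \tau^y[j]; \rho)
         - \Phi_2(\tau^x[i-1], \tau^y[j]; \rho)
         - \Phi_2(\tau^x[i], \tau^y[j-1]; \rho)
         + \Phi_2(\tau^x[i-1], \tau^y[j-1]; \rho)
```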

The two-step estimator is computed by first estimating the thresholds ($\tau^x[i], i = 1, \ldots, r - 1$ and $\tau^y[j], j = 1, \ldots, c - 1$) separately from the marginal distribution of each variable. Then the one-dimensional likelihood for $\rho$ is maximized numerically, using optim if standard errors are requested, or optimise if they are not. The standard error computed treats the thresholds as fixed.
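
The two steps can be sketched in a few lines of R. This is a minimal illustration of the idea and not the package's own code: the function name two_step_sketch and the example table are hypothetical, cell probabilities are computed with pmvnorm from the mvtnorm package, and the sketch assumes every row and column of the table has at least one observation.

```r
library(mvtnorm)

# Illustrative two-step estimate from a contingency table of counts
two_step_sketch <- function(tab, maxcor = 0.9999) {
  n <- sum(tab)

  # Step 1: thresholds from the cumulative marginal proportions
  row_cuts <- qnorm(cumsum(rowSums(tab))[-nrow(tab)] / n)
  col_cuts <- qnorm(cumsum(colSums(tab))[-ncol(tab)] / n)
  lower_r <- c(-Inf, row_cuts); upper_r <- c(row_cuts, Inf)
  lower_c <- c(-Inf, col_cuts); upper_c <- c(col_cuts, Inf)

  # Step 2: negative log-likelihood in rho, thresholds held fixed
  negloglik <- function(rho) {
    sigma <- matrix(c(1, rho, rho, 1), 2, 2)
    ll <- 0
    for (i in seq_len(nrow(tab))) {
      for (j in seq_len(ncol(tab))) {
        # Probability of the rectangle defined by adjacent thresholds
        p <- pmvnorm(lower = c(lower_r[i], lower_c[j]),
                     upper = c(upper_r[i], upper_c[j]),
                     sigma = sigma)
        # Guard against log(0) for cells with negligible probability
        ll <- ll + tab[i, j] * log(max(p, .Machine$double.eps))
      }
    }
    -ll
  }

  # One-dimensional search over rho
  optimise(negloglik, interval = c(-maxcor, maxcor))$minimum
}

# Hypothetical table of counts, invented for illustration only
tab <- matrix(c(40, 15,  5,
                20, 30, 15,
                 5, 20, 50), nrow = 3, byrow = TRUE)
rho_hat <- two_step_sketch(tab)
```

Holding the thresholds fixed reduces the problem to a search over a single parameter, which is why the two-step approximation can be much quicker than the full ML fit.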

References

Drasgow, F. (1986) Polychoric and polyserial correlations. Pp. 68-74 in S. Kotz and N. Johnson, eds., The Encyclopedia of Statistics, Volume 7. Wiley.

Olsson, U. (1979) Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika 44, 443-460.

Author(s)

John Fox jfox@mcmaster.ca

See Also

hetcor, polyserial, print.polycor, optim

Examples

if(require(mvtnorm)){
    set.seed(12345)
    data <- rmvnorm(1000, c(0, 0), matrix(c(1, .5, .5, 1), 2, 2))
    x <- data[,1]
    y <- data[,2]
    cor(x, y)  # sample correlation
}
if(require(mvtnorm)){
    x <- cut(x, c(-Inf, .75, Inf))
    y <- cut(y, c(-Inf, -1, .5, 1.5, Inf))
    polychor(x, y)  # 2-step estimate
}
if(require(mvtnorm)){
    polychor(x, y, ML=TRUE, std.err=TRUE)  # ML estimate
}