modelStatistics function

Calculate a range of goodness of fit measures for an object fitted with some multivariate statistical method that yields probability estimates for outcomes.

modelStatistics calculates a range of goodness of fit measures.

modelStatistics(observed, predicted, frequency=NA, p.values, n.data, n.predictors, outcomes=levels(as.factor(observed)), p.normalize=TRUE, cross.tabulation=TRUE, p.zero.correction=1/(NROW(p.values)*NCOL(p.values))^2)

Arguments

  • observed: observed values of the response variable
  • predicted: predicted values of the response variable; typically the outcome estimated to have the highest probability
  • frequency: frequencies of observed and predicted values; if NA, a frequency of 1 is assumed for every observed and predicted value
  • p.values: matrix of probabilities for all values of the response variable (i.e., outcomes)
  • n.data: total frequency of data points in the model
  • n.predictors: number of predictor levels in model
  • outcomes: a vector with the possible values of the response variable
  • p.normalize: if TRUE, probabilities are normalized so that sum(P) of all outcomes for each datapoint is equal to 1
  • cross.tabulation: if TRUE, statistics on the crosstabulation of observed and predicted response values are calculated with crosstableStatistics
  • p.zero.correction: a correction used to slightly adjust outcome-specific probability estimates that are exactly P=0; necessary for the proper calculation of pseudo-R-squared statistics; by default calculated from the dimensions of the probability matrix p.values.
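
To make p.normalize and p.zero.correction concrete, here is a minimal sketch in base R on an invented probability matrix. It assumes that exact zeros are replaced by the correction term and that rows are rescaled afterwards; the actual order and mechanics inside modelStatistics may differ.

```r
# Toy probability matrix: 3 data points (rows) x 2 outcomes (columns).
# The values are invented for illustration only.
p.values <- matrix(c(0.5, 0.5,
                     0.0, 2.0,
                     0.3, 0.1),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("a", "b")))

# Default correction term, as written in the usage line:
# 1 / (rows * columns)^2
p.zero.correction <- 1 / (NROW(p.values) * NCOL(p.values))^2

# Assumption: exact zeros are replaced so that log(p) stays finite.
p.adjusted <- p.values
p.adjusted[p.adjusted == 0] <- p.zero.correction

# Assumption: p.normalize = TRUE rescales each row to sum to 1.
# Dividing a matrix by a vector of length nrow() divides row i by element i.
p.normalized <- p.adjusted / rowSums(p.adjusted)
```

Without the correction, a data point whose observed outcome has P=0 would contribute log(0) = -Inf to the log-likelihood, which is why the pseudo-R-squared statistics depend on it.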

Returns

A list with the following components:

  • loglikelihood.null: Loglikelihood for null model
  • loglikelihood.model: Loglikelihood for fitted model
  • deviance.null: Null deviance
  • deviance.model: Model deviance
  • R2.likelihood: (McFadden's) R-squared
  • R2.nagelkerke: Nagelkerke's R-squared
  • AIC.model: Akaike's Information Criterion
  • BIC.model: Bayesian Information Criterion
  • C: index of concordance C (for binary response variables only)
  • crosstable: Crosstabulation of observed and predicted outcomes, if cross.tabulation=TRUE
  • crosstableStatistics(crosstable): Various statistics calculated on crosstable with crosstableStatistics, if cross.tabulation=TRUE
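
The likelihood-based components above follow standard textbook definitions. The sketch below computes them in base R on invented data; it is not the package's source code, and it assumes that n.predictors is used as the parameter count in AIC and BIC and that all frequencies are 1.

```r
# Toy data (invented): observed outcomes and a matrix of predicted
# probabilities with one column per outcome.
observed <- factor(c("a", "a", "b", "b", "b"))
p.values <- matrix(c(0.8, 0.2,
                     0.6, 0.4,
                     0.3, 0.7,
                     0.1, 0.9,
                     0.5, 0.5),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, levels(observed)))
n.data <- length(observed)
n.predictors <- 2  # hypothetical parameter count for this toy model

# Probability the model assigns to the outcome actually observed.
p.observed <- p.values[cbind(seq_len(n.data), as.integer(observed))]

# Log-likelihoods: fitted model vs. a null model that predicts each
# outcome with its overall relative frequency.
loglikelihood.model <- sum(log(p.observed))
counts <- as.numeric(table(observed))
loglikelihood.null <- sum(counts * log(counts / n.data))

# Deviance is -2 times the log-likelihood.
deviance.model <- -2 * loglikelihood.model
deviance.null <- -2 * loglikelihood.null

# McFadden's likelihood-ratio R-squared.
R2.likelihood <- 1 - loglikelihood.model / loglikelihood.null

# Nagelkerke's R-squared: Cox and Snell's R-squared divided by its maximum.
R2.coxsnell <- 1 - exp((deviance.model - deviance.null) / n.data)
R2.nagelkerke <- R2.coxsnell / (1 - exp(-deviance.null / n.data))

# Information criteria (assumption: k = n.predictors).
AIC.model <- deviance.model + 2 * n.predictors
BIC.model <- deviance.model + log(n.data) * n.predictors
```

With weighted data, each log term would be multiplied by the corresponding entry of frequency and n.data would be the sum of those frequencies.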

References

Arppe, A. 2008. Univariate, bivariate and multivariate methods in corpus-based lexicography -- a study of synonymy. Publications of the Department of General Linguistics, University of Helsinki, No. 44. URN: http://urn.fi/URN:ISBN:978-952-10-5175-3.

Arppe, A., and Baayen, R. H. (in prep.) Statistical modeling and the principles of human learning.

Hosmer, David W., Jr., and Stanley Lemeshow 2000. Applied Logistic Regression (2nd edition). New York: Wiley.

Author(s)

Antti Arppe and Harald Baayen

See Also

ndlClassify, ndlStatistics, crosstableStatistics.

Examples

data(think)
think.ndl <- ndlClassify(Lexeme ~ Agent + Patient, data=think)
probs <- acts2probs(think.ndl$activationMatrix)$p
preds <- acts2probs(think.ndl$activationMatrix)$predicted
n.data <- nrow(think)
n.predictors <- nrow(think.ndl$weightMatrix) * ncol(think.ndl$weightMatrix)
modelStatistics(observed=think$Lexeme, predicted=preds, p.values=probs, n.data=n.data, n.predictors=n.predictors)

  • Maintainer: Tino Sering
  • License: GPL-3
  • Last published: 2018-09-10