Model Selection and Multimodel Inference Based on (Q)AIC(c)
Model Selection and Multimodel Inference Based on (Q)AIC(c)
Description: This package includes functions to create model selection tables based on Akaike's information criterion (AIC) and the second-order AIC (AICc), as well as their quasi-likelihood counterparts (QAIC, QAICc). The package also features functions to conduct classic model averaging (multimodel inference) for a given parameter of interest or predicted values, as well as a shrinkage version of model averaging parameter estimates. Other handy functions enable the computation of relative variable importance, evidence ratios, and confidence sets for the best model. The present version supports Cox proportional hazards models and conditional logistic regression (coxph and coxme classes), linear models (lm class), generalized linear models (glm, glm.nb, vglm, hurdle, and zeroinfl classes), linear models fit by generalized least squares (gls class), linear mixed models (lme class), generalized linear mixed models (mer, merMod, and glmmTMB classes), multinomial and ordinal logistic regressions (multinom, polr, clm, and clmm classes), robust regression models (rlm class), beta regression models (betareg class), parametric survival models (survreg
class), nonlinear models (nls and gnls classes), nonlinear mixed models (nlme and nlmerMod classes), univariate models (fitdist and fitdistr classes), and certain types of latent variable models (lavaan class). The package also supports various models of unmarkedFit and maxLikeFit classes estimating demographic parameters after accounting for imperfect detection probabilities. Some functions also allow the creation of model selection tables for Bayesian models of the bugs and rjags classes. Objects following model selection and multimodel inference can be formatted to LaTeX using xtable methods included in the package.
1.1
package
Details
Package:
AICcmodavg
Type:
Package
Version:
2.3-4
Date:
2025-03-03
License:
GPL (>=2 )
LazyLoad:
yes
Many functions of the package require a list of models as the input to conduct model selection and multimodel inference. Thus, you should start by organizing the output of the models in a list (See 'Examples' below).
This package contains several useful functions for model selection and multimodel inference for several model classes:
AICc: Computes AIC, AICc, and their quasi-likelihood counterparts (QAIC, QAICc).
aictab: Constructs model selection tables with number of parameters, AIC, delta AIC, Akaike weights or variants based on AICc, QAIC, and QAICc for a set of candidate models.
bictab: Constructs model selection tables with number of parameters, BIC, delta BIC, BIC weights for a set of candidate models.
boot.wt: Computes summary statistics from detection histories.
confset: Determines the confidence set for the best model based on one of three criteria.
DIC: Extracts DIC.
dictab: Constructs model selection tables with number of parameters, DIC, delta DIC, DIC weights for a set of candidate models.
evidence: Computes the evidence ratio between the highest-ranked model based on the information criteria selected and a lower-ranked model.
importance: Computes importance values (w+) for the support of a given parameter among set of candidate models.
modavg: Computes model-averaged estimate, unconditional standard error, and unconditional confidence interval of a parameter of interest among a set of candidate models.
modavgEffect: Computes model-averaged effect sizes between groups based on the entire candidate model set.
modavgShrink: Computes shrinkage version of model-averaged estimate, unconditional standard error, and unconditional confidence interval of a parameter of interest among entire set of candidate models.
modavgPred: Computes model-average predictions, unconditional SE's, and confidence intervals among entire set of candidate models.
multComp: Performs multiple comparisons across levels of a factor in a model selection framework.
useBIC: Computes BIC or a quasi-likelihood counterparts (QBIC).
For models not yet supported by the functions above, the following
can be useful for model selection and multimodel inference conducted from input values supplied by the user:
AICcCustom: Computes AIC, AICc, QAIC, and QAICc from user-supplied input values of log-likelihood and number of parameters.
aictabCustom: Creates model selection tables based on (Q)AIC(c) from user-supplied input values of log-likelihood and number of parameters.
bictabCustom: Creates model selection tables based on (Q)BIC from user-supplied input values of log-likelihood and number of parameters.
ictab: Creates model selection tables from user-supplied values of an information criterion.
modavgCustom: Computes model-averaged parameter estimate based on (Q)AIC(c) from user-supplied input values of log-likelihood, number of parameters, parameter estimates, and standard errors.
modavgIC: Computes model-averaged parameter estimate from user-supplied values of information criterion, parameter estimates, and standard errors.
useBICCustom: Computes BIC and QBIC from user-supplied input values of log-likelihoods and number of parameters.
A number of functions for model diagnostics are available:
c_hat: Estimates variance inflation factor for binomial or Poisson GLM's based on various estimators.
checkConv: Checks the convergence information of the algorithm for the model.
checkParms: Checks the occurrence of parameter estimates with high standard errors in a model.
countDist: Computes summary statistics from distance sampling data.
countHist: Computes summary statistics from count history data.
covDiag: Computes covariance diagnostics for lambda in N-mixture models.
detHist: Computes summary statistics from detection histories.
detTime: Computes summary statistics from time-to-detection data.
extractCN: Extracts condition number from models of certain classes.
mb.gof.test: Computes the MacKenzie and Bailey goodness-of-fit test for single season and dynamic occupancy models using the Pearson chi-square statistic.
Nmix.gof.test: Computes goodness-of-fit test for N-mixture models based on the Pearson chi-square statistic.
Other utility functions include:
anovaOD: Computes likelihood-ratio test statistic corrected for overdispersion between two models.
extractLL: Extracts log-likelihood from models of certain classes.
extractSE: Extracts standard errors from models of certain classes and adds the labels.
extractX: Extracts the predictors and associated information on variables from a list of candidate models.
fam.link.mer: Extracts the distribution family and link function from a generalized linear mixed model of classes mer
and `merMod`.
predictSE: Computes predictions and associated standard errors models of certain classes.
summaryOD: Displays summary of model output adjusted for overdispersion.
xtable: Extends methods from xtable to format various objects resulting from model selection and multimodel inference to LaTeX or HTML tables.
Anderson, D. R. (2008) Model-based inference in the life sciences: a primer on evidence. Springer: New York.
Burnham, K. P., and Anderson, D. R. (2002) Model selection and multimodel inference: a practical information-theoretic approach. Second edition. Springer: New York.
Burnham, K. P., Anderson, D. R. (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research 33 , 261--304.
Mazerolle, M. J. (2006) Improving data analysis in herpetology: using Akaike's Information Criterion (AIC) to assess the strength of biological hypotheses. Amphibia-Reptilia 27 , 169--180.
Examples
##Example 1: Poisson GLM with offset##anuran larvae example from Mazerolle (2006) data(min.trap)##assign "UPLAND" as the reference level as in Mazerolle (2006) min.trap$Type <- relevel(min.trap$Type, ref ="UPLAND")##set up candidate models in a listCand.mod <- list()##global model Cand.mod[[1]]<- glm(Num_anura ~ Type + log.Perimeter + Num_ranatra, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[2]]<- glm(Num_anura ~ Type + log.Perimeter, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[3]]<- glm(Num_anura ~ Type + Num_ranatra, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[4]]<- glm(Num_anura ~ Type, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[5]]<- glm(Num_anura ~ log.Perimeter + Num_ranatra, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[6]]<- glm(Num_anura ~ log.Perimeter, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[7]]<- glm(Num_anura ~ Num_ranatra, family = poisson, offset = log(Effort), data = min.trap)Cand.mod[[8]]<- glm(Num_anura ~1, family = poisson, offset = log(Effort), data = min.trap)##check c-hat for global modelc_hat(Cand.mod[[1]], method ="pearson")#uses Pearson's chi-square/df##note the very low overdispersion: in this case, the analysis could be##conducted without correcting for c-hat as its value is reasonably close##to 1 ##output of model corrected for overdispersionsummaryOD(Cand.mod[[1]], c.hat =1.04)##assign names to each modelModnames <- c("type + logperim + invertpred","type + logperim","type + invertpred","type","logperim + invertpred","logperim","invertpred","intercept only")##model selection table based on AICcaictab(cand.set = Cand.mod, modnames = Modnames)##compute evidence ratioevidence(aictab(cand.set = Cand.mod, modnames = Modnames))##compute confidence set based on 'raw' methodconfset(cand.set = Cand.mod, modnames = Modnames, second.ord =TRUE, method ="raw")##compute importance value for "TypeBOG" - same number of models##with vs without variableimportance(cand.set = Cand.mod, modnames = Modnames, parm ="TypeBOG")##compute model-averaged estimate of "TypeBOG" using the natural averagemodavg(cand.set = Cand.mod, modnames = Modnames, parm ="TypeBOG")##compute model-averaged estimate of "TypeBOG" using shrinkage estimator##same number of models with vs without variablemodavgShrink(cand.set = Cand.mod, modnames = Modnames, parm ="TypeBOG")##compute model-averaged predictions for two types of ponds##create a data set for predictionsdat.pred <- data.frame(Type = factor(c("BOG","UPLAND")), log.Perimeter = mean(min.trap$log.Perimeter), Num_ranatra = mean(min.trap$Num_ranatra), Effort = mean(min.trap$Effort))##model-averaged predictions across entire model setmodavgPred(cand.set = Cand.mod, modnames = Modnames, newdata = dat.pred, type ="response")##compute model-averaged effect size between two groups##'newdata' data frame must be limited to two rowsmodavgEffect(cand.set = Cand.mod, modnames = Modnames, newdata = dat.pred, type ="link")## Not run:##Example 2: single-season occupancy model example modified from ?occurequire(unmarked)##single seasondata(frogs)pferUMF <- unmarkedFrameOccu(pfer.bin)## add some fake covariates for illustrationsiteCovs(pferUMF)<- data.frame(sitevar1 = rnorm(numSites(pferUMF)), sitevar2 = rnorm(numSites(pferUMF)))## observation covariates are in site-major, observation-minor orderobsCovs(pferUMF)<- data.frame(obsvar1 = rnorm(numSites(pferUMF)* obsNum(pferUMF)))##check detection history data from data objectdetHist(pferUMF)##set up candidate model setfm1 <- occu(~ obsvar1 ~ sitevar1, pferUMF)##check detection history data from model objectdetHist(fm1)fm2 <- occu(~1~ sitevar1, pferUMF)fm3 <- occu(~ obsvar1 ~ sitevar2, pferUMF)fm4 <- occu(~1~ sitevar2, pferUMF)Cand.models <- list(fm1, fm2, fm3, fm4)##assign names to elements in list##alternative to using 'modnames' argumentnames(Cand.models)<- c("fm1","fm2","fm3","fm4")##check GOF of global model and estimate c-hatmb.gof.test(fm4, nsim =100)#nsim should be > 1000##check for high SE's in modelslapply(Cand.models, checkParms, simplify =FALSE)##compute tableprint(aictab(cand.set = Cand.models, second.ord =TRUE), digits =4)##export as LaTeX tableif(require(xtable)){xtable(aictab(cand.set = Cand.models, second.ord =TRUE))}##compute evidence ratioevidence(aictab(cand.set = Cand.models))##evidence ratio between top model vs lowest-ranked modelevidence(aictab(cand.set = Cand.models), model.high ="fm2", model.low ="fm3")##compute confidence set based on 'raw' methodconfset(cand.set = Cand.models, second.ord =TRUE, method ="raw")##compute importance value for "sitevar1" on occupancy##same number of models with vs without variableimportance(cand.set = Cand.models, parm ="sitevar1", parm.type ="psi")##compute model-averaged estimate of "sitevar1" on occupancy##(natural average)modavg(cand.set = Cand.models, parm ="sitevar1", parm.type ="psi")##compute model-averaged estimate of "sitevar1"##(shrinkage estimator)##same number of models with vs without variablemodavgShrink(cand.set = Cand.models, parm ="sitevar1", parm.type ="psi")##compute model-average predictions##check explanatory variables appearing in modelsextractX(Cand.models, parm.type ="psi")##create a data set for predictionsdat.pred <- data.frame(sitevar1 = seq(from = min(siteCovs(pferUMF)$sitevar1), to = max(siteCovs(pferUMF)$sitevar1), by =0.5), sitevar2 = mean(siteCovs(pferUMF)$sitevar2))##model-averaged predictions of psi across range of values##of sitevar1 and entire model setmodavgPred(cand.set = Cand.models, newdata = dat.pred, parm.type ="psi")detach(package:unmarked)## End(Not run)## Not run:##Example 3: example with user-supplied values of log-likelihoods and##number of parameters##vector with model LL'sLL <- c(-38.8876,-35.1783,-64.8970)##vector with number of parametersKs <- c(7,9,4)##create a vector of names to trace back models in setModnames <- c("Cm1","Cm2","Cm3")##generate AICc tableaictabCustom(logL = LL, K = Ks, modnames = Modnames, nobs =121, sort =TRUE)##generate AIC tableaictabCustom(logL = LL, K = Ks, modnames = Modnames, second.ord =FALSE, nobs =121, sort =TRUE)##model averaging parameter estimate##vector of beta estimates for a parameter of interestmodel.ests <- c(0.0478,0.0480,0.0478)##vector of SE's of beta estimates for a parameter of interestmodel.se.ests <- c(0.0028,0.0028,0.0034)##compute model-averaged estimate and unconditional SE based on AICcmodavgCustom(logL = LL, K = Ks, modnames = Modnames, estimate = model.ests, se = model.se.ests, nobs =121)##compute model-averaged estimate and unconditional SE based on BICmodavgCustom(logL = LL, K = Ks, modnames = Modnames, estimate = model.ests, se = model.se.ests, nobs =121, useBIC =TRUE)## End(Not run)## Not run:##Example 4: example with user-supplied values of information criterion##model selection based on WAIC##WAIC valueswaic <- c(105.74,107.36,108.24,100.57)##number of effective parameterseffK <- c(7.45,5.61,6.14,6.05)##create a vector of names to trace back models in setModnames <- c("global model","interactive model","additive model","invertpred model")##generate WAIC model selection tableictab(ic = waic, K = effK, modnames = Modnames, sort =TRUE, ic.name ="WAIC")##compute model-averaged estimate##vector of predictionsPreds <- c(0.106,0.137,0.067,0.050)##vector of SE's for predictionSes <- c(0.128,0.159,0.054,0.039)##compute model-averaged estimate and unconditional SE based on WAICmodavgIC(ic = waic, K = effK, modnames = Modnames, estimate = Preds, se = Ses, ic.name ="WAIC")##export as LaTeX tableif(require(xtable)){##model-averaged estimate and confidence intervalxtable(modavgIC(ic = waic, K = effK, modnames = Modnames, estimate = Preds, se = Ses, ic.name ="WAIC"))##model selection table with estimate and SE's from each modelxtable(modavgIC(ic = waic, K = effK, modnames = Modnames, estimate = Preds, se = Ses, ic.name ="WAIC"), print.table =TRUE)}## End(Not run)