is_separated function

Checking for (quasi-)separation in binomial-response models.

Separation occurs in binomial-response models when a combination of the predictor variables perfectly predicts a level of the response. In such a case the estimates of the coefficients for these variables diverge to (+/-)infinity, and the numerical algorithms typically fail. To anticipate such a problem, the fitting functions in spaMM try to check for separation by default. The check may take a long time, so it is skipped if the problem size exceeds a threshold defined by spaMM.options(separation_max=<.>); in that case a message tells users by how much they should increase separation_max to force the check. The exact meaning and default value of separation_max are subject to change without notice, but the default value aims to correspond to a separation check time of the order of 1s on the author's computer.
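
The divergence described above can be reproduced with base R alone. The following is a minimal sketch, assuming a toy data set (d_sep, a name introduced here for illustration) in which x completely separates the response. It uses stats::glm rather than a spaMM fitting function, so it shows the symptom of separation, not spaMM's own check.

```r
# Toy data with complete separation (an assumption for illustration):
# y is 0 whenever x <= 5 and 1 otherwise.
d_sep <- data.frame(x = 1:10)
d_sep$y <- as.integer(d_sep$x > 5)

# Collect the warnings raised during the fit instead of printing them.
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial(), data = d_sep),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

coef(fit)      # very large absolute values: the estimates are diverging
fit_warnings   # messages emitted by glm.fit while fitting the separated data
```

The size of the reported coefficients depends only on where the iterations stop, which is why an explicit separation check is more informative than inspecting the fit.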

is_separated is a convenient interface to procedures from the ROI package, allowing them to be called explicitly by the user to check bootstrap samples (see Example in anova). is_separated.formula is a variant (not yet a formal S3 method) that performs the same check, but using arguments similar to those of fitme(., family=binomial()).

is_separated(x, y, verbose = TRUE, solver = spaMM.getOption("sep_solver"))

is_separated.formula(formula, ..., separation_max = spaMM.getOption("separation_max"),
                     solver = spaMM.getOption("sep_solver"))

Arguments

  • x: Design matrix for fixed effects.
  • y: Numeric response vector.
  • formula: A model formula
  • ...: data and possibly other arguments of a fitme call. family is ignored if present.
  • separation_max: numeric: a non-default value allows for easier local control of this spaMM option.
  • solver: character: name of linear programming solver used to assess separation; passed to ROI_solve's solver argument. One can select another solver if the corresponding ROI plugin is installed.
  • verbose: Whether to print some messages (e.g., pointing to model terms that cause separation) or not.
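
As an illustration of explicit calls on resampled data, the sketch below wraps is_separated in a hypothetical helper, count_separated_resamples, which is not part of spaMM. It assumes nonparametric resampling of rows and a design matrix built with model.matrix; the example in anova may proceed differently. The check argument is also an assumption, added so that the checker can be swapped.

```r
# Hypothetical helper (not part of spaMM): count resamples flagged as separated.
# `check` defaults to spaMM::is_separated; the default is only evaluated if used.
count_separated_resamples <- function(data, formula, n_boot = 20,
                                      check = spaMM::is_separated) {
  response <- all.vars(formula)[1]             # name of the response variable
  flags <- vapply(seq_len(n_boot), function(i) {
    boot <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    x <- model.matrix(formula, data = boot)    # design matrix for fixed effects
    y <- as.numeric(boot[[response]])          # numeric response vector
    isTRUE(check(x, y, verbose = FALSE))
  }, logical(1))
  sum(flags)
}

set.seed(123)
d_boot <- data.frame(x = 1:30)
d_boot$success <- rbinom(30, size = 1, prob = 0.5)

# Run the real check only if spaMM is available.
if (requireNamespace("spaMM", quietly = TRUE)) {
  count_separated_resamples(d_boot, success ~ x)
}
```

Setting verbose = FALSE keeps the per-resample messages off the screen; a different solver could be passed through a custom check function if the corresponding ROI plugin is installed.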

Returns

Returns a boolean; TRUE means there is (quasi-)separation. Screen output may give further information, such as identifying the model terms that cause separation.
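
Because the result is a single logical value, it can be used to guard a fit. The following sketch assumes a hypothetical helper, safe_binomial_fit (not part of spaMM), and uses stats::glm as a stand-in for the model-fitting step; the check argument is likewise an assumption made for illustration.

```r
# Hypothetical helper (not part of spaMM): fit only if no separation is detected.
safe_binomial_fit <- function(formula, data,
                              check = function(...) spaMM::is_separated.formula(...)) {
  separated <- check(formula = formula, data = data)
  if (isTRUE(separated)) {
    warning("(Quasi-)separation detected; model not fitted.")
    return(NULL)
  }
  # Stand-in for the actual fitting function.
  stats::glm(formula, data = data, family = stats::binomial())
}

set.seed(123)
d <- data.frame(success = rbinom(10, size = 1, prob = 0.9), x = 1:10)

# Run the real check only if spaMM is available.
if (requireNamespace("spaMM", quietly = TRUE)) {
  fit <- safe_binomial_fit(success ~ x, data = d)
}
```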

References

The method accessible by solver="glpk" implements algorithms described by

Konis, K. 2007. Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models. DPhil Thesis, Univ. Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a.

See Also

See also the 'safeBinaryRegression' and 'detectseparation' packages.

Examples

set.seed(123)
d <- data.frame(success = rbinom(10, size = 1, prob = 0.9), x = 1:10)
is_separated.formula(formula = success ~ x, data = d) # FALSE
is_separated.formula(formula = success ~ I(success^2), data = d) # TRUE
  • Maintainer: François Rousset
  • License: CeCILL-2
  • Last published: 2024-06-09