feglm() R function from [capybara]

GLM fitting with high-dimensional k-way fixed effects

feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a weighted demeaning approach.
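To make the "weighted demeaning" interpretation concrete, here is a minimal base R sketch for a Poisson model with a single fixed-effects category. It is not capybara's implementation: the simulated data and the helper wmean_by are assumptions made purely for illustration. In each iteratively reweighted least squares step, the working response and the regressor are centered by their weighted group means, which should reproduce the slope that glm returns when the fixed effects enter as explicit dummies.

```r
set.seed(42)
n <- 500
g_id <- sample(1:20, n, replace = TRUE)   # hypothetical group identifiers
g <- factor(g_id)
x <- rnorm(n)
alpha <- rnorm(20, sd = 0.5)              # hypothetical group intercepts
y <- rpois(n, exp(1 + 0.3 * x + alpha[g_id]))

# Weighted group mean of v, repeated for every observation in the group
wmean_by <- function(v, w, g) {
  ave(v * w, g, FUN = sum) / ave(w, g, FUN = sum)
}

eta <- log(y + 0.5)                       # starting linear predictor
beta <- 0
for (iter in 1:100) {
  mu <- exp(eta)
  w <- mu                                 # Poisson working weights
  z <- eta + (y - mu) / mu                # working response

  # Concentrate out the fixed effects by weighted demeaning
  z_dm <- z - wmean_by(z, w, g)
  x_dm <- x - wmean_by(x, w, g)

  # Weighted least squares on the demeaned variables
  beta_new <- sum(w * x_dm * z_dm) / sum(w * x_dm^2)

  # Fitted values of the full regression: working response minus residuals
  eta <- z - (z_dm - x_dm * beta_new)

  delta <- abs(beta_new - beta)
  beta <- beta_new
  if (delta < 1e-10) break
}

# Reference fit with one dummy per group level
b_glm <- coef(glm(y ~ x + g, family = poisson()))[["x"]]
all.equal(beta, b_glm, tolerance = 1e-6)
```

The dummy-variable fit is only practical here because there are 20 levels; the demeaning route never builds the dummy matrix, which is why it scales to high-dimensional fixed effects.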

Remark: The term fixed effect is used in the econometrician's sense of having intercepts for each level in each category.


feglm(
  formula = NULL,
  data = NULL,
  family = gaussian(),
  weights = NULL,
  beta_start = NULL,
  eta_start = NULL,
  control = NULL
)

Arguments

formula: an object of class "formula": a symbolic description of the model to be fitted. formula must be of type y ~ x | k, where the second part of the formula refers to factors to be concentrated out. It is also possible to pass clustering variables to feglm as y ~ x | k | c.
data: an object of class "data.frame" containing the variables in the model. The expected input is a dataset with the variables specified in formula and a number of rows at least equal to the number of variables in the model.
family: a description of the error distribution and link function to be used in the model. As with glm.fit, this has to be the result of a call to a family function. Default is gaussian(). See family for details of family functions.
weights: an optional string with the name of the 'prior weights' variable in data.
beta_start: an optional vector of starting values for the structural parameters in the linear predictor. Default is $\beta = 0$ .
eta_start: an optional vector of starting values for the linear predictor.
control: a named list of parameters for controlling the fitting process. See feglm_control for details.
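The multi-part formula described under formula can be inspected with base R alone. The sketch below only shows how R itself reads the | separators in y ~ x | k | c; capybara may process the formula differently internally, and the variable names on the left-hand side of each assignment are chosen here for illustration.

```r
# Three-part formula: regressors | fixed effects | cluster variable
f <- trade ~ log_dist + lang | exp_year + imp_year | pair

# `|` is left-associative and binds more loosely than `+`, so the
# right-hand side parses as (regressors | fixed effects) | clusters
rhs <- f[[3]]

response   <- all.vars(f[[2]])
regressors <- all.vars(rhs[[2]][[2]])
fixed      <- all.vars(rhs[[2]][[3]])
clusters   <- all.vars(rhs[[3]])

print(list(
  response = response,
  regressors = regressors,
  fixed_effects = fixed,
  clusters = clusters
))
```

Every name that appears in any of the three parts must be a column of data.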

Returns

A named list of class "feglm". The list contains the following fifteen elements:

coefficients: a named vector of the estimated coefficients
eta: a vector of the linear predictor
weights: a vector of the weights used in the estimation
hessian: a matrix with the numerical second derivatives
deviance: the deviance of the model
null_deviance: the null deviance of the model
conv: a logical indicating whether the model converged
iter: the number of iterations needed to converge
nobs: a named vector with the number of observations used in the estimation, also indicating the dropped and perfectly predicted observations
lvls_k: a named vector with the number of levels in each fixed effect
nms_fe: a list with the names of the fixed effects variables
formula: the formula used in the model
data: the data used in the model after dropping non-contributing observations
family: the family used in the model
control: the control list used in the model

Details

If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
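One way to check for this kind of dependence before fitting is to see whether a regressor still varies once the group means of a fixed effects category are removed. The base R sketch below uses simulated data and a hypothetical helper, within_sd; it is a diagnostic idea under those assumptions, not a feature of capybara.

```r
set.seed(7)
g <- factor(rep(1:5, each = 10))              # one fixed effects category
x_ok <- rnorm(50)                             # varies within groups
x_bad <- c(2, 4, 6, 8, 10)[as.integer(g)]     # constant within each group
y <- rnorm(50)

# Standard deviation left after removing group means (hypothetical helper)
within_sd <- function(v, g) sd(v - ave(v, g))

within_sd(x_ok, g)    # clearly positive: identified
within_sd(x_bad, g)   # zero: absorbed by the fixed effects

# With explicit dummies, lm reports the absorbed regressor as NA
fit <- lm(y ~ g + x_ok + x_bad)
coef(fit)[c("x_ok", "x_bad")]
```

A regressor with no within-group variation is a linear combination of the group intercepts, so its coefficient cannot be separated from the fixed effects.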

Examples


# subset trade flows to avoid fitting time warnings during check
set.seed(123)
trade_2006 <- trade_panel[trade_panel$year == 2006, ]
trade_2006 <- trade_2006[sample(nrow(trade_2006), 500), ]

mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year,
  trade_2006,
  family = poisson(link = "log")
)

summary(mod)

mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year | pair,
  trade_panel,
  family = poisson(link = "log")
)

summary(mod, type = "clustered")

References

Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.

Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).

Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.

Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.

capybara package

Maintainer: Mauricio Vargas Sepulveda
License: Apache License (>= 2)
Last published: 2025-03-26
