feglm function

GLM fitting with high-dimensional k-way fixed effects

feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a weighted demeaning approach.

Remark: The term "fixed effect" is used in the econometrician's sense of having an intercept for each level of each category.
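The weighted demeaning interpretation can be illustrated with a minimal base R sketch for a Poisson model with one fixed effects category. This is not the implementation used by feglm; the helper wdemean, the simulated data, and the starting values are assumptions made for illustration only. Each iteratively reweighted least squares step demeans the working response and the regressor within groups, using the current weights, and then runs a weighted regression on the demeaned variables.

```r
set.seed(1)
n <- 300
k <- factor(sample(1:10, n, replace = TRUE))   # one fixed effects category
x <- rnorm(n)
alpha <- rnorm(10, sd = 0.5)                   # true group intercepts
y <- rpois(n, exp(0.5 * x + alpha[k]))

# Hypothetical helper: subtract the weighted group mean from each observation
wdemean <- function(v, w, g) {
  v - ave(v * w, g, FUN = sum) / ave(w, g, FUN = sum)
}

mu <- y + 0.1        # starting values (assumption, mirrors glm for Poisson)
eta <- log(mu)
beta <- 0

for (iter in 1:50) {
  w <- mu                        # IRLS weights for Poisson with log link
  z <- eta + (y - mu) / mu       # working response
  z_t <- wdemean(z, w, k)        # weighted demeaning removes the fixed effects
  x_t <- wdemean(x, w, k)
  beta_new <- sum(w * x_t * z_t) / sum(w * x_t^2)
  eta <- z - (z_t - x_t * beta_new)   # fitted values = response - residuals
  mu <- exp(eta)
  converged <- abs(beta_new - beta) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Reference: the same model with explicit dummy variables
ref <- unname(coef(glm(y ~ x + k, family = poisson()))["x"])
c(demeaned = beta, dummies = ref)
```

Both approaches target the same unconditional maximum likelihood estimate of the slope, but the demeaning version never builds the dummy variable matrix, which is what makes it practical when the fixed effects have many levels.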

feglm(
  formula = NULL,
  data = NULL,
  family = gaussian(),
  weights = NULL,
  beta_start = NULL,
  eta_start = NULL,
  control = NULL
)

Arguments

  • formula: an object of class "formula": a symbolic description of the model to be fitted. formula must be of type y ~ x | k, where the second part of the formula refers to factors to be concentrated out. It is also possible to pass clustering variables to feglm as y ~ x | k | c.

  • data: an object of class "data.frame" containing the variables in the model. The expected input is a dataset with the variables specified in formula and a number of rows at least equal to the number of variables in the model.

  • family: a description of the error distribution and link function to be used in the model. As with glm.fit, this has to be the result of a call to a family function. Default is gaussian(). See family for details of family functions.

  • weights: an optional string with the name of the 'prior weights' variable in data.

  • beta_start: an optional vector of starting values for the structural parameters in the linear predictor. Default is β = 0.

  • eta_start: an optional vector of starting values for the linear predictor.

  • control: a named list of parameters for controlling the fitting process. See feglm_control for details.
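The three-part structure of formula (regressors, fixed effects, clustering variables) can be made concrete with a short base R sketch. The helper split_rhs is hypothetical and is not part of feglm; it only shows how R parses the `|` separators, and says nothing about how feglm processes the formula internally.

```r
# Hypothetical helper: split the right-hand side of a formula on `|`
split_rhs <- function(f) {
  parts <- list()
  rhs <- f[[3]]
  # `|` is left-associative, so peel parts off from the right
  while (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    parts <- c(list(rhs[[3]]), parts)
    rhs <- rhs[[2]]
  }
  c(list(rhs), parts)
}

f <- trade ~ log_dist + lang | exp_year + imp_year | pair
parts <- split_rhs(f)

# Variables in each part: regressors, fixed effects, cluster variable
vars <- lapply(parts, all.vars)
names(vars) <- c("regressors", "fixed_effects", "cluster")
vars
```

With a two-part formula such as y ~ x | k, the same helper returns only the first two parts, so the names assignment above would need to be shortened accordingly.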

Returns

A named list of class "feglm". The list contains the following fifteen elements:

  • coefficients: a named vector of the estimated coefficients

  • eta: a vector of the linear predictor

  • weights: a vector of the weights used in the estimation

  • hessian: a matrix with the numerical second derivatives

  • deviance: the deviance of the model

  • null_deviance: the null deviance of the model

  • conv: a logical indicating whether the model converged

  • iter: the number of iterations needed to converge

  • nobs: a named vector with the number of observations used in the estimation, together with the numbers of dropped and perfectly predicted observations

  • lvls_k: a named vector with the number of levels in each fixed effects category

  • nms_fe: a list with the names of the fixed effects variables

  • formula: the formula used in the model

  • data: the data used in the model after dropping non-contributing observations

  • family: the family used in the model

  • control: the control list used in the model

Details

If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
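One simple form of this dependence can be checked before fitting. The sketch below uses only base R; the function absorbed_by and the toy data are hypothetical and are not part of feglm. It flags regressors that do not vary within the levels of a single fixed effects category, since such regressors are absorbed by that category's intercepts. It does not detect dependence that only arises from combinations of several categories or regressors.

```r
# Hypothetical check: TRUE if a regressor is constant within every level of `fe`
absorbed_by <- function(data, regressors, fe) {
  vapply(regressors, function(v) {
    # deviation of each value from its group mean
    within_dev <- data[[v]] - ave(data[[v]], data[[fe]])
    all(abs(within_dev) < 1e-10)
  }, logical(1))
}

d <- data.frame(
  id = rep(1:4, each = 3),
  x1 = c(0.3, 1.2, -0.5, 0.8, 2.1, 0.1, -1.0, 0.4, 1.7, 0.9, -0.2, 1.1),
  x2 = rep(c(10, 20, 30, 40), each = 3)   # constant within id
)

res <- absorbed_by(d, c("x1", "x2"), "id")
res
```

Here x2 is flagged because it takes a single value per id, so its coefficient cannot be separated from the id intercepts.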

Examples

# subset trade flows to avoid fitting time warnings during check
set.seed(123)
trade_2006 <- trade_panel[trade_panel$year == 2006, ]
trade_2006 <- trade_2006[sample(nrow(trade_2006), 500), ]

# Poisson model with exporter-year and importer-year fixed effects
mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year,
  trade_2006,
  family = poisson(link = "log")
)

summary(mod)

# same specification with pair as the clustering variable
mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year | pair,
  trade_panel,
  family = poisson(link = "log")
)

summary(mod, type = "clustered")

References

Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.

Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).

Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.

Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.

  • Maintainer: Mauricio Vargas Sepulveda
  • License: Apache License (>= 2)
  • Last published: 2025-03-26