GLM fitting with high-dimensional k-way fixed effects
feglm can be used to fit generalized linear models with many high-dimensional fixed effects. The estimation procedure is based on unconditional maximum likelihood and can be interpreted as a weighted demeaning approach.
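To make the "weighted demeaning" interpretation concrete, here is a minimal base R sketch for the simplest case only: a Gaussian model with identity link and a single fixed effects category. The simulated data, the fixed weights w (standing in for the working weights of one GLM iteration), and the helper wmean are all assumptions made for illustration; this is not the routine feglm runs internally. In a non-Gaussian model the weights change at every iteration, and with several categories the demeaning has to be repeated across categories.

```r
# Simulated data: one regressor, one fixed effects category with 50 levels
set.seed(1)
n <- 1000
id <- sample(50, n, replace = TRUE)
k <- factor(id)
alpha <- rnorm(50)                  # one intercept per level
x <- rnorm(n) + id / 10             # regressor correlated with the category
y <- 0.5 * x + alpha[id] + rnorm(n)
w <- runif(n, 0.5, 1.5)             # stand-in for GLM working weights

# Hypothetical helper: weighted mean of v within each level of g,
# returned as a vector of the same length as v
wmean <- function(v, g, w) ave(v * w, g, FUN = sum) / ave(w, g, FUN = sum)

# Concentrate out the fixed effects by weighted demeaning
x_dm <- x - wmean(x, k, w)
y_dm <- y - wmean(y, k, w)

# Slope from the demeaned data versus slope from explicit dummy variables
b_demeaned <- unname(coef(lm(y_dm ~ x_dm - 1, weights = w))[1])
b_dummies <- unname(coef(lm(y ~ x + k, weights = w))["x"])

all.equal(b_demeaned, b_dummies)
```

The two slopes agree up to numerical tolerance, but the demeaned regression never builds the 50 dummy columns, which is what makes the approach attractive when categories have many levels.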
Remark: The term fixed effect is used in the econometrician's sense of having intercepts for each level in each category.
feglm(formula = NULL, data = NULL, family = gaussian(), weights = NULL, beta_start = NULL, eta_start = NULL, control = NULL)
Arguments
formula: an object of class "formula": a symbolic description of the model to be fitted. formula must be of type y ~ x | k, where the second part of the formula refers to factors to be concentrated out. It is also possible to pass clustering variables to feglm as y ~ x | k | c.
data: an object of class "data.frame" containing the variables in the model. The expected input is a dataset with the variables specified in formula and a number of rows at least equal to the number of variables in the model.
family: a description of the error distribution and link function to be used in the model. As with glm.fit, this has to be the result of a call to a family function. Default is gaussian(). See family for details of family functions.
weights: an optional string with the name of the 'prior weights' variable in data.
beta_start: an optional vector of starting values for the structural parameters in the linear predictor. Default is β=0.
eta_start: an optional vector of starting values for the linear predictor.
control: a named list of parameters for controlling the fitting process. See feglm_control for details.
Returns
A named list of class "feglm". The list contains the following fifteen elements:
coefficients: a named vector of the estimated coefficients
eta: a vector of the linear predictor
weights: a vector of the weights used in the estimation
hessian: a matrix with the numerical second derivatives
deviance: the deviance of the model
null_deviance: the null deviance of the model
conv: a logical indicating whether the model converged
iter: the number of iterations needed to converge
nobs: a named vector with the number of observations used in the estimation, along with the numbers of dropped and perfectly predicted observations
lvls_k: a named vector with the number of levels in each fixed effects category
nms_fe: a list with the names of the fixed effects variables
formula: the formula used in the model
data: the data used in the model after dropping non-contributing observations
family: the family used in the model
control: the control list used in the model
Details
If feglm does not converge, this is often a sign of linear dependence between one or more regressors and a fixed effects category. In this case, you should carefully inspect your model specification.
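One simple way to look for this kind of dependence is to check whether a regressor varies within the levels of a fixed effects category. The base R sketch below is only an illustration: the helper is_absorbed and the toy data frame are hypothetical and not part of the package, and the check covers one regressor against one category at a time, so it will not detect dependence that involves several categories jointly.

```r
# Hypothetical helper: TRUE if `x` is constant within every level of `g`,
# i.e. x is a linear combination of the dummy variables for g
is_absorbed <- function(x, g, tol = 1e-10) {
  within_dev <- x - ave(x, g, FUN = mean)
  max(abs(within_dev)) < tol
}

# Toy panel: three exporters observed over four years
d <- data.frame(
  exporter = rep(c("A", "B", "C"), each = 4),
  year = rep(2001:2004, times = 3)
)
d$landlocked <- as.numeric(d$exporter == "B") # fixed per exporter
d$tariff <- c(1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 0.5, 0.4, 0.6, 0.5)

is_absorbed(d$landlocked, d$exporter) # TRUE: collinear with exporter effects
is_absorbed(d$tariff, d$exporter)     # FALSE: varies within exporters
```

A regressor flagged this way cannot be separated from the corresponding fixed effects and would need to be dropped or the fixed effects structure reconsidered.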
Examples
# subset trade flows to avoid fitting time warnings during check
set.seed(123)
trade_2006 <- trade_panel[trade_panel$year == 2006, ]
trade_2006 <- trade_2006[sample(nrow(trade_2006), 500), ]

# two-way fixed effects Poisson model
mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year,
  trade_2006,
  family = poisson(link = "log")
)

summary(mod)

# same specification with a clustering variable in the third formula part
mod <- feglm(
  trade ~ log_dist + lang + cntg + clny | exp_year + imp_year | pair,
  trade_panel,
  family = poisson(link = "log")
)

summary(mod, type = "clustered")
References
Gaure, S. (2013). "OLS with Multiple High Dimensional Category Variables". Computational Statistics and Data Analysis, 66.
Marschner, I. (2011). "glm2: Fitting generalized linear models with convergence problems". The R Journal, 3(2).
Stammann, A., F. Heiss, and D. McFadden (2016). "Estimating Fixed Effects Logit Models with Large Panel Data". Working paper.
Stammann, A. (2018). "Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-Way Fixed Effects". ArXiv e-prints.