This function performs standard plugin lasso PPML estimation (without fixed effects) for several dependent variables in a single step. It is still IN DEVELOPMENT: at the current stage, only coefficient estimates are provided and there is no support for clustered errors.
data: A data frame containing all relevant variables.
dep: A string or vector with the names of the dependent variables or their column numbers.
indep: A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.
selectobs: Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).
...: Further arguments, including:
tol: Tolerance parameter for convergence of the IRLS algorithm.
glmnettol: Tolerance parameter to be passed on to glmnet::glmnet.
penweights: Optional. A vector of coefficient-specific penalties to use in plugin lasso.
colcheck: Logical. If TRUE, checks for perfect multicollinearity in x.
K: Maximum number of iterations.
verbose: Logical. If TRUE, prints information to the screen while evaluating.
lambda: Penalty parameter (a number).
icepost: Logical. If TRUE, it carries out a post-lasso estimation with just the selected variables and reports the coefficients from this regression.
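As a small illustration of the two accepted forms of selectobs, and of referring to variables by name or by column number, the base R sketch below uses a made-up data frame `toy`. The data and column names are assumptions for illustration only, and nothing here calls the package itself.

```r
# Hypothetical toy data; the column names are invented for illustration.
toy <- data.frame(
  year   = c(2015, 2016, 2016, 2015, 2016),
  prov_a = c(0, 1, 1, 0, 1),
  prov_b = c(1, 1, 0, 0, 1),
  prov_c = c(0, 0, 1, 1, 1)
)

# selectobs as a logical vector: one TRUE/FALSE per row.
sel_logical <- toy$year == 2016

# selectobs as a numeric vector: the row numbers of the same observations.
sel_numeric <- which(sel_logical)

# Both forms pick out the same rows under ordinary R subsetting.
same_rows <- isTRUE(all.equal(toy[sel_logical, ], toy[sel_numeric, ]))

# Variables can likewise be given by name or by column number.
dep_names <- c("prov_a", "prov_b")
dep_cols  <- match(dep_names, names(toy))
```

Either `sel_logical` or `sel_numeric` could be supplied as selectobs, and either `dep_names` or `dep_cols` as dep.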
Returns
A matrix with coefficient estimates for all dependent variables.
Details
This function enables users to implement the "iceberg" step in the two-step procedure described in Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). To do this after using the plugin method in mlfitppml, pass all the variables with non-zero coefficients as dep and the remaining regressors as indep. The function will then perform a separate lasso estimation on each of the selected dependent variables and report the coefficients.
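The split into dep and indep described above can be sketched in base R as follows. The coefficient vector `beta` and its variable names are invented stand-ins for a first-step plugin lasso result. The commented call at the end assumes that the function documented here is exported as `iceberg` and that a data frame `df` with these columns exists.

```r
# Hypothetical first-step output: a named coefficient vector such as one
# might extract from a plugin lasso fit (the values are invented).
beta <- c(prov_a = 0.42, prov_b = 0, prov_c = -0.17, prov_d = 0, prov_e = 0)

# Variables selected in the first step (non-zero coefficients) become dep.
dep <- names(beta)[beta != 0]

# The remaining variables form the regressor set indep.
indep <- setdiff(names(beta), dep)

# With a data frame `df` holding these columns, the iceberg step might then
# be requested along these lines (assumed call, not run here):
# res <- iceberg(data = df, dep = dep, indep = indep)
```

Each variable in `dep` would then be regressed, with a lasso penalty, on the variables in `indep`, giving one set of coefficients per selected variable.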
References
Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.
Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.
Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.
Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.
Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.