Difference-in-Differences in Repeated Cross-Sections for Binary Treatments using Double Machine Learning
This function estimates the average treatment effect on the treated (ATET) in the post-treatment period for a binary treatment using a doubly robust Difference-in-Differences (DiD) approach for repeated cross-sections that is combined with double machine learning. It controls for (possibly time-varying) confounders in a data-driven manner and supports various machine learning methods for estimating nuisance parameters through k-fold cross-fitting.
Usage
didDML(y, d, t, x, MLmethod = "lasso", est = "dr", trim = 0.05, cluster = NULL, k = 3)
Arguments
y: Outcome variable. Should not contain missing values.
d: Treatment group indicator (binary). Should not contain missing values.
t: Time period indicator (binary). Should be 1 for post-treatment period and 0 for pre-treatment period. Should not contain missing values.
x: Covariates to be controlled for. Should not contain missing values.
MLmethod: Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".
est: Estimation method. Must be one of "dr" (default) for doubly robust, "ipw" for inverse probability weighting (not doubly robust!), or "reg" for regression (not doubly robust!).
trim: Trimming threshold for discarding observations with too small propensity scores within any subgroup defined by the treatment group and time. Expressed as a share, so the default of 0.05 corresponds to 5%.
cluster: Optional clustering variable for calculating cluster-robust standard errors.
k: Number of folds in k-fold cross-fitting. Default is 3.
Returns
A list with the following components:
ATET: Estimate of the Average Treatment Effect on the Treated (ATET) in the post-treatment period.
se: Standard error of the ATET estimate.
pval: P-value of the ATET estimate.
trimmed: Number of discarded (trimmed) observations.
pscores: Propensity scores of untrimmed observations (4 columns): under treatment in period 1, under treatment in period 0, under control in period 1, under control in period 0.
outcomepred: Conditional outcome predictions of untrimmed observations (3 columns): in treatment group in period 0, in control group in period 1, in control group in period 0.
treat: Treatment status of untrimmed observations.
time: Time period of untrimmed observations.
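The returned ATET and se can be combined into a confidence interval. The following is a minimal sketch in base R, assuming a normal approximation for the estimator; the numeric values are hypothetical stand-ins for results$ATET and results$se, and this page does not state whether pval is computed in exactly this way.

```r
# Hypothetical values standing in for results$ATET and results$se
atet <- 0.95
se   <- 0.12

# 95% confidence interval under a normal approximation (an assumption here)
z  <- qnorm(0.975)
ci <- c(lower = atet - z * se, upper = atet + z * se)

# Two-sided p-value for H0: ATET = 0 under the same approximation
pval <- 2 * pnorm(-abs(atet / se))

print(round(ci, 3))
print(signif(pval, 3))
```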
Details
This function estimates the Average Treatment Effect on the Treated (ATET) in the post-treatment period based on Difference-in-Differences in repeated cross-sections, controlling for confounders in a data-adaptive manner using double machine learning. It supports different machine learning methods for estimating the nuisance parameters (conditional mean outcomes and propensity scores) and uses cross-fitting to mitigate overfitting. In addition to the doubly robust estimator (est = "dr"), the function provides inverse probability weighting (est = "ipw") and regression adjustment (est = "reg"), neither of which is doubly robust.
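The idea of k-fold cross-fitting can be sketched in base R as follows. This is an illustration only, not the internal code of didDML: it uses simulated data, a single propensity score rather than the four subgroup scores, and a logistic regression as a stand-in for the machine learner. The names fold, ps, and keep are hypothetical, and the one-sided trimming rule at the end is an assumption that may differ from the rule the function applies.

```r
set.seed(1)
n <- 1000
k <- 3                                 # number of folds, as in the k argument
x <- runif(n)                          # single simulated covariate
d <- rbinom(n, 1, plogis(x - 0.5))     # simulated binary treatment

# Randomly assign each observation to one of k folds of (nearly) equal size
fold <- sample(rep(seq_len(k), length.out = n))

# Out-of-fold propensity score predictions
ps <- rep(NA_real_, n)
for (j in seq_len(k)) {
  train <- fold != j
  # Stand-in nuisance model: logistic regression instead of a machine learner
  fit <- glm(d ~ x, family = binomial,
             data = data.frame(d = d[train], x = x[train]))
  # Predict only for the held-out fold, which was not used for fitting
  ps[!train] <- predict(fit, newdata = data.frame(x = x[!train]),
                        type = "response")
}

# Simple trimming sketch: flag observations with a score below the threshold
trim <- 0.05
keep <- ps >= trim
```

Each observation's nuisance prediction comes from a model fitted without that observation, which is what limits overfitting bias in the final estimate.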
Examples
## Not run:
# Example with simulated data
n=4000                        # sample size
t=1*(rnorm(n)>0)              # time period
u=runif(n,0,1)                # time constant unobservable
x=0.25*t+runif(n,0,1)         # time varying covariate
d=1*(x+u+2*rnorm(n)>0)        # treatment
y=d*t+t+x+u+2*rnorm(n)        # outcome
# true effect is equal to 1
results=didDML(y=y, d=d, t=t, x=x)
cat("ATET: ", round(results$ATET,3),", Standard error: ", round(results$se,3))
## End(Not run)
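For comparison, an unconditional Difference-in-Differences estimate can be computed in base R on data simulated from the same design as the example. This is a sketch only: it does not adjust for x and does not call didDML, so it serves as a baseline rather than as a reproduction of the function's estimator. The seed and the names m, did_means, and did_lm are hypothetical choices for illustration.

```r
set.seed(123)
n <- 4000                               # sample size
t <- 1 * (rnorm(n) > 0)                 # time period
u <- runif(n, 0, 1)                     # time constant unobservable
x <- 0.25 * t + runif(n, 0, 1)          # time varying covariate
d <- 1 * (x + u + 2 * rnorm(n) > 0)     # treatment
y <- d * t + t + x + u + 2 * rnorm(n)   # outcome

# Unconditional DiD from the four group-by-period mean outcomes
m <- tapply(y, list(d = d, t = t), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# The same quantity as the interaction coefficient of a saturated regression
did_lm <- unname(coef(lm(y ~ d * t))["d:t"])

cat("Unconditional DiD: ", round(did_means, 3), "\n")
```

The two computations coincide because the regression with both main effects and the interaction is saturated in d and t.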
References
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.
Zimmert, M. (2020): "Efficient difference-in-differences estimation with high-dimensional common trend confounding", arXiv preprint 1809.01643.