Data Preparation for scest or scpi for Point Estimation and Inference Procedures Using Synthetic Control Methods.
The command prepares the data to be used by scest or scpi to implement estimation and inference procedures for Synthetic Control (SC) methods. It allows the user to specify the outcome variable, the features of the treated unit to be matched, and covariate-adjustment feature by feature. The names of the output matrices follow the terminology proposed in Cattaneo, Feng, and Titiunik (2021).
Companion commands are: scdataMulti for data preparation in the multiple treated units case with staggered adoption, scest for point estimation, scpi for inference procedures, scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.
Related Stata, R, and Python packages useful for inference in SC designs are described in the following website:
Arguments
id.var: a character or numeric scalar with the name of the variable containing units' IDs. The ID variable can be numeric or character.
time.var: a character with the name of the time variable. The time variable has to be numeric, integer, or Date. If time.var is a Date, it should be the output of the as.Date() function. An integer or numeric time variable is suggested when working with yearly data, whereas a Date time variable is preferred for all other frequencies.
outcome.var: a character with the name of the outcome variable. The outcome variable has to be numeric.
period.pre: a numeric vector that identifies the pre-treatment period in time.var.
period.post: a numeric vector that identifies the post-treatment period in time.var.
unit.tr: a character or numeric scalar that identifies the treated unit in id.var.
unit.co: a character or numeric vector that identifies the donor pool in id.var.
features: a character vector containing the name of the feature variables used for estimation. If this option is not specified the default is features = outcome.var.
cov.adj: a list specifying the names of the covariates to be used for adjustment for each feature. If outcome.var is not among the variables specified in features, cov.adj is forced to NULL. See the Details section for more.
cointegrated.data: a logical that indicates if there is a belief that the data is cointegrated or not. The default value is FALSE. See the Details section for more.
anticipation: a scalar that indicates the number of periods of potential anticipation effects. Default is 0.
constant: a logical which controls the inclusion of a constant term across features. The default value is FALSE.
verbose: if TRUE prints additional information in the console.
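As an illustration of how these arguments fit together, the following sketch builds a small artificial panel and passes it to scdata. The data set, its column names (unit, year, gdp), and the unit labels are invented for the example. The df argument holding the data frame is an assumption taken from the package's usual call signature, since it is not listed above. The call is guarded so that the script still runs when scpi is not installed.

```r
# Toy balanced panel: 4 units observed yearly from 2000 to 2012 (invented data).
set.seed(1)
units <- c("A", "B", "C", "D")
years <- 2000:2012
toy <- expand.grid(unit = units, year = years, stringsAsFactors = FALSE)
toy$gdp <- 10 + 0.5 * (toy$year - 2000) + rnorm(nrow(toy), sd = 0.2)

# Arguments as described above; unit "A" is treated from 2010 onwards.
args <- list(
  df          = toy,                 # assumed name of the data-frame argument
  id.var      = "unit",
  time.var    = "year",
  outcome.var = "gdp",
  period.pre  = 2000:2009,
  period.post = 2010:2012,
  unit.tr     = "A",
  unit.co     = c("B", "C", "D"),
  constant    = TRUE
)

# Only call scdata when the scpi package is available.
if (requireNamespace("scpi", quietly = TRUE)) {
  prepared <- do.call(scpi::scdata, args)
  print(dim(prepared$A))   # pre-treatment features of the treated unit
  print(dim(prepared$B))   # pre-treatment features of the control units
}
```

Since features is left unspecified here, the outcome variable is the only feature, as stated in the description of features.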
Returns
The command returns an object of class 'scdata' containing the following:
A: a matrix containing pre-treatment features of the treated unit.
B: a matrix containing pre-treatment features of the control units.
C: a matrix containing covariates for adjustment.
P: a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit.
Y.pre: a matrix containing the pre-treatment outcome of the treated unit.
Y.post: a matrix containing the post-treatment outcome of the treated unit.
Y.donors: a matrix containing the pre-treatment outcome of the control units.
specs: a list containing some specifics of the data:
J, the number of control units
K, a numeric vector with the number of covariates used for adjustment for each feature
KM, the total number of covariates used for adjustment
M, the number of features
period.pre, a numeric vector with the pre-treatment period
period.post, a numeric vector with the post-treatment period
T0.features, a numeric vector with the number of periods used in estimation for each feature
T1.outcome, the number of post-treatment periods
outcome.var, a character with the name of the outcome variable
features, a character vector with the name of the features
constant, for internal use only
out.in.features, for internal use only
effect, for internal use only
sparse.matrices, for internal use only
treated.units, a list containing the IDs of all treated units
Details
cov.adj can be used in two ways. First, if only one feature is specified through the option features, cov.adj has to be a list with one (possibly unnamed) element (e.g., cov.adj = list(c("constant","trend"))). Alternatively, if multiple features are specified, then the user has two possibilities:
provide a list with one element, in which case the same covariates are used for adjustment for each feature. For example, if there are two features specified and the user inputs cov.adj = list(c("constant","trend")), then a constant term and a linear trend are used for adjustment for both features.
provide a list with as many elements as the number of features specified, in which case feature-specific covariate adjustment is implemented. For example, cov.adj = list('f1' = c("constant","trend"), 'f2' = c("trend")). In this case the name of each element of the list should be one (and only one) of the features specified. Note that if two (or more) features are specified and covariate adjustment has to be specified for just one of them, the user must still provide a list whose length equals the number of features, e.g., cov.adj = list('f1' = c("constant","trend"), 'f2' = NULL).
This option allows the user to include feature-specific constant terms or time trends by simply including "constant" or "trend" in the corresponding element of the list.
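The two ways of specifying cov.adj with multiple features can be written out as follows. The feature names f1 and f2 are the placeholders used above, and check_cov_adj is a hypothetical helper written for this sketch, not a function of the package. It only mirrors the length and naming rules stated in the text.

```r
features <- c("f1", "f2")   # placeholder feature names

# (a) One element: the same covariates adjust every feature.
cov_common <- list(c("constant", "trend"))

# (b) One named element per feature; NULL means no adjustment for that feature.
cov_specific <- list("f1" = c("constant", "trend"), "f2" = NULL)

# Hypothetical helper mirroring the rules described above:
# either a single element, or one element per feature named after the features.
check_cov_adj <- function(cov.adj, features) {
  if (length(cov.adj) == 1) return(TRUE)
  length(cov.adj) == length(features) &&
    setequal(names(cov.adj), features)
}

print(check_cov_adj(cov_common, features))
print(check_cov_adj(cov_specific, features))
```

Note that list() keeps an element that is explicitly set to NULL, so cov_specific still has two elements, one per feature.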
When outcome.var is not included in features, we automatically set R=∅, that is, we do not perform covariate adjustment. This is because, in this setting, it is natural to create the out-of-sample prediction matrix P using the post-treatment outcomes of the donor units only.
cointegrated.data allows the user to model the belief that A and B form a cointegrated system. In practice, this implies that when dealing with the pseudo-true residuals u, the first differences of B are used rather than the levels.
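To make the first-differencing concrete, the sketch below differences a small invented donor matrix column by column with base R. It only illustrates what "first differences of B" means. It is not the package's internal code, and the matrix values and column names are made up.

```r
# Invented pre-treatment matrix: rows are periods, columns are donor units.
B <- cbind(unit1 = c(1.0, 1.5, 2.5, 4.0),
           unit2 = c(2.0, 2.0, 3.0, 3.5))

# diff() applied to a matrix differences each column: row t minus row t-1,
# so the result has one row fewer than the levels.
B_diff <- diff(B)
print(B_diff)
```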
Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.