scest() R function from scpi

Prediction of Synthetic Control

The command implements estimation procedures for Synthetic Control (SC) methods using least squares, lasso, ridge, or simplex-type constraints. For more information see Cattaneo, Feng, and Titiunik (2021).

Companion Stata and Python packages are described in Cattaneo, Feng, Palomba, and Titiunik (2022).

Companion commands are scdata and scdataMulti for data preparation in the single and multiple treated unit(s) cases, respectively; scpi for inference procedures; and scplot and scplotMulti for plots in the single and multiple treated unit(s) cases, respectively.

Related Stata, R, and Python packages useful for inference in SC designs are described on the following website:

https://nppackages.github.io/scpi/

For an introduction to synthetic control methods, see Abadie (2021) and references therein.


scest(
  data,
  w.constr = NULL,
  V = "separate",
  V.mat = NULL,
  solver = "ECOS",
  plot = FALSE,
  plot.name = NULL,
  plot.path = NULL,
  save.data = NULL
)

Arguments

data: a class 'scdata' object, obtained by calling scdata, or class 'scdataMulti' obtained via scdataMulti.
w.constr: a list specifying the constraint set the estimated weights of the donors must belong to. w.constr can contain the following objects:
- `p`, a string indicating the norm to be constrained (`p` should be one of "no norm", "L1", and "L2")
- `dir`, a string indicating whether the constraint on the norm is an equality ("==") or an inequality ("<=")
- `Q`, a scalar defining the value of the constraint on the norm
- `lb`, a scalar defining the lower bound on the weights. It can be either 0 or `-Inf`.
- `name`, a character string selecting one of the default proposals. See the Details section for more.
V: specifies the type of weighting matrix to be used when minimizing the sum of squared residuals

(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})

The default is the identity matrix, so equal weight is given to all observations. In the case of multiple treated units (i.e., when `scdataMulti` was used to prepare the data), the user can specify `V` as a string equal to either "separate" or "pooled". If `scdata()` was used to prepare the data, `V` is automatically set to "separate", as the two options are equivalent. See the Details section for more.

V.mat: a conformable weighting matrix $\mathbf{V}$ to be used in the minimization of the sum of squared residuals

(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r}).

See the Details section for more information on how to prepare this matrix.

solver: a string containing the name of the solver used by CVXR when computing the weights. You can check which solvers are available on your machine by running CVXR::installed_solvers(). More information on what different solvers do can be found at the following link https://cvxr.rbind.io/cvxr_examples/cvxr_using-other-solvers/. "OSQP" is the default solver when 'lasso' is the constraint type, whilst "ECOS" is the default in all other cases.
plot: a logical specifying whether scplot should be called and a plot saved in the current working directory. For more options see scplot.
plot.name: a string containing the name of the plot (the format is by default .png). For more options see scplot.
plot.path: a string containing the path at which the plot should be saved (default is the output of getwd()).
save.data: a character specifying the name and the path of the saved dataframe containing the processed data used to produce the plot.
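To make the criterion used by `V` and `V.mat` concrete, the sketch below evaluates $(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})'\mathbf{V}(\mathbf{A}-\mathbf{B}\mathbf{w}-\mathbf{C}\mathbf{r})$ in base R for a candidate pair $(\mathbf{w}, \mathbf{r})$. The helper `sc_objective()` and the toy matrices are hypothetical illustrations, not part of scpi; scest() itself minimizes this criterion over the constraint set through CVXR.

```r
# Hypothetical helper (not part of scpi): evaluates the weighted criterion
#   (A - B w - C r)' V (A - B w - C r)
# for given weights w and adjustment coefficients r.
sc_objective <- function(A, B, C, w, r, V) {
  e <- A - B %*% w - C %*% r   # residual vector (v x 1)
  drop(t(e) %*% V %*% e)       # quadratic form, returned as a scalar
}

# Toy inputs, made up for illustration; in practice A, B and C come
# from the object returned by scdata() or scdataMulti().
set.seed(1)
v <- 5                                  # number of pre-treatment rows
A <- matrix(rnorm(v), ncol = 1)         # treated unit's features
B <- matrix(rnorm(v * 3), ncol = 3)     # three donors
C <- matrix(1, nrow = v, ncol = 1)      # a constant used for adjustment
w <- c(0.2, 0.5, 0.3)                   # candidate donor weights
r <- 0.1                                # candidate adjustment coefficient

# With V equal to the identity this is the plain sum of squared residuals.
obj_I <- sc_objective(A, B, C, w, r, diag(v))
# Doubling every diagonal entry of V doubles the criterion.
obj_D <- sc_objective(A, B, C, w, r, diag(2, v))
```

Because any positive rescaling of $\mathbf{V}$ rescales the criterion without moving its minimizer, the scale of a user-supplied `V.mat` matters only for the numerical behavior of the solver.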

Returns

The function returns an object of class 'scest' containing two lists. The first list is labeled 'data' and contains the data used, as returned by scdata, along with some other values.

A: a matrix containing pre-treatment features of the treated unit(s).

B: a matrix containing pre-treatment features of the control units.
C: a matrix containing covariates for adjustment.
P: a matrix whose rows are the vectors used to predict the out-of-sample series for the synthetic unit(s).
P.diff: for internal use only.
Y.pre: a matrix containing the (raw) pre-treatment outcome of the treated unit(s).
Y.post: a matrix containing the (raw) post-treatment outcome of the treated unit(s).
Y.pre.agg: a matrix containing the aggregate pre-treatment outcome of the treated unit(s). This differs from Y.pre only if 'effect' in scdataMulti() is set to either 'unit' or 'time'.
Y.post.agg: a matrix containing the aggregate post-treatment outcome of the treated unit(s). This differs from Y.post only if 'effect' in scdataMulti() is set to either 'unit' or 'time'.
Y.donors: a matrix containing the pre-treatment outcome of the control units.
specs: a list containing some specifics of the data:
- J, the number of control units
- K, a numeric vector with the number of covariates used for adjustment for each feature
- M, number of features
- KM, the total number of covariates used for adjustment
- KMI, the total number of covariates used for adjustment
- I, number of treated units
- period.pre, a numeric vector with the pre-treatment periods
- period.post, a numeric vector with the post-treatment periods
- T0.features, a numeric vector with the number of periods used in estimation for each feature
- T1.outcome, the number of post-treatment periods
- constant, for internal use only
- effect, for internal use only
- anticipation, number of periods of potential anticipation effects
- out.in.features, for internal use only
- treated.units, list containing the IDs of all treated units
- donors.list, list containing the IDs of the donors of each treated unit
- class.type, for internal use only

The second list is labeled 'est.results' and contains the estimation results.

w: a matrix containing the estimated weights of the donors.

r: a matrix containing the values of the covariates used for adjustment.
b: a matrix containing $\mathbf{w}$ and $\mathbf{r}$.
Y.pre.fit: a matrix containing the estimated pre-treatment outcome of the SC unit(s).
Y.post.fit: a matrix containing the estimated post-treatment outcome of the SC unit(s).
A.hat: a matrix containing the predicted values of the features of the treated unit(s).
res: a matrix containing the residuals $\mathbf{A}-\widehat{\mathbf{A}}$.
V: a matrix containing the weighting matrix used in estimation.
w.constr: a list containing the specifics of the constraint set used on the weights.

Details

Unless otherwise specified, information is provided for the simple case of a single treated unit ($N_1=1$).

Estimation of Weights. w.constr specifies the constraint set on the weights. First, the element p allows the user to choose between constraining the L1 norm of the weights (p = "L1"), constraining their L2 norm (p = "L2"), or imposing no constraint on the norm (p = "no norm"). Second, Q specifies the value of the constraint on the norm of the weights. Third, lb sets the lower bound of each component of the vector of weights. Fourth, dir sets the direction of the constraint on the norm when p = "L1" or p = "L2". If dir = "==", then

||\mathbf{w}||_p = Q,\:\:\: w_j \geq lb,\:\: j =1,\ldots,J

If instead dir = "<=", then

||\mathbf{w}||_p \leq Q,\:\:\: w_j \geq lb,\:\: j =1,\ldots,J

If instead dir = "NULL", no constraint on the norm of the weights is imposed.
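The constraint sets above can be checked mechanically. The base-R sketch below tests whether a given weight vector satisfies $||\mathbf{w}||_p$ `dir` $Q$ together with $w_j \geq lb$. The function `in_constraint_set()` and its tolerance argument are hypothetical and not part of scpi; scest() imposes these constraints inside the CVXR optimization rather than checking them afterwards.

```r
# Hypothetical helper (not part of scpi): TRUE if w satisfies
# ||w||_p (dir) Q and w_j >= lb for all j, up to a numerical tolerance.
in_constraint_set <- function(w, p = c("L1", "L2", "no norm"),
                              dir = c("==", "<="), Q = 1, lb = 0,
                              tol = 1e-8) {
  p <- match.arg(p)
  dir <- match.arg(dir)
  if (any(w < lb - tol)) return(FALSE)     # lower bound on each weight
  if (p == "no norm") return(TRUE)         # no constraint on the norm
  norm_w <- if (p == "L1") sum(abs(w)) else sqrt(sum(w^2))
  if (dir == "==") abs(norm_w - Q) <= tol else norm_w <= Q + tol
}

# Defaults (p = "L1", dir = "==", Q = 1, lb = 0) describe the simplex.
simplex_ok  <- in_constraint_set(c(0.2, 0.5, 0.3))
simplex_bad <- in_constraint_set(c(0.6, 0.6))               # L1 norm is 1.2
lasso_ok    <- in_constraint_set(c(0.6, -0.2, 0.1), dir = "<=", lb = -Inf)
ridge_ok    <- in_constraint_set(c(0.6, 0.6), p = "L2", dir = "<=")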

An alternative to specifying an ad-hoc constraint set on the weights is to choose among some popular types of constraints. This can be done by including the element `name` in the list `w.constr`. The following options are available:

* If `name == "simplex"` (the default), then

||\mathbf{w}||_1 = 1,\:\:\: w_j \geq 0,\:\: j =1,\ldots,J.

* If `name == "lasso"`, then

||\mathbf{w}||_1 \leq Q,

  where `Q` is by default equal to 1 but it can be provided as an element of the list (e.g. `w.constr = list(name = "lasso", Q = 2)`).
* If `name == "ridge"`, then

||\mathbf{w}||_2 \leq Q,

  where `Q` is a tuning parameter that is by default computed as

(J+KM) \widehat{\sigma}_u^{2}/||\widehat{\mathbf{w}}_{OLS}||_{2}^{2}

  where $J$ is the number of donors and $KM$ is the total number of covariates used for adjustment. The user can provide `Q` as an element of the list (e.g. `w.constr = list(name = "ridge", Q = 1)`).
* If `name == "ols"`, then the problem is unconstrained and the vector of weights is estimated via ordinary least squares.
* If `name == "L1-L2"`, then

||\mathbf{w}||_1 = 1,\:\:\: ||\mathbf{w}||_2 \leq Q, \:\:\: w_j \geq 0,\:\: j =1,\ldots,J.

  where $Q$ is a tuning parameter computed as in the "ridge" case.
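The default ridge tuning parameter $(J+KM)\,\widehat{\sigma}_u^{2}/||\widehat{\mathbf{w}}_{OLS}||_{2}^{2}$ can be illustrated in base R. This is a sketch under stated assumptions: the hypothetical helper `ridge_Q()` estimates $\widehat{\sigma}_u^{2}$ as the residual variance of an OLS regression of $\mathbf{A}$ on $[\mathbf{B}, \mathbf{C}]$ and takes $\widehat{\mathbf{w}}_{OLS}$ as the donor coefficients of that regression; scpi's internal computation may differ in these details.

```r
# Hypothetical helper (not part of scpi): rule-of-thumb ridge bound
#   Q = (J + KM) * sigma_u^2 / ||w_OLS||_2^2.
# Assumption: sigma_u^2 and w_OLS come from an OLS fit of A on [B, C].
ridge_Q <- function(A, B, C) {
  J  <- ncol(B)                          # number of donors
  KM <- ncol(C)                          # total adjustment covariates
  fit <- lm.fit(cbind(B, C), drop(A))    # OLS, no extra intercept
  w_ols  <- fit$coefficients[seq_len(J)] # donor coefficients only
  sigma2 <- sum(fit$residuals^2) / (nrow(B) - J - KM)
  (J + KM) * sigma2 / sum(w_ols^2)
}

# Toy data, made up for illustration: 30 pre-treatment periods, 4 donors.
set.seed(123)
T0 <- 30
B <- matrix(rnorm(T0 * 4), ncol = 4)
C <- matrix(1, nrow = T0, ncol = 1)
A <- B %*% c(0.4, 0.3, 0.2, 0.1) + rnorm(T0, sd = 0.5)
q <- ridge_Q(A, B, C)
```

A noisier pre-treatment fit (larger $\widehat{\sigma}_u^{2}$) or smaller OLS weights both raise this value, so the same formula feeds the "L1-L2" case as well.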

Weighting Matrix.
* If `V = "separate"`, then $\mathbf{V} = \mathbf{I}$ and the minimized objective function is

\sum_{i=1}^{N_1} \sum_{l=1}^{M} \sum_{t=1}^{T_{0}}\left(a_{t, l}^{i}-\mathbf{b}_{t, l}^{{i \prime }} \mathbf{w}^{i}-\mathbf{c}_{t, l}^{{i \prime}} \mathbf{r}_{l}^{i}\right)^{2},

  which optimizes the separate fit for each treated unit.
* If `V = "pooled"`, then $\mathbf{V} = \frac{1}{I}\mathbf{1}\mathbf{1}'\otimes \mathbf{I}$ and the minimized objective function is

\sum_{l=1}^{M} \sum_{t=1}^{T_{0}}\left(\frac{1}{N_1^2} \sum_{i=1}^{N_1}\left(a_{t, l}^{i}-\mathbf{b}_{t, l}^{i \prime} \mathbf{w}^{i}-\mathbf{c}_{t, l}^{i\prime} \mathbf{r}_{l}^{i}\right)\right)^{2},

  which optimizes the pooled fit for the average of the treated units.
* If the user wants to provide their own weighting matrix, they must use the option `V.mat` to input a $v\times v$ positive-definite matrix, where $v$ is the number of rows of $\mathbf{B}$ (or $\mathbf{C}$) after potential missing values have been removed. In that case, we suggest checking the appropriate dimension $v$ by inspecting the output of either `scdata` or `scdataMulti` and checking the dimensions of $\mathbf{B}$ (and $\mathbf{C}$). Note that the weighting matrix can cause problems for the optimizer if it is not properly scaled. For example, if $\mathbf{V}$ is diagonal, we suggest dividing each of its entries by $\|\mathrm{diag}(\mathbf{V})\|_1$.
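The weighting matrices above can be built directly in base R. The sketch assumes, for illustration only, $N_1$ treated units with one feature each, residuals stacked unit by unit in blocks of $T_0$ rows, and a pooled matrix $\frac{1}{N_1}\mathbf{1}\mathbf{1}'\otimes\mathbf{I}$. It checks that the pooled quadratic form depends only on the cross-unit sum of residuals in each period, and it applies the suggested $\|\mathrm{diag}(\mathbf{V})\|_1$ rescaling to a hypothetical diagonal `V.mat`. All object names are illustrative.

```r
# Toy sizes, chosen arbitrarily: N1 treated units, T0 rows per unit.
N1 <- 3
T0 <- 4
v  <- N1 * T0

V_separate <- diag(v)                                      # identity
V_pooled   <- kronecker(matrix(1, N1, N1) / N1, diag(T0))  # (1/N1) 11' (x) I

# Assumed ordering: residuals stacked unit by unit (T0 rows per unit).
set.seed(42)
e <- rnorm(v)
quad_pooled <- drop(t(e) %*% V_pooled %*% e)
# Same quantity by hand: (1/N1) * sum over t of (sum over i of e_t^i)^2.
by_hand <- sum(rowSums(matrix(e, nrow = T0))^2) / N1

# A hypothetical diagonal V.mat that up-weights the first unit's rows,
# rescaled so that the diagonal has unit L1 norm before passing it on.
V_user <- diag(c(rep(10, T0), rep(1, v - T0)))
V_user_scaled <- V_user / sum(abs(diag(V_user)))
```

Note that this pooled matrix has rank $T_0$, so it is only positive semi-definite; the positive-definiteness requirement stated above applies to matrices supplied through `V.mat`.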

Examples


data <- scpi_germany

df <- scdata(df = data, id.var = "country", time.var = "year",
             outcome.var = "gdp", period.pre = (1960:1990),
             period.post = (1991:2003), unit.tr = "West Germany",
             unit.co = setdiff(unique(data$country), "West Germany"),
             constant = TRUE, cointegrated.data = TRUE)

result <- scest(df, w.constr = list(name = "simplex", Q = 1))
result <- scest(df, w.constr = list(lb = 0, dir = "==", p = "L1", Q = 1))

References

Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
Cattaneo, M. D., Feng, Y., and Titiunik, R. (2021). Prediction intervals for synthetic control methods. Journal of the American Statistical Association, 116(536), 1865-1880.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). scpi: Uncertainty Quantification for Synthetic Control Methods, arXiv:2202.05984.
Cattaneo, M. D., Feng, Y., Palomba, F., and Titiunik, R. (2022). Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption, arXiv:2210.05026.

Author(s)

Matias Cattaneo, Princeton University. cattaneo@princeton.edu .

Yingjie Feng, Tsinghua University. fengyj@sem.tsinghua.edu.cn .

Filippo Palomba, Princeton University (maintainer). fpalomba@princeton.edu .

Rocio Titiunik, Princeton University. titiunik@princeton.edu .

scpi package

Maintainer: Filippo Palomba
License: GPL-2
Last published: 2025-01-31
