| Title: | Penalized Synthetic Control Estimation |
|---|---|
| Description: | Estimate penalized synthetic control models and perform hold-out validation to determine their penalty parameter. This method is based on the work by Abadie & L'Hour (2021) <doi:10.1080/01621459.2021.1971535>. Penalized synthetic controls smoothly interpolate between one-to-one matching and the synthetic control method. |
| Authors: | Erik-Jan van Kesteren [cre, aut] (ORCID: <https://orcid.org/0000-0003-1548-1663>), Isaac Slaughter [ctb] (ORCID: <https://orcid.org/0000-0002-1911-2374>) |
| Maintainer: | Erik-Jan van Kesteren <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.2 |
| Built: | 2026-06-04 06:38:17 UTC |
| Source: | https://github.com/vankesteren/pensynth |
Compute a penalized synthetic control estimator with hold-out validation for the lambda penalty parameter. Lambda will be determined by minimizing the mean squared error on a hold-out set of pre-intervention outcome time-series.
cv_pensynth( X1, X0, Z1, Z0, v = 1, nlambda = 100, opt_pars = clarabel::clarabel_control(), standardize = TRUE, return_solver_info = FALSE, verbose = interactive(), adaptive_lambda = TRUE )cv_pensynth( X1, X0, Z1, Z0, v = 1, nlambda = 100, opt_pars = clarabel::clarabel_control(), standardize = TRUE, return_solver_info = FALSE, verbose = interactive(), adaptive_lambda = TRUE )
X1 |
|
X0 |
|
Z1 |
|
Z0 |
|
v |
|
nlambda |
|
opt_pars |
|
standardize |
|
return_solver_info |
|
verbose |
|
adaptive_lambda |
|
The lambda sequence is an exponentially increasing sequence where The minimum lambda is always 1e-11, the max lambda is determined by the data.
For multiple treated units, is adaptive_lambda is set to FALSE, the (shared) minimum
lambda will be selected by local regression of sqrt(mse) on log(lambda).
A list of optimal weights, optimal lambda(s), the lambda sequence(s),
the associated weights, and the mses. If there are multiple treated units,
this list contains sublists for each unit. If return_solver_info is TRUE,
the list will also contain diagnostic information about the solvers.
pensynth(), plot.cvpensynth(), placebo_test(), simulate_data_synth()
set.seed(45) dat <- simulate_data_synth() res <- with(dat, cv_pensynth(X1, X0, Z1, Z0)) plot(res)set.seed(45) dat <- simulate_data_synth() res <- with(dat, cv_pensynth(X1, X0, Z1, Z0)) plot(res)
This function finds out if the treated unit is in the convex hull of the donor units.
in_convex_hull(X1, X0, verbose = interactive(), ...)in_convex_hull(X1, X0, verbose = interactive(), ...)
X1 |
|
X0 |
|
verbose |
|
... |
additional arguments passed to |
This function does not actually construct the convex hull (which is infeasible in higher dimensions), but rather it checks whether the following linear program has a feasible solution:
min q'w s.t. Aw = b
with w constrained to be above 0 and sum to 1, the feasibility of this linear program directly corresponds to whether the treated is in the convex hull
When the treated unit very close to the boundary of the convex hull
this method usually cannot determine this exactly and this function
may return NA with the warning "Solver terminated due to lack of
progress"
bool whether the treated unit is in the convex hull of
the donor units. NA if this cannot be determined. Vector if X1
has multiple columns.
# create some data set.seed(45) X0 <- matrix(runif(20), nrow = 2) X1 <- matrix(c(.5, .5)) # test if X1 is in the convex hull: in_convex_hull(X1, X0) # also works with multiple units in X1: X1 <- cbind(X1, c(1.3, -3)) in_convex_hull(X1, X0)# create some data set.seed(45) X0 <- matrix(runif(20), nrow = 2) X1 <- matrix(c(.5, .5)) # test if X1 is in the convex hull: in_convex_hull(X1, X0) # also works with multiple units in X1: X1 <- cbind(X1, c(1.3, -3)) in_convex_hull(X1, X0)
For a given set of variable weights (v) this function estimates the unit weights for a synthetic control with penalization according to Abadie & L'Hour (2021). This function deals with only a single treated unit.
pensynth( X1, X0, v = 1, lambda = 0, opt_pars = clarabel::clarabel_control(), standardize = TRUE, verbose = interactive() )pensynth( X1, X0, v = 1, lambda = 0, opt_pars = clarabel::clarabel_control(), standardize = TRUE, verbose = interactive() )
X1 |
|
X0 |
|
v |
|
lambda |
|
opt_pars |
|
standardize |
|
verbose |
|
This routine uses the same notation of the original Synth::synth() implementation
but uses a different, faster quadratic program solver (namely, clarabel::clarabel()).
Additionally, it implements the penalization procedure described in Abadie & L'Hour (2021),
such that the loss function is as in equation 5 of that paper.
Variable weights are not optimized by this function, meaning they need to be pre-specified. This is by design.
The original synthetic control method can be recovered by setting lambda = 0. For determining
lambda based on hold-out data, see cv_pensynth().
A list with two values: w, the estimated weights; and
solution, the result of the optimization.
Abadie, A., & L’Hour, J. (2021). A penalized synthetic control estimator for disaggregated data. Journal of the American Statistical Association, 116(536), 1817-1834.
cv_pensynth(), placebo_test(), simulate_data_synth(), Synth::synth()
# generate some data X0 <- matrix( c(1, 1.3, 0.5, 1.8, 1.1, 2.4, 1.8, 1.8, 1.3, 1.8), 2) X1 <- matrix(c(0.8, 1.65), 2) # run classic synthetic control (no penalization) res <- pensynth(X1, X0) # plot donor units in covariate space plot(t(X0), asp = 1, xlab = "X1", ylab = "X2", main = "Covariate space plot") # add the treated unit points(t(X1), pch = 2) # add the synthetic control points(t(X0%*%res$w), pch = 3) # run synthetic control with penalty res <- pensynth(X1, X0, lambda = 0.5) # the resulting synthetic control is # biased towards its closest neighbours points(t(X0 %*% res$w), pch = 4)# generate some data X0 <- matrix( c(1, 1.3, 0.5, 1.8, 1.1, 2.4, 1.8, 1.8, 1.3, 1.8), 2) X1 <- matrix(c(0.8, 1.65), 2) # run classic synthetic control (no penalization) res <- pensynth(X1, X0) # plot donor units in covariate space plot(t(X0), asp = 1, xlab = "X1", ylab = "X2", main = "Covariate space plot") # add the treated unit points(t(X1), pch = 2) # add the synthetic control points(t(X0%*%res$w), pch = 3) # run synthetic control with penalty res <- pensynth(X1, X0, lambda = 0.5) # the resulting synthetic control is # biased towards its closest neighbours points(t(X0 %*% res$w), pch = 4)
Perform a permutation test on a pensynth object, in the sense of Abadie, Diamond, and Hainmueller (2010). The pensynth method is performed multiple times, treating each donor as the treated unit and the treated unit with the remaining donors as the donor units.
placebo_test(object, Y1, Y0, verbose = TRUE) ## S3 method for class 'pensynth' placebo_test(object, Y1, Y0, verbose = TRUE) ## S3 method for class 'cvpensynth' placebo_test(object, Y1, Y0, verbose = TRUE)placebo_test(object, Y1, Y0, verbose = TRUE) ## S3 method for class 'pensynth' placebo_test(object, Y1, Y0, verbose = TRUE) ## S3 method for class 'cvpensynth' placebo_test(object, Y1, Y0, verbose = TRUE)
object |
a fitted |
Y1 |
the post-intervention outcome of the treated unit |
Y0 |
the post-intervention outcome of the donor units |
verbose |
|
Note that this function updates the original call in order to
re-estimate the synthetic control on the permuted data.
Ensure that the data is available to the placebo test function
(i.e., avoid complex environment functions such as with()),
and ensure that the data does not change between estimating the
original object and calling this function.
A list with two elements
E1, the treated unit effect(s), computed as Y1 - Y0 %*% w
E0, the donor unit effects, computed in the same way but using the permutation test's weights.
ATE1, the estimated ATE of the treated unit(s)
ATE0, the estimated ATE of the donor units
Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American statistical Association, 105(490), 493-505.
pensynth(), cv_pensynth(), plot.pensynthtest(), stats::update()
set.seed(45) # simulate data with an effect of 0.8 SD dat <- simulate_data_synth(treatment_effect = .8) # fit a model fit <- pensynth(dat$X1, dat$X0, lambda = 1e-5) # Perform placebo test test <- placebo_test(fit, dat$Y1, dat$Y0) plot(test) abline(h = .8, lty = 2) legend("bottomright", lty = 2, legend = "true effect") # compute a pseudo p-value based on ATE in # the post-intervention time period ref_dist <- stats::ecdf(test$ATE0) 1 - ref_dist(test$ATE1)set.seed(45) # simulate data with an effect of 0.8 SD dat <- simulate_data_synth(treatment_effect = .8) # fit a model fit <- pensynth(dat$X1, dat$X0, lambda = 1e-5) # Perform placebo test test <- placebo_test(fit, dat$Y1, dat$Y0) plot(test) abline(h = .8, lty = 2) legend("bottomright", lty = 2, legend = "true effect") # compute a pseudo p-value based on ATE in # the post-intervention time period ref_dist <- stats::ecdf(test$ATE0) 1 - ref_dist(test$ATE1)
Displays a mean squared error curve and weights curve as a function of lambda, the penalization parameter.
## S3 method for class 'cvpensynth' plot(x, treated_unit = 1, ...)## S3 method for class 'cvpensynth' plot(x, treated_unit = 1, ...)
x |
a |
treated_unit |
|
... |
additional arguments passed to |
No return value, called for side effects
Plotting the reference distribution and the estimated treatement effect for the treated unit for the pensynth permutation test.
## S3 method for class 'pensynthtest' plot(x, treated_unit = 1, ...)## S3 method for class 'pensynthtest' plot(x, treated_unit = 1, ...)
x |
a |
treated_unit |
|
... |
additional parameters passed to |
No return value, called for side effects
Matrix multiplies the values in newdata by the unit weights
extracted from the cvpensynth object to produce predicted
values.
## S3 method for class 'cvpensynth' predict(object, newdata, lambda, ...)## S3 method for class 'cvpensynth' predict(object, newdata, lambda, ...)
object |
a fitted cvpensynth model |
newdata |
N_values * N_donors matrix of values for the donor units. |
lambda |
desired lambda value (defaults to optimal lambda) |
... |
ignored |
For a chosen lambda that is not in the list of tested lambdas in the cvpensynth object, the closest lambda (on the log scale) will be chosen.
a matrix (column vector) of predicted values
Matrix multiplies the values in newdata by the unit weights
extracted from the pensynth object to produce predicted
values.
## S3 method for class 'pensynth' predict(object, newdata, ...)## S3 method for class 'pensynth' predict(object, newdata, ...)
object |
a fitted cvpensynth model |
newdata |
N_values * N_donors matrix of values for the donor units. |
... |
ignored |
a matrix (column vector) of predicted values
Print cvpensynth model
## S3 method for class 'cvpensynth' print(x, ...)## S3 method for class 'cvpensynth' print(x, ...)
x |
a cvpensynth object |
... |
ignored |
the cvpensynth object, invisibly
Print pensynth model
## S3 method for class 'pensynth' print(x, ...)## S3 method for class 'pensynth' print(x, ...)
x |
a pensynth object |
... |
ignored |
the pensynth object, invisibly
This function simulates data according to a latent factor model:
Simulate time-varying latent factors, which are the same for all units
Simulate time-invariant factor loadings, separately for each donor unit
Create sparse unit weights for each treated unit
Compute the loadings for the treated units as donor-unit loadings * weights
Simulate observed outcome time-series as factors * loadings + error
Do the same for each covariate, holding loadings equal. Average across pre-intervention timepoints.
simulate_data_factor( N_donor = 50, N_treated = 1, N_nonzero = 4, N_covar = 5, N_pre = 12, N_post = 6, N_factors = 3, treatment_effect = 1, sd_factors = sqrt(2)/2, ar1_factors = 0.8, sd_loadings = sqrt(2)/2, sd_errors = 0.5, covar_means = TRUE )simulate_data_factor( N_donor = 50, N_treated = 1, N_nonzero = 4, N_covar = 5, N_pre = 12, N_post = 6, N_factors = 3, treatment_effect = 1, sd_factors = sqrt(2)/2, ar1_factors = 0.8, sd_loadings = sqrt(2)/2, sd_errors = 0.5, covar_means = TRUE )
N_donor |
number of donors |
N_treated |
number of treated units |
N_nonzero |
number of true nonzero weights |
N_covar |
number of covariates |
N_pre |
number of pre-intervention timepoints |
N_post |
number of post-intervention timepoints |
N_factors |
number of latent factors to simulate |
treatment_effect |
the size of the true treatment effect |
sd_factors |
the standard deviation of the (unit-invariant, time-varying) factors |
ar1_factors |
autoregressive effect of the factors |
sd_loadings |
the standard deviation of the (time-invariant) factor loadings |
sd_errors |
the standard deviation of the independent errors |
covar_means |
whether to average the covariates across the pre-intervention times (experimental) |
Note that treatment effect can be a single number, but it may also be a vector of length N_post, indicating the effect size at each post-intervention measurement. occasion. It may also be a matrix of size N_post by N_treated.
Standard values of sd_factors, sd_loadings, and sd_errors have been chosen such that the observed variables have expected variance of 1.
A list with the following elements
W the true unit weights
X0 the donor unit covariates
X1 the treated unit covariates
Z0 the donor unit pre-intervention outcomes
Z1 the treated unit pre-intervention outcomes
Y0 the donor unit post-intervention outcomes
Y1 the treated unit post-intervention outcomes
pensynth(), cv_pensynth(), placebo_test(), simulate_data_synth()
# simulate data with an effect of 0.8 SD dat <- simulate_data_factor(N_treated = 3) plot( NA, ylim = c(-5, 5), xlim = c(1, 18), main = "Simulated data", ylab = "Outcome value", xlab = "Timepoint" ) for (n in 1:ncol(dat$Z0)) lines(1:18, c(dat$Z0[, n], dat$Y0[, n]), col = "grey") for (n in 1:ncol(dat$Z1)) { lines(1:18, c(dat$Z1[, n], dat$Y1[, n]), lwd = 2, col = n) lines(1:18, (rbind(dat$Z0, dat$Y0) %*% dat$W)[,n], lty = 2, lwd = 2, col = n) } abline(v = nrow(dat$Z1) + 0.5, lty = 3) legend( x = "bottomleft", legend = c( "Donor units", "Treated unit", "Synth. control" ), lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("grey", "black", "black") ) text(nrow(dat$Z1) + 0.5, -5, "Intervention\ntimepoint", pos = 4, font = 3)# simulate data with an effect of 0.8 SD dat <- simulate_data_factor(N_treated = 3) plot( NA, ylim = c(-5, 5), xlim = c(1, 18), main = "Simulated data", ylab = "Outcome value", xlab = "Timepoint" ) for (n in 1:ncol(dat$Z0)) lines(1:18, c(dat$Z0[, n], dat$Y0[, n]), col = "grey") for (n in 1:ncol(dat$Z1)) { lines(1:18, c(dat$Z1[, n], dat$Y1[, n]), lwd = 2, col = n) lines(1:18, (rbind(dat$Z0, dat$Y0) %*% dat$W)[,n], lty = 2, lwd = 2, col = n) } abline(v = nrow(dat$Z1) + 0.5, lty = 3) legend( x = "bottomleft", legend = c( "Donor units", "Treated unit", "Synth. control" ), lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("grey", "black", "black") ) text(nrow(dat$Z1) + 0.5, -5, "Intervention\ntimepoint", pos = 4, font = 3)
This function simulates a basic form of synthetic control data, mainly for testing purposes.
simulate_data_synth( N_donor = 50, N_treated = 1, N_covar = 5, N_pre = 12, N_post = 6, N_nonzero = 4, treatment_effect = 1, sd_resid_X = 0.1, sd_resid_ZY = 0.1, ar1_outcome = 0.8 ) simulate_data( N_donor = 50, N_treated = 1, N_covar = 5, N_pre = 12, N_post = 6, N_nonzero = 4, treatment_effect = 1, sd_resid_X = 0.1, sd_resid_ZY = 0.1, ar1_outcome = 0.8 )simulate_data_synth( N_donor = 50, N_treated = 1, N_covar = 5, N_pre = 12, N_post = 6, N_nonzero = 4, treatment_effect = 1, sd_resid_X = 0.1, sd_resid_ZY = 0.1, ar1_outcome = 0.8 ) simulate_data( N_donor = 50, N_treated = 1, N_covar = 5, N_pre = 12, N_post = 6, N_nonzero = 4, treatment_effect = 1, sd_resid_X = 0.1, sd_resid_ZY = 0.1, ar1_outcome = 0.8 )
N_donor |
number of donors |
N_treated |
number of treated units |
N_covar |
number of covariates |
N_pre |
number of pre-intervention timepoints |
N_post |
number of post-intervention timepoints |
N_nonzero |
number of true nonzero weights |
treatment_effect |
the size of the true treatment effect |
sd_resid_X |
the residual standard deviation of X1 |
sd_resid_ZY |
the residual standard deviation of Z1 and Y1 |
ar1_outcome |
autoregressive effect of the outcome |
Note that treatment effect can be a single number, but it may also be a vector of length N_post, indicating the effect size at each post-intervention measurement. occasion. It may also be a matrix of size N_post by N_treated.
A list with the following elements
W the true unit weights
X0 the donor unit covariates
X1 the treated unit covariates
Z0 the donor unit pre-intervention outcomes
Z1 the treated unit pre-intervention outcomes
Y0 the donor unit post-intervention outcomes
Y1 the treated unit post-intervention outcomes
pensynth(), cv_pensynth(), placebo_test(), simulate_data_factor()
# simulate data with an effect of 0.8 SD dat <- simulate_data_synth(treatment_effect = 0.8) plot( NA, ylim = c(-3, 3), xlim = c(1, 18), main = "Simulated data", ylab = "Outcome value", xlab = "Timepoint" ) for (n in 1:ncol(dat$Z0)) lines(1:18, c(dat$Z0[, n], dat$Y0[, n]), col = "grey") lines(1:18, c(dat$Z1, dat$Y1), lwd = 2) lines(1:18, rbind(dat$Z0, dat$Y0) %*% dat$W, lty = 2, lwd = 2) abline(v = length(dat$Z1) + 0.5, lty = 3) legend( x = "bottomleft", legend = c( "Donor units", "Treated unit", "Synth. control" ), lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("grey", "black", "black") ) text(length(dat$Z1) + 0.5, -3, "Intervention\ntimepoint", pos = 4, font = 3)# simulate data with an effect of 0.8 SD dat <- simulate_data_synth(treatment_effect = 0.8) plot( NA, ylim = c(-3, 3), xlim = c(1, 18), main = "Simulated data", ylab = "Outcome value", xlab = "Timepoint" ) for (n in 1:ncol(dat$Z0)) lines(1:18, c(dat$Z0[, n], dat$Y0[, n]), col = "grey") lines(1:18, c(dat$Z1, dat$Y1), lwd = 2) lines(1:18, rbind(dat$Z0, dat$Y0) %*% dat$W, lty = 2, lwd = 2) abline(v = length(dat$Z1) + 0.5, lty = 3) legend( x = "bottomleft", legend = c( "Donor units", "Treated unit", "Synth. control" ), lty = c(1, 1, 2), lwd = c(1, 2, 2), col = c("grey", "black", "black") ) text(length(dat$Z1) + 0.5, -3, "Intervention\ntimepoint", pos = 4, font = 3)