Estimation Methods

Overview

We provide two estimators that correctly handle bad controls in DiD. Both require the same core assumption — Covariate Unconfoundedness — but differ in how they estimate the nuisance functions.

| Method | Nuisance estimation | Robustness | When to use |
|---|---|---|---|
| Imputation | Parametric (OLS) | Single robustness | Simple settings, linear relationships |
| Doubly Robust ML | OLS outcome + random forest propensity score | Double robustness | When you want robustness to model misspecification |

Both are implemented in the badcontrols R package through bc_att_gt().

The key assumption

Covariate Unconfoundedness (Assumption 5 in the paper):

\[ X_t(0) \perp\!\!\!\perp D \mid X_{t-1}, W, Z \]

where:

  • \(X_t(0)\) is the counterfactual covariate (what \(X_t\) would be without treatment)
  • \(X_{t-1}\) is the pre-treatment covariate
  • \(W\) is an auxiliary variable (typically \(Y_{t-1}\), the lagged outcome)
  • \(Z\) is a vector of time-invariant covariates

This says: conditional on what we observe pre-treatment, the counterfactual \(X_t(0)\) is independent of treatment status. The control group’s \(X_t\) model can be used to predict the treated group’s counterfactual.

The role of W: the auxiliary variable

The auxiliary variable \(W\) deserves special attention. Why isn’t \((X_{t-1}, Z)\) enough?

As the DAG shows, \(W\) is a pre-treatment confounder of both treatment \(D\) and the post-treatment covariate \(X_t\):

\[ D \leftarrow W \rightarrow X_t(0) \]

If you only condition on \((X_{t-1}, Z)\), the confounding path through \(W\) remains open. The imputation model would systematically mispredict \(X_t(0)\) for treated units because the treated and control groups differ in their \(W\) distributions.

Conditioning on \(W\) closes the backdoor path and ensures the imputation is unbiased. In the badcontrols package, \(W\) is included via the xformla argument. The imputation model becomes:

\[ X_t = f(X_{t-1}, W, Z) + \varepsilon \]

Note: When is W not needed?

If Simple Covariate Unconfoundedness holds — \(X_t(0) \perp D \mid X_{t-1}, Z\) — then \(W\) is not required. This is a stronger condition that says \((X_{t-1}, Z)\) alone are sufficient. You can test sensitivity to this by running the estimator with and without \(W\).
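The with/without-\(W\) comparison only requires changing xformla. This sketch reuses the bc_att_gt() arguments from the examples in this document; sim$data and its columns (Y, G, period, id, X, Z, W) are assumed to exist as in those examples.

```r
library(badcontrols)

# Sensitivity check: same specification, with and without the auxiliary W
res_with_w <- bc_att_gt(
  yname = "Y", gname = "G", tname = "period", idname = "id",
  data = sim$data,
  bad_control_formula = ~X,
  xformla = ~Z + W,   # Covariate Unconfoundedness (conditions on W)
  est_method = "imputation"
)

res_without_w <- bc_att_gt(
  yname = "Y", gname = "G", tname = "period", idname = "id",
  data = sim$data,
  bad_control_formula = ~X,
  xformla = ~Z,       # Simple Covariate Unconfoundedness (no W)
  est_method = "imputation"
)

# A large gap between the two suggests W carries real confounding
extract_att(res_with_w)
extract_att(res_without_w)
```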

Tip: What can serve as W?

Any pre-treatment variable that confounds treatment selection and covariate evolution. Examples:

  • Pre-treatment covariates observed in the data (e.g., baseline health, prior earnings)
  • Lagged outcomes \(Y_{t-1}\) (set lagged_outcome_cov = TRUE)
  • Other lagged covariates measured before treatment

The key requirement is that \(W\) is (i) not affected by treatment and (ii) closes the confounding path between \(D\) and \(X_t(0)\).

Method 1: Imputation

Algorithm

Step 1: Impute \(X_t(0)\)

Using the control group (where \(X_t = X_t(0)\)), estimate:

\[ X_t = \alpha + \beta_1 X_{t-1} + \beta_2 W + \beta_3 Z + \varepsilon \]

For treated units, predict \(\hat{X}_t(0)\) using the estimated coefficients and their pre-treatment values.

Step 2: Run DiD with imputed covariate

Replace the observed \(X_t\) with \(\hat{X}_t(0)\) for treated units, then estimate ATT using standard DiD methods conditioning on \(\hat{X}_t(0)\).
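The two steps above can be sketched in base R on simulated data. This is illustrative only, not the package internals: the data-generating process and all coefficients below are invented for the example, with a true ATT of 1.

```r
# Simulate a simple two-period setting with a bad control X
set.seed(1)
n   <- 2000
Z   <- rnorm(n)                        # time-invariant covariate
W   <- rnorm(n)                        # auxiliary pre-treatment confounder
D   <- rbinom(n, 1, plogis(0.5 * W))   # treatment selection depends on W
X0  <- rnorm(n)                        # pre-treatment covariate X_{t-1}
Xt0 <- 0.6 * X0 + 0.4 * W + 0.2 * Z + rnorm(n)  # counterfactual X_t(0)
Xt  <- Xt0 + 0.5 * D                   # treatment shifts X_t: a bad control

# Step 1: fit the X_t model on controls only (where X_t = X_t(0)),
# then predict X_t(0) for treated units from their pre-treatment values
fit_x  <- lm(Xt ~ X0 + W + Z, subset = D == 0)
Xt_hat <- ifelse(D == 1, predict(fit_x, data.frame(X0, W, Z)), Xt)

# Step 2: condition on the imputed X_t(0) instead of the observed X_t
dY  <- 1.0 * D + 0.8 * Xt0 + rnorm(n)  # outcome change, true ATT = 1
att <- coef(lm(dY ~ D + Xt_hat + Z))["D"]
```

Conditioning on the observed Xt instead would absorb part of the treatment effect, since Xt itself responds to D.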

R code

library(badcontrols)

res_imp <- bc_att_gt(
  yname = "Y",
  gname = "G",
  tname = "period",
  idname = "id",
  data = sim$data,
  bad_control_formula = ~X,   # X is the bad control to impute
  xformla = ~Z + W,             # pre-treatment covariates (W is the auxiliary variable)
  est_method = "imputation"
)

extract_att(res_imp)

When it works well

  • Linear relationship between \(X_t\) and \((X_{t-1}, W, Z)\)
  • Moderate sample sizes
  • Fast computation

Method 2: Doubly Robust ML

The idea

The DR estimator builds on the imputation approach but adds a propensity score correction that provides double robustness. It combines three steps:

  1. Impute \(X_t(0)\) via OLS (same as Method 1)
  2. Outcome regression: \(\mathbb{E}[\Delta Y \mid \hat{X}_t(0), Z, D=0]\) via OLS
  3. Generalized propensity score: \(\Pr(D = 1 \mid X_{t-1}, W, Z)\) via random forests (cross-fitted)

The propensity score is estimated using Athey, Tibshirani, and Wager’s (2019) grf package — specifically, probability_forest with \(K\)-fold cross-fitting to avoid overfitting.

The ATT is then estimated using an AIPW (augmented inverse probability weighting) score:

\[ \widehat{ATT}_{DR} = \frac{1}{n_1}\sum_{D_i=1} \hat{\varepsilon}_i - \sum_{D_i=0} \hat{w}_i \, \hat{\varepsilon}_i \]

where \(\hat{\varepsilon}_i = \Delta Y_i - \hat{m}(\hat{X}_{t,i}(0), Z_i)\) are the outcome regression residuals and \(\hat{w}_i\) are Hajek-normalized inverse probability weights.
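The AIPW combination above is easy to compute once the nuisance fits are in hand. A self-contained base-R sketch, with a logistic regression standing in for the cross-fitted random forest propensity score and simulated data (invented coefficients, true ATT of 1):

```r
set.seed(2)
n  <- 5000
Z  <- rnorm(n); W <- rnorm(n); X0 <- rnorm(n)
D  <- rbinom(n, 1, plogis(0.5 * W))
Xt_hat <- 0.6 * X0 + 0.4 * W + 0.2 * Z   # stands in for imputed X_t(0)
dY <- 1.0 * D + 0.8 * Xt_hat + rnorm(n)  # outcome change, true ATT = 1

# Outcome regression m(.) fit on controls; residuals eps_i = dY_i - m(.)
m_fit <- lm(dY ~ Xt_hat + Z, subset = D == 0)
eps   <- dY - predict(m_fit, data.frame(Xt_hat, Z))

# Propensity score (logit here; the package uses cross-fitted forests)
ps <- predict(glm(D ~ X0 + W + Z, family = binomial), type = "response")

# Hajek-normalized IPW weights on the controls: w_i proportional to odds
w <- ifelse(D == 0, ps / (1 - ps), 0)
w <- w / sum(w)

# AIPW score: treated residual mean minus reweighted control residual mean
att_dr <- mean(eps[D == 1]) - sum(w * eps)
```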

Double robustness

The AIPW score is consistent if either the outcome model or the propensity score is correctly specified:

  • If the outcome model is correct, the IPW correction averages to zero (no harm done)
  • If the propensity score is correct, it reweights the controls to match the treated covariate distribution, fixing any misspecification in the outcome model

This provides insurance against model misspecification — you get two chances to get it right.

Cross-fitting

The propensity score uses random forests, which are flexible but can overfit. Cross-fitting prevents this:

  1. Split the sample into \(K\) folds
  2. For each fold, estimate the propensity score on the remaining \(K-1\) folds
  3. Predict on the held-out fold

This yields valid inference even with highly flexible ML estimators.
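The three cross-fitting steps can be sketched in base R. A logistic regression stands in here for grf's probability forest; the fold logic is the same.

```r
set.seed(3)
n <- 1000; K <- 5
W <- rnorm(n); Z <- rnorm(n)
D <- rbinom(n, 1, plogis(0.8 * W))
dat  <- data.frame(D, W, Z)
fold <- sample(rep(1:K, length.out = n))        # 1. split into K folds

ps <- numeric(n)
for (k in 1:K) {
  fit <- glm(D ~ W + Z, family = binomial,
             data = dat[fold != k, ])           # 2. fit on the other K-1 folds
  ps[fold == k] <- predict(fit, dat[fold == k, ],
                           type = "response")   # 3. predict on the held-out fold
}
```

Each unit's propensity score is thus predicted by a model that never saw that unit, which is what prevents overfitting from leaking into the ATT estimate.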

R code

library(badcontrols)

res_ml <- bc_att_gt(
  yname = "Y",
  gname = "G",
  tname = "period",
  idname = "id",
  data = sim$data,
  bad_control_formula = ~X,
  xformla = ~Z + W,
  est_method = "dr_ml"
)

extract_att(res_ml)

When it works well

  • When you want robustness to misspecification of the outcome model
  • When treatment selection depends on covariates in complex, nonlinear ways
  • Moderate to large sample sizes (the random forest PS needs data)

Requirements

The DR/ML method requires the grf package for random forests:

install.packages("grf")

Inference

Both methods produce group-time ATT estimates \(\widehat{ATT}(g,t)\) for each treatment group \(g\) and time period \(t\). These are aggregated into:

  • Overall ATT: weighted average across all \((g,t)\) cells
  • Event study: ATT by event time (periods relative to treatment)
  • Group-specific ATT: ATT for each treatment cohort

Standard errors are computed via the multiplier bootstrap, using the influence function representation of the estimators.

# Overall ATT
extract_att(res)

# The pte infrastructure handles aggregation automatically
summary(res)

Choosing between methods

```mermaid
flowchart TD
    A[Bad control detected] --> B{Relationships linear?}
    B -->|Yes| C[Imputation]
    B -->|No / Unsure| D{Large sample?}
    D -->|Yes| E[DR/ML]
    D -->|No| F[Imputation with flexible formula]
```

In practice, we recommend running both methods and checking that results are similar. If they diverge substantially, investigate the source of the discrepancy.