Do you use diff-in-diff? Then this thread is for you.
You’re no dummy. You already know diverging trends in the pre-period can bias your results.
But I’m here to tell you about a TOTALLY DIFFERENT, SUPER SNEAKY kind of bias.
Friends, let’s talk regression to the mean. (1/N)
In a scenario that should be FINE for diff-in-diff, Ryan et al. get CRAZY HIGH Type I error rates.
After matching on pre-period variables (via propensity scores), things do indeed look fine. (3/N)
“WHY?” I wondered. This diff-in-diff study should be unbiased.
Diff-in-diff nets out baseline differences…right? (4/N)
Let’s dig into the simulation, shall we?
In their simulation, hospitals with higher-than-average performance in the pre-period are more likely to be in treatment and vice versa. (5/N)
Here’s what it looks like: treatment hospitals (yellow) have higher baseline performance than control (purple) hospitals. (6/N)
However, if we do ordinary diff-in-diff, the two groups regress back to their (common) mean in the post-period and we get a BIASED result.
What the heck? Baseline differences aren’t supposed to be a problem for diff-in-diff! (7/N)
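To see the mechanism in numbers, here is a minimal sketch of this kind of setup. It is NOT the Ryan et al. simulation itself. The two-period design, the normal noise, the logistic assignment rule, and every name in it are assumptions made for illustration. All hospitals come from one common population, the true treatment effect is zero, and hospitals with a good pre-period are more likely to be treated.

```python
import math
import random
import statistics


def simulate_common_population(n=20000, seed=0):
    """Unmatched diff-in-diff when treatment is selected on pre-period performance.

    Hypothetical setup: one pre and one post period, a true effect of zero,
    and a single common population of hospitals.
    """
    rng = random.Random(seed)
    pre_t, post_t, pre_c, post_c = [], [], [], []
    for _ in range(n):
        stable = rng.gauss(0.0, 1.0)          # persistent hospital quality
        pre = stable + rng.gauss(0.0, 1.0)    # pre-period = quality + transient noise
        post = stable + rng.gauss(0.0, 1.0)   # fresh noise, no treatment effect
        # Assumed assignment rule: better pre-period performance -> more likely treated.
        p_treat = 1.0 / (1.0 + math.exp(-pre))
        if rng.random() < p_treat:
            pre_t.append(pre)
            post_t.append(post)
        else:
            pre_c.append(pre)
            post_c.append(post)

    baseline_gap = statistics.fmean(pre_t) - statistics.fmean(pre_c)
    did = (statistics.fmean(post_t) - statistics.fmean(pre_t)) - (
        statistics.fmean(post_c) - statistics.fmean(pre_c)
    )
    return baseline_gap, did


baseline_gap, did = simulate_common_population()
print(f"baseline gap: {baseline_gap:.2f}, unmatched DiD: {did:.2f}")
```

Under these assumptions the baseline gap should be positive and the diff-in-diff estimate clearly negative, even though nothing happened to anyone: part of each group's pre-period gap was transient noise, and it washes out in the post-period.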
So Ryan et al turn to matching.
When we compare treatment hospitals and control hospitals with similar baseline performance, the pre-period difference disappears and both groups regress to the same mean, so there’s no difference in pre or post. (8/N)
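A rough sketch of why matching helps in this scenario, again under made-up assumptions: one common population, a zero true effect, logistic selection on pre-period performance, and two periods. Instead of propensity score matching, it uses 1:1 nearest-neighbor matching (with replacement) directly on the pre-period outcome, which is a simpler stand-in and not the procedure from the paper. All names are hypothetical.

```python
import bisect
import math
import random
import statistics


def common_population_matched(n=20000, seed=1):
    """Return (unmatched DiD, matched DiD) when treatment is selected on pre-period."""
    rng = random.Random(seed)
    treated, controls = [], []
    for _ in range(n):
        stable = rng.gauss(0.0, 1.0)          # persistent hospital quality
        pre = stable + rng.gauss(0.0, 1.0)    # quality + transient noise
        post = stable + rng.gauss(0.0, 1.0)   # true effect is zero
        p_treat = 1.0 / (1.0 + math.exp(-pre))  # assumed selection rule
        (treated if rng.random() < p_treat else controls).append((pre, post))

    def mean_change(units):
        return statistics.fmean([post - pre for pre, post in units])

    unmatched = mean_change(treated) - mean_change(controls)

    # Sort controls by pre-period value so each nearest neighbor is a bisect lookup.
    controls.sort()
    control_pre = [pre for pre, _ in controls]

    pair_did = []
    for pre_t, post_t in treated:
        j = bisect.bisect_left(control_pre, pre_t)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(controls)]
        k = min(candidates, key=lambda idx: abs(control_pre[idx] - pre_t))
        pre_c, post_c = controls[k]
        pair_did.append((post_t - pre_t) - (post_c - pre_c))

    return unmatched, statistics.fmean(pair_did)


unmatched, matched = common_population_matched()
print(f"unmatched DiD: {unmatched:.2f}, matched DiD: {matched:.2f}")
```

With these assumptions the unmatched estimate should be well below zero while the matched estimate sits close to zero, because matched hospitals start from the same place and drift back toward the same mean.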
Fast-forward to 2017. @jamie_daw discovers that matched diff-in-diff might have some…problems.
In her subtly different simulation, Jamie generates treatment and control data from DIFFERENT populations.
Suppose they’re exactly as far apart as the Ryan et al. case. (10/N)
Now the ordinary, unmatched diff-in-diff is UNBIASED.
These two groups are not regressing back anywhere. Their mean difference is PERMANENT. (11/N)
So what happens if we match to make the pre-period difference go away?
It REAPPEARS in the post-period, as the two groups regress back to their respective means.
Matching INTRODUCES bias into an otherwise totally fine diff-in-diff. (12/N)
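Here is a small sketch of the two-population case. It is an illustration under stated assumptions, not the Daw & Hatfield simulation: two periods, normal noise, a permanent mean gap of 1.0 between groups, a true effect of zero, and nearest-neighbor matching on the pre-period outcome standing in for whatever matching procedure you would actually use. All names are hypothetical.

```python
import bisect
import random
import statistics


def simulate_two_populations(n_per_group=10000, gap=1.0, seed=2):
    """Return (unmatched DiD, matched DiD) when groups have permanently different means."""
    rng = random.Random(seed)

    def draw(group_mean):
        stable = rng.gauss(0.0, 1.0)                      # persistent hospital quality
        pre = group_mean + stable + rng.gauss(0.0, 1.0)   # transient noise in pre-period
        post = group_mean + stable + rng.gauss(0.0, 1.0)  # fresh noise, zero true effect
        return pre, post

    # Group membership is fixed by population, not by pre-period noise.
    treated = [draw(gap) for _ in range(n_per_group)]
    controls = sorted(draw(0.0) for _ in range(n_per_group))
    control_pre = [pre for pre, _ in controls]

    def mean_change(units):
        return statistics.fmean([post - pre for pre, post in units])

    unmatched = mean_change(treated) - mean_change(controls)

    # 1:1 nearest-neighbor matching (with replacement) on the pre-period outcome.
    pair_did = []
    for pre_t, post_t in treated:
        j = bisect.bisect_left(control_pre, pre_t)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(controls)]
        k = min(candidates, key=lambda idx: abs(control_pre[idx] - pre_t))
        pre_c, post_c = controls[k]
        pair_did.append((post_t - pre_t) - (post_c - pre_c))

    return unmatched, statistics.fmean(pair_did)


unmatched, matched = simulate_two_populations()
print(f"unmatched DiD: {unmatched:.2f}, matched DiD: {matched:.2f}")
```

Under these assumptions the unmatched estimate should hover near zero, while the matched estimate comes out clearly positive. Matching pairs treated hospitals that had an unlucky pre-period with control hospitals that had a lucky one, and then each drifts back toward its own group's mean.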
Matching FIXES bias in the Ryan et al. scenario.
Matching CAUSES bias in the Daw & Hatfield scenario.
And in NEITHER case are there any violations of parallel pre-trends. (13/N)
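For the algebra-inclined, a back-of-the-envelope version of both cases. The model and all symbols here are assumptions for illustration, not taken from either paper: outcome = group mean + persistent unit effect + independent normal noise, one pre and one post period, the same reliability $\rho$ in both groups, and a true effect of zero. $\Delta$ is the pre-period gap between treatment and control.

```latex
% Assumed model: Y_{it} = \mu_g + a_i + \varepsilon_{it},
% a_i \sim N(0, \sigma_a^2), \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2), all independent.
\rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_\varepsilon^2},
\qquad
E\left[ Y_{i,\mathrm{post}} \mid Y_{i,\mathrm{pre}} = y,\ g \right] = \mu_g + \rho\,(y - \mu_g)

% Case 1: common mean (\mu_T = \mu_C), treatment selected on Y_{pre} only.
E[\widehat{\mathrm{DiD}}_{\mathrm{unmatched}}] = -(1 - \rho)\,\Delta,
\qquad
E[\widehat{\mathrm{DiD}}_{\mathrm{matched}}] = 0

% Case 2: different means (\mu_T - \mu_C = \Delta), assignment unrelated to the noise.
E[\widehat{\mathrm{DiD}}_{\mathrm{unmatched}}] = 0,
\qquad
E[\widehat{\mathrm{DiD}}_{\mathrm{matched}}] = (1 - \rho)\,\Delta
```

Same gap, same estimator, opposite answers. The only thing that differs is whether the gap comes from transient noise or from a permanent difference in means, and pre-period data from a single period cannot tell those apart.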
Side note: In our paper on this, Jamie and I also talk about how parallel trend problems may not be fixed by matching either: dx.doi.org/10.1111/1475-6… (14/N)
So where does this leave us? Be very careful with diff-in-diff.
Causal inference is HARD. You have to think about causal MECHANISMS.
What CAUSED the baseline differences between treatment and control? Is it likely to PERSIST into the post-period? (15/N)
@SylvainCF noticed the problem dx.doi.org/10.1016/j.jeco… and worked out the theory twitter.com/SylvainCF/stat…
@Lizstuartdc @Michael_Chernew @colleenlbarry et al. developed symmetric PS weighting for diff-in-diff that avoids the problem dx.doi.org/10.1007/s10742… (17/N)
Thank you for this. What do we consider a good test of parallel trends? I have seen papers test whether pre-intervention point estimates are different and conclude that the trends are the same, even when the pre-trend point estimates show a clear increasing or decreasing pattern.