When economists and social scientists want to understand the effect of a policy change or intervention, the difference-in-differences (DiD) method is a popular tool. But as the use of DiD has grown more sophisticated—often involving staggered policy rollouts and complex treatment timing—so too have the challenges. Two of the thorniest issues are anticipation (when subjects react to a policy before it actually happens) and violations of the parallel trends assumption (when treated and control groups would not have followed the same path absent the intervention). Jointly analyzing sensitivity to these problems is crucial for credible inference, especially as newer research reveals just how much a violation of one can muddle diagnosis of the other. Let’s dig into how researchers can approach this analysis, drawing on insights from leading econometric research.
Short answer: Sensitivity to anticipation and parallel trends violations in difference-in-differences studies can be jointly analyzed by carefully decomposing the DiD estimator in settings with variation in treatment timing, using new balance tests and diagnostic tools. This involves breaking down treatment effects across subgroups and time periods, checking for pre-trends and early responses, and employing specification decompositions to identify and mitigate bias from both anticipation and non-parallel trends, as described in Goodman-Bacon’s influential work at nber.org.
Understanding the Problem: Anticipation and Parallel Trends
The traditional DiD framework is built on a simple idea: compare how outcomes change over time for a group exposed to a treatment (like a new law) and a group that isn’t. The critical assumption is that, in the absence of treatment, both groups would have followed “parallel trends”—their outcomes would have changed in the same way. If this assumption holds, the difference in outcome changes between the groups isolates the treatment effect.
However, two major complications often arise. First, anticipation: if people know a policy is coming, they may alter their behavior before it takes effect, contaminating the pre-treatment period. Second, parallel trends violations: if the groups differ systematically in how their outcomes evolve over time for reasons unrelated to the treatment, the DiD estimate will be biased.
As highlighted in Goodman-Bacon’s research at nber.org, when treatments are introduced at different times across units—a common scenario in modern policy analysis—the standard DiD estimator becomes a “weighted average of all possible two-group/two-period DD estimators in the data.” This means some comparisons are between units treated earlier and those treated later, and some even compare already-treated units to newly treated ones. This complexity can amplify sensitivity to both anticipation and parallel trends problems.
Diagnosing and Decomposing the DiD Estimator
To jointly analyze sensitivity to anticipation and parallel trends violations, it is essential to understand what the DiD estimator is actually averaging. Goodman-Bacon (nber.org) shows that in settings with staggered treatment timing, the estimator is a composite of many simple DiDs, some of which may be more vulnerable to bias than others.
For example, imagine a policy is implemented in California in 2010 and in Texas in 2012, with other states remaining untreated. The DiD estimator will include comparisons of California versus untreated states (from 2010), Texas versus untreated states (from 2012), and, crucially, Texas versus already-treated California after 2012. If Texas residents anticipated the policy and changed behavior in 2011, or if California and Texas had diverging trends before treatment, the resulting biases get baked into the overall estimate. As Goodman-Bacon notes, “it is a weighted average of all possible two-group/two-period DD estimators in the data” (nber.org), so any bias in a subgroup contaminates the final result.
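The California/Texas example can be sketched numerically. The following is a minimal illustration with invented numbers, not a packaged implementation of the Goodman-Bacon decomposition: it builds the three two-group/two-period contrasts by hand, with a treatment effect that grows over time so the already-treated comparison goes wrong in exactly the way described above.

```python
# Illustrative sketch with invented data: a staggered DiD averages many
# two-group/two-period contrasts, some using already-treated units as controls.
from statistics import mean

# Synthetic panel: CA treated in 2010, TX in 2012, NV never treated.
treat_year = {"CA": 2010, "TX": 2012, "NV": None}

def outcome(state, year):
    t0 = treat_year[state]
    if t0 is None or year < t0:
        return 10.0
    return 11.0 + (year - t0)  # dynamic effect: 1, 2, 3, ... after adoption

def did_2x2(treated, control, pre_years, post_years):
    """Two-group/two-period DiD: outcome change for treated minus control."""
    d_treated = (mean(outcome(treated, y) for y in post_years)
                 - mean(outcome(treated, y) for y in pre_years))
    d_control = (mean(outcome(control, y) for y in post_years)
                 - mean(outcome(control, y) for y in pre_years))
    return d_treated - d_control

# Clean contrasts: each recovers the average effect over its own post window.
ca_vs_never = did_2x2("CA", "NV", range(2008, 2010), range(2010, 2015))  # 3.0
tx_vs_never = did_2x2("TX", "NV", range(2008, 2012), range(2012, 2015))  # 2.0

# "Forbidden" contrast: TX versus already-treated CA. CA's still-growing
# effect enters the control trend and biases the estimate downward.
tx_vs_ca = did_2x2("TX", "CA", range(2010, 2012), range(2012, 2015))     # -0.5
```

With constant effects the already-treated comparison would happen to be unbiased; it is the dynamic effect that turns it negative here, which is why probing each sub-comparison matters.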
This decomposition allows researchers to probe: which sub-comparisons drive the result? Are some especially sensitive to anticipation (e.g., periods just before treatment)? Are some more likely to violate parallel trends (e.g., states with divergent economic trajectories)?
Testing for Pre-Trends and Anticipation
A foundational diagnostic is the “balance test” or “pre-trends test.” By plotting or regressing outcomes in the pre-treatment period, researchers can visually and statistically check whether treated and control units were evolving similarly before the policy. If they weren’t, that’s a red flag for parallel trends violations.
But anticipation complicates this test. Suppose people respond to news of a policy before it starts. In that case, the pre-treatment period may already reflect treatment effects, making the pre-trend appear non-parallel even if, absent anticipation, it would have been parallel. Goodman-Bacon proposes a “new balance test derived from a unified definition of common trends” (nber.org) to address this. This test helps distinguish between genuine parallel trends violations and pre-trend contamination due to anticipation.
Researchers often use event-study plots to visualize dynamic treatment effects—mapping outcome changes at various leads and lags relative to treatment. If outcomes jump before the actual treatment date, that suggests anticipation. If trends diverge long before the policy is even announced, that hints at deeper non-parallelism.
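The event-study logic can be sketched directly. The data below are invented, with a +1 anticipation response one year before treatment and a +3 effect afterward; in a real application these lead/lag gaps would come from a regression with event-time indicators plus unit and time fixed effects, not from raw means.

```python
# Sketch of an event-study diagnostic: align treated units on event time
# and compare them to never-treated units at each lead and lag.
from statistics import mean

# Synthetic panel (invented): anticipation bump at t0 - 1, +3 effect from t0.
treat_year = {"A": 2012, "B": 2013, "C": None, "D": None}
years = range(2009, 2016)

def outcome(unit, year):
    t0 = treat_year[unit]
    if t0 is None or year < t0 - 1:
        return 5.0
    if year == t0 - 1:
        return 6.0  # anticipation: behavior shifts before the official start
    return 8.0

treated = [u for u, t0 in treat_year.items() if t0 is not None]
controls = [u for u, t0 in treat_year.items() if t0 is None]

def gap_at(k):
    """Mean treated-minus-control outcome gap at event time k."""
    diffs = []
    for u in treated:
        y = treat_year[u] + k
        if y in years:
            diffs.append(outcome(u, y) - mean(outcome(c, y) for c in controls))
    return mean(diffs)

event_gaps = {k: gap_at(k) for k in range(-3, 3)}
# Flat gaps at k <= -2 but a jump at k = -1 point to anticipation rather
# than a long-running divergence in trends.
```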
Specification Decomposition: Disentangling the Sources of Bias
To further isolate the effects of anticipation and parallel trends violations, researchers can compare alternative model specifications. Goodman-Bacon shows how to “decompose the difference between two specifications” (nber.org), such as models that include or exclude certain units, add unit-specific trends, or disaggregate time fixed effects.
For instance, including unit-specific time trends can help absorb systematic non-parallelism, but may also soak up some of the treatment effect if anticipation is present. Dropping already-treated units from the control group can reduce contamination from anticipation, but may increase variance or introduce selection bias. By comparing results across these specifications, researchers can gauge how sensitive their estimates are to each source of bias.
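A stripped-down version of the unit-specific-trends check looks like this. The numbers are invented: the treated unit drifts upward by 0.5 per year for reasons unrelated to a true +2 effect, so the naive DiD is contaminated while the trend-adjusted one recovers the effect. With anticipation present, the pre-period slope would instead absorb part of the true response, which is exactly the trade-off described above.

```python
# Sketch of a specification comparison: fit each unit's pre-treatment linear
# trend, difference it out, and compare the adjusted DiD to the naive one.
from statistics import mean

treat_start = 2012
years = list(range(2008, 2015))

def outcome(unit, year):
    if unit == "treated":
        # Diverging trend (+0.5/yr) plus a true +2 effect from 2012 onward.
        return 10.0 + 0.5 * (year - 2008) + (2.0 if year >= treat_start else 0.0)
    return 10.0  # control: flat

def pre_trend_slope(unit):
    """OLS slope of the unit's outcome on year, pre-treatment years only."""
    pre = [y for y in years if y < treat_start]
    ybar, obar = mean(pre), mean(outcome(unit, y) for y in pre)
    num = sum((y - ybar) * (outcome(unit, y) - obar) for y in pre)
    den = sum((y - ybar) ** 2 for y in pre)
    return num / den

def did(detrend=False):
    pre = [y for y in years if y < treat_start]
    post = [y for y in years if y >= treat_start]
    est = 0.0
    for unit, sign in (("treated", 1.0), ("control", -1.0)):
        slope = pre_trend_slope(unit) if detrend else 0.0
        def adj(y, unit=unit, slope=slope):
            return outcome(unit, y) - slope * (y - years[0])
        est += sign * (mean(adj(y) for y in post) - mean(adj(y) for y in pre))
    return est

naive = did(detrend=False)     # 3.75: true effect plus trend contamination
trend_adj = did(detrend=True)  # 2.0: the diverging trend is differenced out
```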
A Unified Analytical Approach
In practice, a joint sensitivity analysis might proceed as follows: First, decompose the DiD estimator into its constituent two-group/two-period contrasts, as Goodman-Bacon recommends. Next, check each for evidence of pre-trends and anticipation—both visually (via event studies) and statistically (using balance tests). Then, estimate alternative specifications that address each concern (e.g., include unit trends, drop early-treated units, lag the treatment variable to allow for anticipation). Finally, compare these results to the original estimate to see how much they shift.
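One concrete version of the last two steps is to re-estimate the same DiD with the treatment date shifted back one period to allow for anticipation, then compare. The sketch below uses invented data with a +1 anticipation response in the year before a +3 effect; the gap between the two estimates is the sensitivity signal.

```python
# Sketch: compare a DiD using the official treatment date against one that
# shifts the date one year earlier to fold anticipation into the post period.
from statistics import mean

official_start = 2012
years = list(range(2008, 2015))

def outcome(unit, year):
    if unit != "treated":
        return 5.0  # control: flat
    if year < official_start - 1:
        return 5.0
    if year == official_start - 1:
        return 6.0  # anticipation response before the official start
    return 8.0      # +3 effect once treated

def did(start):
    pre = [y for y in years if y < start]
    post = [y for y in years if y >= start]
    d_t = (mean(outcome("treated", y) for y in post)
           - mean(outcome("treated", y) for y in pre))
    d_c = (mean(outcome("control", y) for y in post)
           - mean(outcome("control", y) for y in pre))
    return d_t - d_c

baseline = did(official_start)      # 2.75: anticipation inflates the pre-period
shifted = did(official_start - 1)   # 2.5: anticipation year counted as post
sensitivity = shifted - baseline    # a large gap signals sensitivity
```

Neither estimate equals the long-run effect of 3 here; the point of the comparison is that the estimates move when the timing assumption changes, which is the diagnostic described in the next paragraph's logic.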
If the estimated treatment effect changes substantially when accounting for anticipation or when adjusting for pre-trends, that signals sensitivity to these violations. If results are robust across decompositions, the evidence for a true causal effect is stronger.
A Real-World Example
Consider a study of Medicaid expansions across US states, with some states expanding in 2014 and others later. Goodman-Bacon’s framework would have researchers break down the overall DiD estimate into sub-comparisons: early expanders versus non-expanders, late expanders versus non-expanders, and late expanders versus early expanders. By plotting pre-treatment trends and treatment effect dynamics for each, researchers can spot if, say, late adopters started increasing Medicaid enrollment even before the official expansion (anticipation) or if certain states had rising enrollment for unrelated reasons (non-parallel trends). Specification checks—like adding state-specific trends or restricting the sample—can then be used to see if these issues meaningfully change the estimated effect.
Key Takeaways from the Literature
Several concrete insights stand out from the literature reviewed on nber.org. First, the DiD estimator in staggered adoption settings is a “weighted average” of many simple DiDs, so any bias in a subgroup can affect the whole. Second, anticipation and parallel trends violations can interact: anticipation can make pre-trends appear non-parallel, while non-parallelism can mimic anticipation effects. Third, new diagnostic tools, like Goodman-Bacon’s balance test, can help distinguish these issues. Fourth, decomposing the estimator and comparing model specifications are practical ways to jointly analyze sensitivity.
The consensus in the contemporary econometric literature, centered on the work cited above, is clear: joint sensitivity analysis is both possible and necessary, especially as empirical work moves beyond simple two-group, two-period settings.
Final Thoughts
In sum, analyzing sensitivity to anticipation and violations of the parallel trends assumption in DiD studies with varying treatment timing involves a careful, multi-step process. Decompose the estimator, use dynamic and subgroup-specific diagnostics, and compare alternative specifications to pinpoint where bias might lurk. As Goodman-Bacon’s research at nber.org makes clear, only by confronting both anticipation and parallel trends head-on—with the right tools and transparency—can researchers draw credible causal inferences from DiD designs.