
Regression in a Causal Framework


Presentation Transcript


  1. Regression in a Causal Framework Yair Ghitza & Laura Paler Presentation for Causal Inference Reading Group 6 October 2008

  2. Main Questions for Today: • When is regression valid for causal inference? • When should we use regression vs. matching vs. weighted regression? Key points: • Both matching and regression assume that all X’s that affect both treatment assignment and outcomes are accounted for. • All of these methods are highly dependent on correct specification of the propensity score. • Matching can outperform weighting and adjustment when the true regression model is not known.

  3. Traditional Approach to Regression • Standard regression equation: Y = α + δD + ε, where D is the treatment indicator and δ is the ATE. • Under the classical assumptions, OLS gives an unbiased estimate of δ when E[ε|D]=0. • Commonly, E[ε|D]≠0 when there is an omitted variable (or measurement error). • Solution: control for all variables X related to both D and Y, i.e. estimate Y = α + δD + Xβ + ε*, such that E[ε*|D]=0.
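As a minimal numerical sketch of the omitted-variable problem described on this slide (the variable names, coefficients, and sample size are illustrative, not taken from the presentation): when X affects both treatment and outcome, regressing Y on D alone gives a biased estimate of δ, while adding X as a control removes the bias.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Confounder x raises both the chance of treatment and the outcome.
    x = rng.normal(size=n)
    d = (x + rng.normal(size=n) > 0).astype(float)     # treatment assignment depends on x
    y = 1.0 + 2.0 * d + 1.5 * x + rng.normal(size=n)   # true delta = 2

    def ols(X, y):
        # Least-squares coefficients for the design matrix X.
        return np.linalg.lstsq(X, y, rcond=None)[0]

    ones = np.ones(n)
    naive = ols(np.column_stack([ones, d]), y)[1]        # omits x, so E[eps | D] != 0
    adjusted = ols(np.column_stack([ones, d, x]), y)[1]  # controls for x, so E[eps* | D] = 0

    print("naive delta-hat:   ", naive)     # well above the true value of 2
    print("adjusted delta-hat:", adjusted)  # close to 2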

  4. Regression in a Potential Outcomes Framework
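One standard way to write the regression equation in potential-outcomes terms (following, e.g., Morgan & Winship; the slide's own equations are not in the transcript) is:

    Y_i = Y_i^0 + D_i (Y_i^1 - Y_i^0)
        = \mu^0 + (\mu^1 - \mu^0) D_i
          + \left\{ (Y_i^0 - \mu^0) + D_i \left[ (Y_i^1 - Y_i^0) - (\mu^1 - \mu^0) \right] \right\}

where \mu^1 = E[Y^1] and \mu^0 = E[Y^0]. The coefficient on D_i is the ATE, and the braced term plays the role of the error. Roughly, that error has conditional mean zero given D only when treated and untreated units have the same average baseline outcome Y^0 and the same average treatment effect, which is what the two types of bias on the next slide refer to.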

  5. Lessons: • ε* depends on how X is specified, so it is impossible to say for certain whether E[ε*|D]=0. • In a bivariate regression of Y on D, δ is just the naïve estimator: δ_NAIVE = E_N[Y|D=1] − E_N[Y|D=0]. • The error term in the potential outcomes framework clearly points to two types of bias: the OLS framing evokes omitted variable bias, while the potential outcomes framing evokes selection bias (see the decomposition sketched below). • δ is a fixed parameter that masks heterogeneity. If there is individual-level heterogeneity, then OLS is a conditional-variance-weighted estimator. But what does this mean?
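A sketch of the two biases this slide alludes to, using the standard decomposition of the naïve estimator from the potential-outcomes literature (the slide's own notation is not in the transcript; π denotes Pr[D=1]):

    E[Y^1 \mid D=1] - E[Y^0 \mid D=0]
      = \underbrace{E[Y^1 - Y^0]}_{\text{ATE}}
        + \underbrace{E[Y^0 \mid D=1] - E[Y^0 \mid D=0]}_{\text{baseline (selection) bias}}
        + \underbrace{(1-\pi)\left( E[Y^1 - Y^0 \mid D=1] - E[Y^1 - Y^0 \mid D=0] \right)}_{\text{differential treatment effect bias}}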

  6. Revisiting Matching • Matching works when the specification of S is fully flexible (every stratum of S is accounted for). • In matching, the difference in outcomes within each stratum is weighted by the probability of being in the treatment group (a sketch of the estimator follows below).
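A sketch of the stratification form of the matching estimator this bullet seems to describe, written here for the treatment effect on the treated, with the within-stratum differences weighted by the distribution of S among the treated (the slide's own formula is not in the transcript):

    \hat{\delta}_{TT} = \sum_{s} \Pr[S = s \mid D = 1]
                        \left( \bar{Y}_{s, D=1} - \bar{Y}_{s, D=0} \right)

where \bar{Y}_{s,D=1} and \bar{Y}_{s,D=0} are the mean outcomes of treated and control units within stratum s; weighting instead by Pr[S = s] gives the ATE version.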

  7. OLS as Conditional Variance Weighting • OLS recovers the ATE not when S is merely fully flexible, but only when the model is saturated (each level of S interacted with D). • When only one parameter δ is estimated, OLS instead produces a conditional-variance-weighted average of the stratum-specific effects (see the formula sketched below). • What is this new term Var[D|S=s]? It equals p(1−p), where p is the probability that D=1 within the stratum. That term is maximized when p=.50.
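A sketch of the conditional-variance-weighting result the slide refers to, as it is usually written in this literature (e.g., Angrist's decomposition, also used in Morgan & Winship); the slide's exact notation is not in the transcript:

    \hat{\delta}_{OLS} = \sum_{s} w_s \, \hat{\delta}_s,
    \qquad
    w_s = \frac{\widehat{\mathrm{Var}}[D \mid S=s] \, \Pr[S=s]}
               {\sum_{s'} \widehat{\mathrm{Var}}[D \mid S=s'] \, \Pr[S=s']}

where \hat{\delta}_s is the treated-control difference in mean outcomes within stratum s, and Var[D|S=s] = p(1−p) with p = Pr[D=1|S=s]. Strata in which p is close to 0.50 therefore receive the most weight, which is generally not the ATE weighting of Pr[S=s].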

  8. The p Parabola

  9. Cond. Variance Weighting Examples

  10. Visual of Conditional Variance Weighting • 1000 simulations • P(S=s) = {0.44, 0.24, 0.32} (as in the book) • The variance of P(D=1|S=s) goes from 0 to 0.25, so treatment assignment is increasingly heterogeneous across strata • Estimate the treatment effect using three models: a linear model, a fully specified model, and a saturated model (a simulation sketch follows below).
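A minimal sketch of how a simulation along these lines could be set up (a single large draw rather than 1000 replications; the stratum baselines, stratum-specific effects, and treatment probabilities below are illustrative, not the values used in the original slides):

    import numpy as np

    rng = np.random.default_rng(0)

    p_s = np.array([0.44, 0.24, 0.32])     # P(S=s), from the slide
    p_d = np.array([0.2, 0.5, 0.8])        # P(D=1 | S=s): illustrative, heterogeneous across strata
    base_s = np.array([0.0, 2.0, 1.0])     # stratum baseline outcomes (illustrative)
    delta_s = np.array([1.0, 2.0, 3.0])    # stratum-specific treatment effects (illustrative)

    n = 200_000
    s = rng.choice(3, size=n, p=p_s)                      # stratum membership
    d = rng.binomial(1, p_d[s]).astype(float)             # treatment assignment within stratum
    y = base_s[s] + delta_s[s] * d + rng.normal(size=n)   # heterogeneous treatment effects

    true_ate = np.sum(p_s * delta_s)

    def ols(X, y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    ones = np.ones(n)
    s1, s2 = (s == 1).astype(float), (s == 2).astype(float)   # stratum dummies (base = stratum 0)

    # (1) linear model: S entered as a single linear term
    d_lin = ols(np.column_stack([ones, d, s]), y)[1]

    # (2) fully specified (flexible) model: a dummy for each stratum, one delta
    d_full = ols(np.column_stack([ones, d, s1, s2]), y)[1]

    # (3) saturated model: stratum dummies interacted with D; average the
    #     stratum-specific effects over P(S=s) to recover the ATE
    b = ols(np.column_stack([ones, s1, s2, d, d * s1, d * s2]), y)
    d_sat = np.sum(p_s * np.array([b[3], b[3] + b[4], b[3] + b[5]]))

    print("true ATE:            ", true_ate)
    print("linear model:        ", d_lin)   # S misspecified; generally not the ATE
    print("fully specified:     ", d_full)  # conditional-variance-weighted, generally not the ATE
    print("saturated model ATE: ", d_sat)   # recovers the ATE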

  11. Comparing Regression, Matching, Weighting, and Pre-Processing • OLS is valid for causal inference when E[ε|D]=0, but this is very hard to establish conclusively. • Options: • Biased OLS • Fully saturated model (all strata & interactions) • Matching • Weighted least squares using p-scores (a sketch follows below) • Pre-processing the data (regression on matched data) • All of these methods are (at least somewhat) dependent on the correct specification of two equations: • the propensity score equation • the DGP equation
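A minimal sketch of the "weighted least squares using p-scores" option, under the assumption that the weights are the usual inverse-propensity weights; the variable names, the functional form of the propensity score, and the coefficient values are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    # One confounder x drives both treatment and outcome; true delta = 2.
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-x))            # propensity score (known here; estimated in practice,
                                            # which is where the p-score specification matters)
    d = rng.binomial(1, p).astype(float)
    y = 1.0 + 2.0 * d + 1.5 * x + rng.normal(size=n)

    # Inverse-propensity weights, then WLS of y on (1, D) via the sqrt-weight transform.
    w = d / p + (1.0 - d) / (1.0 - p)
    X = np.column_stack([np.ones(n), d])
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

    print("weighted regression estimate of delta:", beta[1])  # close to the true value of 2
    print("unweighted (biased) estimate:         ",
          np.linalg.lstsq(X, y, rcond=None)[0][1])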

  12. Simulation Results • True equations: (Simple DGP) (Heterogeneous DGP)
