Regression in a Causal Framework
Yair Ghitza & Laura Paler
Presentation for Causal Inference Reading Group
6 October 2008
Main Questions for Today:
• When is regression valid for causal inference?
• When to use regression vs. matching vs. weighted regression?
• Both matching and regression assume that all X’s that affect both treatment assignment and outcomes are accounted for.
• All are highly dependent on correct specifications of the propensity score.
• Matching can outperform weighting and adjustment if the true regression model is not known.
Traditional Approach to Regression
• Standard regression equation: Y = α + δD + ε, where D is the treatment indicator and δ is the ATE.
• Classical assumptions hold that the OLS estimate of δ is unbiased for the effect of D when E[ε|D]=0.
• Commonly, E[ε|D]≠0 when there is an omitted variable (or measurement error).
• Solution: control for all variables, X, related to both D and Y: Y = α + δD + Xβ + ε*, such that E[ε*|D]=0.
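The omitted-variable problem on this slide can be illustrated with a small simulation. This is only a sketch under assumed values: the confounder `X`, the coefficient 1.5, the sample size, and `TRUE_DELTA` are all hypothetical, and the "long" regression is computed with the Frisch-Waugh-Lovell partialling-out trick rather than a regression library.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from a regression of y on x with an intercept."""
    n = len(x)
    b = slope(x, y)
    a = sum(y) / n - b * sum(x) / n
    return [yi - a - b * xi for xi, yi in zip(x, y)]


TRUE_DELTA = 2.0  # hypothetical constant treatment effect
n = 20000
X = [random.gauss(0, 1) for _ in range(n)]
# Treatment is more likely when X is high, so X affects both D and Y.
D = [1.0 if x + random.gauss(0, 1) > 0 else 0.0 for x in X]
Y = [1.0 + TRUE_DELTA * d + 1.5 * x + random.gauss(0, 1) for d, x in zip(D, X)]

# Short regression: Y on D only. X is left in the error term, so E[e|D] != 0.
delta_short = slope(D, Y)

# Long regression: Y on D and X, via Frisch-Waugh-Lovell:
# partial X out of both D and Y, then regress residual on residual.
delta_long = slope(residuals(X, D), residuals(X, Y))

print(f"short: {delta_short:.2f}  long: {delta_long:.2f}  true: {TRUE_DELTA}")
```

In this setup the short regression overstates the effect because treated units have systematically higher X, while the long regression lands near the assumed true value. That only works here because X enters the outcome equation linearly and is fully observed, which is exactly what cannot be verified in practice.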
Lessons:
• ε* depends on how X is specified, so it is impossible to say for certain whether E[ε*|D]=0.
• δ in a regression of Y on D alone is the naïve estimator: the mean of Y among the treated minus the mean of Y among the controls, E[Y|D=1] - E[Y|D=0].
• The error term in the potential outcomes framework clearly points to two types of bias. OLS evokes omitted variable bias, but potential outcomes evokes selection bias.
• δ is a fixed parameter that masks heterogeneity. If there is individual-level heterogeneity, then OLS is a conditional-variance-weighted estimator. But what does this mean?
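A short sketch can make the naïve estimator and the two kinds of bias concrete. Everything here is assumed for illustration: the potential outcomes `y0` and `y1`, the selection rule, and the names `baseline_bias` and `effect_bias` (labels commonly used in the potential-outcomes literature for selection on the untreated outcome and selection on the size of the effect). Because both potential outcomes are simulated, the decomposition can be checked exactly, which is never possible with real data.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


n = 10000
# Hypothetical potential outcomes: y0 without treatment, y1 with treatment.
y0 = [random.gauss(0, 1) for _ in range(n)]
y1 = [1.0 + 1.5 * a for a in y0]  # individual effect y1 - y0 = 1 + 0.5 * y0
# Units with high y0 are more likely to take treatment (selection).
d = [1 if a + random.gauss(0, 1) > 0 else 0 for a in y0]
y = [b if t else a for a, b, t in zip(y0, y1, d)]  # observed outcome

# Naive estimator: difference in observed means.
naive = mean([v for v, t in zip(y, d) if t]) - mean([v for v, t in zip(y, d) if not t])

# Bivariate OLS slope of y on d: identical to the naive estimator.
md, my = mean(d), mean(y)
ols = sum((t - md) * (v - my) for t, v in zip(d, y)) / sum((t - md) ** 2 for t in d)

# Quantities that are only computable because the data are simulated.
pi = md  # share treated
ate = mean([b - a for a, b in zip(y0, y1)])
att = mean([b - a for a, b, t in zip(y0, y1, d) if t])
atc = mean([b - a for a, b, t in zip(y0, y1, d) if not t])
baseline_bias = mean([a for a, t in zip(y0, d) if t]) - mean([a for a, t in zip(y0, d) if not t])
effect_bias = (1 - pi) * (att - atc)

print(f"naive={naive:.3f} ols={ols:.3f} ate={ate:.3f}")
print(f"baseline bias={baseline_bias:.3f} effect bias={effect_bias:.3f}")
```

The identity `naive = ate + baseline_bias + effect_bias` holds exactly in the sample, so a single δ from OLS absorbs both sources of bias at once.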
Revisiting Matching
• Matching works when S is fully flexible (all S are accounted for).
• In matching, the difference in outcomes within each stratum is weighted by the share of units in that stratum: P(S=s) for the ATE, or P(S=s|D=1) for the effect on the treated.
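Exact matching on a discrete S amounts to a weighted average of within-stratum differences in means. The sketch below assumes a hypothetical three-stratum data set with constant outcomes inside each cell so the arithmetic is easy to follow; `stratified_effect` is an illustrative helper, not a function from any library.

```python
from collections import defaultdict


def stratified_effect(records, target="ate"):
    """Weighted average of within-stratum treated-minus-control mean differences.

    records: iterable of (s, d, y) tuples with d in {0, 1}.
    target="ate" weights strata by P(S=s); target="att" by P(S=s | D=1).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for s, d, y in records:
        cells[s][d].append(y)
    total = 0.0
    weight_sum = 0.0
    for s, groups in cells.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"no treated/control overlap in stratum {s!r}")
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        # Stratum size for the ATE, number of treated units for the ATT.
        w = len(groups[1]) if target == "att" else len(groups[0]) + len(groups[1])
        total += w * diff
        weight_sum += w
    return total / weight_sum


# Hypothetical design: (stratum, n_control, n_treated, y_control, y_treated)
design = [(1, 352, 88, 2.0, 4.0), (2, 120, 120, 6.0, 8.0), (3, 64, 256, 10.0, 14.0)]
records = []
for s, n0, n1, y_c, y_t in design:
    records += [(s, 0, y_c)] * n0 + [(s, 1, y_t)] * n1

ate_hat = stratified_effect(records)
att_hat = stratified_effect(records, target="att")
print(f"ATE estimate: {ate_hat:.4f}  ATT estimate: {att_hat:.4f}")
```

The helper raises an error when a stratum has no treated or no control units, since the within-stratum difference is undefined without overlap.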
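OLS As Conditional Variance Weighting
• With heterogeneous effects, OLS does not recover the ATE when S is merely fully flexible (a dummy for each stratum), only when the model is saturated (each level of S interacted with D).
• When only one parameter is calculated for δ, the estimator is: δ_OLS = (1/c) Σ_s Var[D|S=s] P(S=s) {E[Y|D=1,S=s] - E[Y|D=0,S=s]}, where c = Σ_s Var[D|S=s] P(S=s).
• What is this new term Var[D|S=s]? It equals p(1-p), where p = P(D=1|S=s). That term is maximized when p=.50, so strata with a near-even split of treated and control units get the most weight.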
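The weighting can be worked through by hand. The sketch below uses the stratum shares quoted later in the deck, {0.44, 0.24, 0.32}, together with hypothetical propensities and stratum effects, and compares the ATE (weights P(S=s)) with the quantity a single-δ OLS targets (weights proportional to Var[D|S=s]·P(S=s)).

```python
# Each tuple: (P(S=s), P(D=1 | S=s), stratum-specific effect).
# The shares come from the slides; propensities and effects are hypothetical.
strata = [(0.44, 0.2, 2.0), (0.24, 0.5, 2.0), (0.32, 0.8, 4.0)]

# ATE: stratum effects weighted by stratum share.
ate = sum(share * effect for share, p, effect in strata)

# Single-delta OLS: weights proportional to Var[D|S=s] * P(S=s), with Var = p(1 - p).
raw = [share * p * (1 - p) for share, p, effect in strata]
c = sum(raw)
ols_weights = [w / c for w in raw]
ols_estimand = sum(w * effect for w, (share, p, effect) in zip(ols_weights, strata))

for (share, p, effect), w in zip(strata, ols_weights):
    print(f"share={share:.2f}  p={p:.1f}  effect={effect:.1f}  OLS weight={w:.3f}")
print(f"ATE={ate:.3f}  OLS estimand={ols_estimand:.3f}")
```

The middle stratum, where p = 0.5, receives more weight under OLS than its population share, and the two summaries differ whenever stratum effects and propensities vary together.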
Visual of Conditional Variance Weighting
• 1000 simulations
• P(S=s) = {0.44, 0.24, 0.32} (like in the book)
• The variance of P(D=1|S=s) goes from 0 to 0.25
• Treatment level is increasingly heterogeneous across strata
• Estimate treatment effect using three models: linear model, fully specified model, saturated model
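A scaled-down version of this kind of simulation might look as follows. It is not the presenters' code. The stratum shares are taken from the slide, but the propensities, stratum effects, baseline outcomes, sample size, and the use of 100 rather than 1000 replications are assumptions, and "fully specified" is interpreted here as one dummy per stratum with a single δ.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

P_S = {1: 0.44, 2: 0.24, 3: 0.32}   # stratum shares, from the slide
P_D = {1: 0.1, 2: 0.5, 3: 0.6}      # hypothetical P(D=1 | S=s)
EFFECT = {1: 2.0, 2: 2.0, 3: 4.0}   # hypothetical stratum-specific effects
BASE = {1: 2.0, 2: 6.0, 3: 14.0}    # hypothetical E[Y | D=0, S=s]
TRUE_ATE = sum(P_S[k] * EFFECT[k] for k in P_S)


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def resid_linear(x, v):
    """Residuals of v after a linear regression on x."""
    b = slope(x, v)
    a = mean(v) - b * mean(x)
    return [vi - a - b * xi for xi, vi in zip(x, v)]


def resid_dummies(s, v):
    """Residuals of v after regressing on a dummy for every stratum."""
    m = {k: mean([vi for si, vi in zip(s, v) if si == k]) for k in P_S}
    return [vi - m[si] for si, vi in zip(s, v)]


def one_run(n=2000):
    s = random.choices(list(P_S), weights=list(P_S.values()), k=n)
    d = [1.0 if random.random() < P_D[k] else 0.0 for k in s]
    y = [BASE[k] + EFFECT[k] * t + random.gauss(0, 1) for k, t in zip(s, d)]
    sx = [float(k) for k in s]
    # Linear model: Y ~ D + S, with S entered as one numeric term.
    linear = slope(resid_linear(sx, d), resid_linear(sx, y))
    # Stratum dummies: Y ~ D + one dummy per stratum, a single delta.
    dummies = slope(resid_dummies(s, d), resid_dummies(s, y))
    # Saturated: a separate effect per stratum, averaged by stratum share.
    saturated = 0.0
    for k in P_S:
        y1 = [yi for si, di, yi in zip(s, d, y) if si == k and di == 1.0]
        y0 = [yi for si, di, yi in zip(s, d, y) if si == k and di == 0.0]
        saturated += (len(y1) + len(y0)) / n * (mean(y1) - mean(y0))
    return linear, dummies, saturated


runs = [one_run() for _ in range(100)]
avg_linear, avg_dummies, avg_saturated = (mean(col) for col in zip(*runs))
print(f"true ATE={TRUE_ATE:.3f}")
print(f"linear={avg_linear:.3f} dummies={avg_dummies:.3f} saturated={avg_saturated:.3f}")
```

Under these assumed values the saturated model centers on the true ATE, while the single-δ model with stratum dummies centers on the conditional-variance-weighted average instead. Each regression is computed by partialling out the controls (Frisch-Waugh-Lovell), which keeps the sketch free of external libraries.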
Comparing Regression, Matching, Weighting, and Pre-Processing
• OLS is valid for causal inference when E[ε|D]=0, but this is very hard to prove conclusively.
• Options:
  • Biased OLS
  • Fully saturated model (all strata & interactions)
  • Matching
  • Weighted least squares using p-scores
  • Pre-processing data (regression with matched data)
• All of these methods are (at least somewhat) dependent on correct specifications of equations:
  • Propensity score equation
  • DGP equation
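Of the options listed, weighted least squares with propensity scores is perhaps the least familiar, so here is a minimal sketch of one common variant: inverse-probability weights of 1/p for treated units and 1/(1-p) for controls, targeting the ATE. The data, the helper `wls_treatment_effect`, and the choice to take the propensity score as the treated share within each stratum (correct by construction here) are all assumptions for illustration.

```python
def wls_treatment_effect(d, y, w):
    """Slope from weighted least squares of y on a binary d, with an intercept.

    With a binary regressor this equals the difference in weighted means.
    """
    sw = sum(w)
    md = sum(wi * di for wi, di in zip(w, d)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (di - md) * (yi - my) for wi, di, yi in zip(w, d, y))
    den = sum(wi * (di - md) ** 2 for wi, di in zip(w, d))
    return num / den


# Hypothetical design: (stratum, n_control, n_treated, y_control, y_treated)
design = [(1, 352, 88, 2.0, 4.0), (2, 120, 120, 6.0, 8.0), (3, 64, 256, 10.0, 14.0)]
s, d, y = [], [], []
for k, n0, n1, y_c, y_t in design:
    s += [k] * (n0 + n1)
    d += [0] * n0 + [1] * n1
    y += [y_c] * n0 + [y_t] * n1

# Propensity score: treated share within each stratum (an assumption that
# the score model is correctly specified, which real data cannot guarantee).
pscore = {k: n1 / (n0 + n1) for k, n0, n1, _, _ in design}
w_ate = [1 / pscore[k] if di else 1 / (1 - pscore[k]) for k, di in zip(s, d)]

unweighted = wls_treatment_effect(d, y, [1.0] * len(d))  # naive difference in means
ipw_ate = wls_treatment_effect(d, y, w_ate)              # reweighted to the full sample
print(f"unweighted: {unweighted:.3f}  weighted: {ipw_ate:.3f}")
```

With stratum-level propensities the weighted estimate coincides with the stratification estimate of the ATE, which shows why the weighting approach is only as good as the propensity score equation behind it.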
Simulation Results
True Equations: (Simple DGP) (Heterogeneous DGP)