Review of Identifying Causal Effects Methods of Economic Investigation Lecture 13
Last Term • Classical OLS has 5 main assumptions • A1. Full Rank: X is a T × k matrix with rank ρ(X) = k ≤ T • A2. Linearity: y = Xβ + ε where E(ε) = 0 • A3. X is exogenous with respect to ε, i.e. E(ε | X) = 0 • Somewhat weaker condition: X is uncorrelated with ε, so that E(X′ε) = 0 • A4. Homoskedasticity: E(εε′) = σ²I_T • A5. Normality: ε ~ N(0, σ²I_T)
Relaxing the assumptions • Even in finite samples, with these assumptions linear regression is BLUE (the best linear unbiased estimator) • Last term you talked about some of the consequences of relaxing some of these assumptions • Can rely on large sample properties to get around some of the problems (e.g. A5) • Can construct more robust estimates that are valid in large samples (e.g. A4)
This term • The biggest problem in estimation is A3 • We call this the “Conditional Independence Assumption” or CIA • When it fails, our estimates become inconsistent, which means we cannot rely on large samples to fix the problem • There are lots of ways violations of A3 can happen
CIA is violated, now what? • We need to figure out what is generating the correlations • Measurement error • Omitted variables • Selection on unobservables • Simultaneity of determination • Correlations across different periods
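To see why a violation of A3 is not a small-sample problem, here is a minimal simulation sketch of the omitted-variables case. Everything in it is an assumption made for illustration: the data-generating process (true coefficient on x of 1.0, coefficient on the omitted z of 2.0), the sample sizes, and the helper names `ols_slope` and `simulate_short_regression`.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate_short_regression(n, seed=0):
    """Regress y on x while omitting z, which drives both x and y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]      # omitted variable
    x = [zi + rng.gauss(0, 1) for zi in z]       # x is correlated with z
    # Hypothetical true model: y = 1.0*x + 2.0*z + noise
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)

small = simulate_short_regression(500)
large = simulate_short_regression(50_000)
print(f"n=500:   slope = {small:.3f}")
print(f"n=50000: slope = {large:.3f}  (true coefficient on x is 1.0)")
```

Under these assumptions the slope converges to 1 + 2·Cov(x, z)/Var(x) = 2 rather than to 1, so a larger sample only makes the wrong answer more precise.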
How do we figure out what’s wrong? • We put this in the context of a “program evaluation” • In truth, there may not be any “program” per se • Think of the variable of interest as a “treatment” • For simplicity, we now talk about the variable of interest as an indicator variable that can be zero or one • In practice it can be continuous • In that case, use derivatives rather than differences to calculate changes
The Imaginary Experimental Ideal • Pretend you could run an imaginary experiment • Not bounded by reality • Everything observable • How would you construct a test to isolate the effect of your variable of interest T on the outcome of interest Y?
Partitioning the World • Two groups of people in the world • People who got the treatment (so T=1) • People who did not get the treatment (so T=0) • To think of a counterfactual outcome we need to know what would have happened in the absence of the treatment
The Road Not Taken… • We imagine 2 states of the world: one where someone gets T and one where that same person does not • Now let’s define our usual notation for Individual A • Doesn’t get T: Y0A • Gets T: Y1A
Our Gold Standard • If we could observe both these states of the world we could know what would have happened in the absence of the treatment • The ONLY DIFFERENCE is the treatment so we know any difference in the Y’s must be caused by the treatment • This is our Average Treatment Effect E[Y1A – Y0A]
Back to the real world • Sadly, we do not observe the true counterfactual • What do we observe? • Y1 for all the people in the treatment group (let them all be A’s) • Y0 for all the people in the control group (let them all be B’s)
The Difference Estimate • What if we just take the difference between treatment and control? That is, what if we computed: E[Y1 | T = 1] – E[Y0 | T = 0] • NOTE: These are now CONDITIONAL expectations because we don’t observe the untreated outcome for the treatment group or the treated outcome for the control group
Decomposing the Difference Estimate • What if we rewrite our difference estimate by adding and subtracting E[Y0A | T = 1]: E[Y1A | T = 1] – E[Y0B | T = 0] = {E[Y1A | T = 1] – E[Y0A | T = 1]} + {E[Y0A | T = 1] – E[Y0B | T = 0]} = TOT + Selection Bias • The first bracket is the treatment on the treated (TOT); the second is the selection bias (SB)
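The decomposition can be checked on a made-up population in which, unlike in the real world, both potential outcomes are visible. This is a sketch under stated assumptions: the constant treatment effect of 3, the self-selection rule, and all variable names are hypothetical.

```python
import random

rng = random.Random(42)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical population: each person carries BOTH potential outcomes.
people = []
for _ in range(10_000):
    y0 = rng.gauss(10, 2)
    y1 = y0 + 3.0                      # constant effect of 3 (an assumption)
    # Self-selection: people with high untreated outcomes opt in more often.
    t = 1 if y0 + rng.gauss(0, 2) > 10 else 0
    people.append((y0, y1, t))

treated = [p for p in people if p[2] == 1]
control = [p for p in people if p[2] == 0]

# What we can observe: E[Y1 | T=1] - E[Y0 | T=0]
naive = mean(y1 for y0, y1, t in treated) - mean(y0 for y0, y1, t in control)
# What we want: E[Y1 - Y0 | T=1]
tot = mean(y1 - y0 for y0, y1, t in treated)
# What contaminates it: E[Y0 | T=1] - E[Y0 | T=0]
selection_bias = mean(y0 for y0, y1, t in treated) - mean(y0 for y0, y1, t in control)

print(f"naive difference = {naive:.3f}")
print(f"TOT              = {tot:.3f}")
print(f"selection bias   = {selection_bias:.3f}")
```

Because the treated group here has higher untreated outcomes, the naive difference overstates the effect by exactly the selection bias term.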
ATE vs. TOT • Let’s look at the two definitions: • TOT: E[Y1A – Y0A | T = 1] • ATE: E[Y1A – Y0A] • Why might these differ even if SB=0? • Heterogeneous Treatment Effects • This means there may be an idiosyncratic individual component to the treatment effect • Not the same as selection, more a function of the actual effect
ATE vs. TOT - 2 • Suppose Y1A = μ1 + ξ1 and Y0A = μ0 + ξ0 • Then we can rewrite • TOT= E[Y1A – Y0A|T=1] = E[μ1 – μ0|T=1] + E[ξ1 – ξ0|T=1] • ATE = E[Y1A – Y0A] = E[μ1 – μ0] + E[ξ1 – ξ0] = {E[μ1 – μ0| T=1]+ E[ξ1 – ξ0|T=1]}*Pr(T=1) + {E[μ1 – μ0| T=0]+ E[ξ1 – ξ0|T=0]}*Pr(T=0) • So the ATE will be a weighted average of TOT and a treatment effect for the Control group
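A tiny worked example may make the weighted-average statement concrete. The six-person population below is entirely hypothetical, chosen so that treated individuals have larger effects than controls.

```python
# Hypothetical six-person population: (y0, y1, t).
people = [
    (10, 14, 1), (12, 18, 1),                       # treated: effects 4 and 6
    (8, 9, 0), (9, 10, 0), (11, 12, 0), (7, 8, 0),  # controls: effect 1 each
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

effects_treated = [y1 - y0 for y0, y1, t in people if t == 1]
effects_control = [y1 - y0 for y0, y1, t in people if t == 0]

tot = mean(effects_treated)               # E[Y1 - Y0 | T = 1]
control_effect = mean(effects_control)    # E[Y1 - Y0 | T = 0]
p_treated = len(effects_treated) / len(people)

ate = mean(y1 - y0 for y0, y1, t in people)
weighted = tot * p_treated + control_effect * (1 - p_treated)

print(tot, control_effect, round(ate, 4), round(weighted, 4))
```

Here TOT is 5 and the effect for the control group is 1, so with one third of the population treated the ATE is 5·(1/3) + 1·(2/3) = 7/3, well below TOT.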
[Figure: Visual Representation of Difference. Panels: Not Assigned Treatment (T=0) and Assigned Treatment (T=1); groups A and B; outcomes Y0B, Y1B, Y1A, Y0A; marked differences: TOT and ATE.]
MY CARELESS NOTATION • Exercise 2 • ATE defined as: ATE = E(Y1i – Y0i | Ti = 1) • This is really ATET (the average treatment effect on the treated, i.e. TOT) • ATET = ATE if E(Y1i – Y0i | Ti = 1) = E(Y1i – Y0i | Ti = 0) • That is, no heterogeneity in treatment effects across the two groups • In general, these are not the same
Why do we care about TOT? • In the case of an experiment, we can get an estimate of TOT (may not be ATE) • Why? • We observe: E[Y1A | T = 1] – E[Y0B | T = 0] • This can be decomposed into two parts: E[Y1A – Y0A | T = 1] + {E[Y0A | T = 1] – E[Y0B | T = 0]} • If E[Y0A | T = 1] = E[Y0B | T = 0] then the second part is zero and the observed difference in outcomes is our estimate of TOT!
That’s why experiments are good! • If you have an experiment in which treatment is randomly assigned and there are no compliance issues, then we can estimate TOT • If there are compliance issues, then we estimate the intention to treat (ITT): E[Y1A | T = 1] – E[Y0B | T = 0]
TOT vs. ITT • ITT may not be the same as TOT (and thus, in the case of random assignment, not the same as ATE) because of compliance • DEVIATION FROM PREVIOUS NOTATION: • Until now we have assumed that if T = 1 then you were both assigned to and received treatment • Now we need two separate things: T, the assignment to treatment, and R, the receipt of treatment
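A short simulation sketch of why random assignment helps: when T is a coin flip, E[Y0 | T = 1] and E[Y0 | T = 0] coincide up to sampling noise, so the observed difference lines up with TOT. The population, the mean effect of 3 and the 50/50 assignment are assumptions made for illustration, and compliance is perfect by construction.

```python
import random

rng = random.Random(7)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

people = []
for _ in range(20_000):
    y0 = rng.gauss(10, 2)
    y1 = y0 + 3.0 + rng.gauss(0, 1)      # heterogeneous effect, mean 3 (assumed)
    t = 1 if rng.random() < 0.5 else 0   # coin flip, unrelated to y0 or y1
    people.append((y0, y1, t))

treated = [p for p in people if p[2] == 1]
control = [p for p in people if p[2] == 0]

observed_diff = mean(p[1] for p in treated) - mean(p[0] for p in control)
tot = mean(p[1] - p[0] for p in treated)
selection_bias = mean(p[0] for p in treated) - mean(p[0] for p in control)

print(f"observed difference = {observed_diff:.3f}")
print(f"TOT                 = {tot:.3f}")
print(f"selection bias      = {selection_bias:.3f}")
```

In this sketch the selection bias is close to zero only because assignment ignores the potential outcomes; replacing the coin flip with a rule that depends on y0 would bring the bias back.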
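The gap between ITT and the effect of actually receiving treatment can be illustrated with a simulated experiment in which assignment T is random but receipt R is not always equal to T. All numbers are hypothetical: a constant effect of 2.0 from receipt, 60% compliers, 20% always takers and 20% never takers, with untreated outcomes unrelated to type.

```python
import random

rng = random.Random(3)
EFFECT = 2.0   # constant effect of actually receiving treatment (assumed)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = []  # (assigned T, received R, observed outcome Y)
for _ in range(40_000):
    t = 1 if rng.random() < 0.5 else 0   # random assignment
    u = rng.random()
    if u < 0.6:
        r = t        # complier: does what was assigned
    elif u < 0.8:
        r = 1        # always taker: receives treatment regardless
    else:
        r = 0        # never taker: refuses treatment regardless
    y = rng.gauss(10, 1) + EFFECT * r
    rows.append((t, r, y))

# ITT compares groups by ASSIGNMENT, ignoring who actually received treatment.
itt = mean(y for t, r, y in rows if t == 1) - mean(y for t, r, y in rows if t == 0)
# Difference in receipt rates between the two assignment groups.
take_up_gap = mean(r for t, r, y in rows if t == 1) - mean(r for t, r, y in rows if t == 0)

print(f"ITT = {itt:.3f}, effect of receipt = {EFFECT}, take-up gap = {take_up_gap:.3f}")
```

Under these assumptions the ITT is roughly 2.0 × 0.6 = 1.2: assignment changes receipt for only 60% of people, so the effect of assignment is a diluted version of the effect of receipt.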
[Figure: Visual Representation of Difference. Panels: Not Assigned Treatment (T=0) and Assigned Treatment (T=1), each split into R=1 and R=0.] • Compare A (orange) to B (blue) = ITT • Compare R=1 (solid) to R=0 (striped) = TOT + SB • Compare A, R=1 (orange solid) to B, R=0 (blue striped) = LATE
Compliance Issues • In the case where R ≠ T, rewrite the observed difference in outcomes as E[Y1 | R = 1] – E[Y0 | R = 0] = E[YA | R = 1, T = 1]*Pr[T=1 | R=1] (Treatment Group Compliers) + E[YB | R = 1, T = 0]*Pr[T=0 | R=1] (Always Takers) – E[YB | R = 0, T = 0]*Pr[T=0 | R=0] (Control Group Compliers) – E[YA | R = 0, T = 1]*Pr[T=1 | R=0] (Never Takers)
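The four-term expression is just the law of total expectation applied within R = 1 and within R = 0, which can be confirmed numerically. The data below are simulated under assumptions of my own (20% non-compliance, an effect of 2.0 from receipt); `cell_mean` and `pr_t_given_r` are hypothetical helper names.

```python
import random

rng = random.Random(11)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

rows = []  # (assigned T, received R, observed outcome Y)
for _ in range(5_000):
    t = rng.randint(0, 1)
    r = t if rng.random() < 0.8 else 1 - t   # 20% do not comply (assumed)
    y = rng.gauss(10, 2) + 2.0 * r
    rows.append((t, r, y))

def cell_mean(r, t):
    """E[Y | R = r, T = t]."""
    return mean(y for ti, ri, y in rows if ri == r and ti == t)

def pr_t_given_r(t, r):
    """Pr[T = t | R = r]."""
    in_r = [ti for ti, ri, y in rows if ri == r]
    return sum(1 for ti in in_r if ti == t) / len(in_r)

# Left-hand side: compare by receipt of treatment.
lhs = mean(y for t, r, y in rows if r == 1) - mean(y for t, r, y in rows if r == 0)
# Right-hand side: the four weighted cells from the slide.
rhs = (
    cell_mean(1, 1) * pr_t_given_r(1, 1)      # treatment group compliers
    + cell_mean(1, 0) * pr_t_given_r(0, 1)    # always takers
    - cell_mean(0, 0) * pr_t_given_r(0, 0)    # control group compliers
    - cell_mean(0, 1) * pr_t_given_r(1, 0)    # never takers
)
print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}")
```

The two sides agree up to floating-point error for any data set with all four cells non-empty, since the identity does not depend on how compliance is generated.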
Imagine non-compliance is symmetric • Rewrite with Pr[T=1 | R=0] = Pr[T=0 | R=1] = p: E[Y1 | R = 1] – E[Y0 | R = 0] = {E[YA | R = 1, T = 1] – E[YB | R = 0, T = 0]}*(1 – p) + {E[YB | R = 1, T = 0] – E[YA | R = 0, T = 1]}*p = (TOT + SB)*(1 – p) + (AT – NT)*p • If SB = 0 (no selection bias, i.e. among compliers, the counterfactual for the treatment group is the same) then this is a weighted average of the TOT and the always-taker/never-taker (AT/NT) bias
Roadmap of the course so far: Hypothetical counterfactual difference • Experiment • Perfect Compliance: TOT • Imperfect Compliance: ITT • Non-experimental • Fixed Differences between Groups: Fixed Effect (TOT) • Groups with Parallel Trends: Difference-in-Differences (TOT/ITT) • Groups with similar characteristics: Matching Methods (TOT)
What we’ve done so far… • Ways to define a ‘control group’ • Fixed Effect • Assume: Individuals within a group are, on average, the same • Attribute any within-group difference to treatment • Difference-in-Differences • Assume: Fixed Differences over time • Attribute any change in trend to treatment • Propensity Score Matching • Assume: Treatment, conditional on observables, is as if randomly assigned • Attribute any difference in outcomes to treatment
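As a reminder of the mechanics of the difference-in-differences idea, here is a minimal two-group, two-period sketch. The group means are invented for illustration and `diff_in_diff` is a hypothetical helper, not a routine from any library.

```python
# Hypothetical group means of the outcome, before and after the program.
means = {
    ("treated", "pre"): 20.0,
    ("treated", "post"): 27.0,
    ("control", "pre"): 15.0,
    ("control", "post"): 18.0,
}

def diff_in_diff(m):
    """(Change in treated group) minus (change in control group)."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change

estimate = diff_in_diff(means)
print(estimate)  # 4.0
```

The fixed gap of 5 between the groups before the program drops out; the estimate of 4 is the part of the treated group's change (7) not matched by the control group's change (3), which is attributed to treatment only under the assumption that the two groups would otherwise have moved in parallel.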
Next time… • Instrumental Variables • What Are They • What about LATE? • How to Estimate