Review of Identifying Causal Effects

Methods of Economic Investigation, Lecture 13

Last Term • Classical OLS has 5 main assumptions • A1. Full Rank: X is a T x k matrix with rank ρ(X) = k ≤ T • A2. Linearity: y = Xβ + ε where E(ε) = 0 • A3. Exogeneity: X is exogenous with respect to ε, i.e. E(ε | X) = 0 • Somewhat weaker condition: X is uncorrelated with ε, so that E(X’ε) = 0 • A4. Homoskedasticity: E(εε’) = σ²I_T • A5. Normality: ε ~ N(0, σ²I_T)

Relaxing the assumptions • Even in finite samples, under these assumptions linear regression is BLUE (the best linear unbiased estimator) • Last term you talked about some of the consequences of relaxing some of these assumptions • Can rely on large-sample properties to get around some of the problems (e.g. A5) • Can construct more robust estimates which are fine in large samples (e.g. A4)

This term • The biggest problem in estimation is A3 • We call this the “Conditional Independence Assumption” or CIA • When it fails, our estimates become inconsistent, which means we cannot rely on large samples to fix the problem • There are lots of ways violations of A3 can happen

CIA is violated, now what? • We need to figure out what is generating the correlations • Measurement error • Omitted variables • Selection on unobservables • Simultaneity of determination • Correlations across different periods
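To illustrate one of these violations, the sketch below simulates an omitted variable. Everything in it is an assumption made up for the example and is not from the lecture: the unobserved variable u, the coefficients 1.0 and 2.0, and the sample sizes. Under this setup the slope from regressing y on x alone settles near 2 rather than the true 1, and a larger sample does not help.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def simulate_slope(n, seed=0):
    """Draw n observations where u drives both x and y but is left out."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]           # omitted variable
    x = [ui + rng.gauss(0, 1) for ui in u]            # x is correlated with u
    y = [1.0 * xi + 2.0 * ui + rng.gauss(0, 1)        # true effect of x is 1.0
         for xi, ui in zip(x, u)]
    return ols_slope(x, y)

# The estimate does not move toward 1.0 as n grows.
slopes = {n: simulate_slope(n) for n in (100, 1000, 20000)}
for n, b in slopes.items():
    print(n, round(b, 3))
```

In this design the error term of the short regression contains 2u, which is correlated with x, so E(ε | X) = 0 fails by construction.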

How do we figure out what’s wrong? • We put this in the context of a “program evaluation” • In truth, there may not be any “program” per se • Think of the variable of interest as a “treatment” • For simplicity, we now talk about the variable of interest as an indicator variable that can be zero or one • In practice it can be continuous • Then use derivatives rather than differences to calculate changes

The Imaginary Experimental Ideal • Pretend you could run an imaginary experiment • Not bounded by reality • Everything observable • How would you construct a test to isolate the effect of your variable of interest T on the outcome of interest Y?

Partitioning the World • Two groups of people in the world • People who got the treatment (so T=1) • People who did not get the treatment (so T=0) • To think of a counterfactual outcome we need to know what would have happened in the absence of the treatment

The Road Not Taken… • We imagine 2 states of the world: one where someone gets T and one where that same person does not • Now let’s define our usual notation for Individual A • Doesn’t get T: outcome Y0A • Gets T: outcome Y1A

Our Gold Standard • If we could observe both these states of the world we could know what would have happened in the absence of the treatment • The ONLY DIFFERENCE is the treatment so we know any difference in the Y’s must be caused by the treatment • This is our Average Treatment Effect E[Y1A – Y0A]

Back to the real world • Sadly, we do not observe the true counterfactual • What do we observe? • Y1 for all the people in the treatment group (let them all be A’s) • Y0 for all the people in the control group (let them all be B’s)

The Difference Estimate • What if we just take the difference between treatment and control? That is, what if we computed: E[Y1 | T = 1] – E[Y0 | T = 0] • NOTE: These are now CONDITIONAL expectations, because we don’t observe the treated outcome for the control group or the untreated outcome for the treatment group

Decomposing the Difference Estimate • What if we rewrite our difference estimate so that:
E[Y1A | T = 1] – E[Y0B | T = 0]
= {E[Y1A | T = 1] – E[Y0A | T = 1]} + {E[Y0A | T = 1] – E[Y0B | T = 0]}
= TOT + Selection Bias
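The decomposition can be checked on a toy table. The sketch below assumes an imaginary world in which both potential outcomes are visible for everyone; the six rows are hypothetical numbers chosen so that people with high Y0 select into treatment.

```python
# Each row is (y0, y1, t): both potential outcomes plus treatment status.
# Hypothetical numbers; in real data only one of y0, y1 is observed per person.
people = [
    (10, 14, 1), (12, 15, 1), (8, 13, 1),   # treated (the A's)
    (4, 6, 0), (6, 7, 0), (5, 8, 0),        # control (the B's)
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [p for p in people if p[2] == 1]
control = [p for p in people if p[2] == 0]

# E[Y1A | T = 1] - E[Y0B | T = 0]: what we can actually compute from data.
observed_diff = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in control)

# E[Y1A - Y0A | T = 1]: needs the unobserved y0 of the treated.
tot = mean(y1 - y0 for y0, y1, _ in treated)

# E[Y0A | T = 1] - E[Y0B | T = 0]: how the groups differ without treatment.
selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in control)

print(observed_diff, tot, selection_bias)  # 9.0 4.0 5.0
```

Here the observed difference of 9 overstates the TOT of 4 because the treated would have done better than the controls even without treatment.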

ATE vs. TOT • Let’s look at the two definitions: • TOT: E[Y1A – Y0A | T = 1] • ATE: E[Y1A – Y0A] • Why might these differ even if SB = 0? • Heterogeneous Treatment Effects • This means there may be an idiosyncratic individual component to the treatment effect • Not the same as selection, more a function of the actual effect

ATE vs. TOT - 2 • Suppose Y1A = μ1 + ξ1 and Y0A = μ0 + ξ0 • Then we can rewrite:
TOT = E[Y1A – Y0A | T=1] = E[μ1 – μ0 | T=1] + E[ξ1 – ξ0 | T=1]
ATE = E[Y1A – Y0A] = E[μ1 – μ0] + E[ξ1 – ξ0]
= {E[μ1 – μ0 | T=1] + E[ξ1 – ξ0 | T=1]}*Pr(T=1) + {E[μ1 – μ0 | T=0] + E[ξ1 – ξ0 | T=0]}*Pr(T=0)
• So the ATE will be a weighted average of the TOT and the treatment effect for the control group, with weights Pr(T=1) and Pr(T=0)
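A small numerical check of the weighted-average result, assuming we could see each person's individual treatment effect Y1 – Y0. The effects below are hypothetical, and the name `tut` (the average effect for the untreated group) is a label introduced here, not one used in the lecture.

```python
# Individual treatment effects (y1 - y0), hypothetical and heterogeneous.
effects_treated = [5, 3, 4, 4]          # people with T = 1
effects_control = [1, 2, 3, 1, 2, 3]    # people with T = 0

def mean(values):
    return sum(values) / len(values)

n_total = len(effects_treated) + len(effects_control)
pr_t1 = len(effects_treated) / n_total  # Pr(T = 1)
pr_t0 = len(effects_control) / n_total  # Pr(T = 0)

tot = mean(effects_treated)             # E[Y1 - Y0 | T = 1]
tut = mean(effects_control)             # E[Y1 - Y0 | T = 0]
ate = mean(effects_treated + effects_control)  # E[Y1 - Y0]

# ATE as a weighted average of the two group-specific effects.
weighted = tot * pr_t1 + tut * pr_t0

print(tot, tut, round(ate, 3), round(weighted, 3))
```

With heterogeneous effects the TOT (4.0 here) and the ATE differ even though nothing in this calculation involves selection bias in Y0.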

Visual Representation of Difference [Figure: two groups, Not Assigned Treatment (T=0) and Assigned Treatment (T=1), labelled A and B, showing outcomes Y0B, Y1B, Y1A, Y0A and the TOT and ATE comparisons]

MY CARELESS NOTATION • Exercise 2 • ATE defined as: ATE = E(Y1i – Y0i | Ti = 1) • This is really the ATET (the average treatment effect on the treated, i.e. the TOT) • ATET = ATE if E(Y1i – Y0i | Ti = 1) = E(Y1i – Y0i | Ti = 0) • That is, no heterogeneity in treatment effects across the two groups • In general, these are not the same

Why do we care about TOT? • In the case of an experiment, we can get an estimate of TOT (may not be ATE) • Why? • We observe: E[Y1A | T = 1] – E[Y0B | T=0] • This can be decomposed into two parts: E[Y1A – Y0A | T = 1] + E[Y0A | T = 1] – E[Y0B | T=0] • If E[Y0A | T = 1] = E[Y0B | T=0] then the observed difference in outcomes is our estimate of TOT!

That’s why experiments are good! • If you have an experiment with random assignment and no compliance issues, then we can estimate TOT • If there are compliance issues, then we estimate ITT: E[Y1A | T = 1] – E[Y0B | T=0]
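The role of random assignment can be sketched with a simulation. It assumes perfect compliance and a constant treatment effect of 3.0; the outcome distribution, sample size, and seed are arbitrary choices for illustration only.

```python
import random

rng = random.Random(42)
n = 20000
effect = 3.0  # assumed constant treatment effect

# Untreated potential outcome for everyone, then a coin-flip assignment.
y0 = [rng.gauss(10, 2) for _ in range(n)]
t = [rng.random() < 0.5 for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

y0_treated = mean([y for y, ti in zip(y0, t) if ti])      # E[Y0A | T = 1]
y0_control = mean([y for y, ti in zip(y0, t) if not ti])  # E[Y0B | T = 0]

# Selection bias term: close to zero because T is independent of y0.
selection_bias = y0_treated - y0_control

# Observed difference: treated show y0 + effect, controls show y0.
observed_diff = mean([y + effect for y, ti in zip(y0, t) if ti]) - y0_control

print(round(selection_bias, 3), round(observed_diff, 3))
```

Because assignment ignores Y0, the two groups have nearly the same untreated mean, and the observed difference lands close to the assumed effect.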

TOT vs. ITT • ITT may not be the same as TOT (and thus, in the case of random assignment, not the same as ATE) because of compliance • DEVIATION FROM PREVIOUS NOTATION: • Before, we assumed that T = 1 meant you were both assigned to and received treatment • Now we need two separate variables: T, the assignment to treatment, and R, the receipt of treatment

Visual Representation of Difference [Figure: Not Assigned Treatment (T=0) and Assigned Treatment (T=1) groups, labelled A and B, each split into R=1 and R=0] • Compare A (orange) to B (blue) = ITT • Compare R=1 (solid) to R=0 (striped) = TOT + SB • Compare A, R=1 (orange solid) to B, R=0 (blue striped) = LATE

Imagine non-compliance is symmetric • Rewrite with Pr[T=1 | R=0] = Pr[T=0 | R=1] = p:
E[Y1 | R = 1] – E[Y0 | R = 0]
= {E[YA | R = 1, T = 1] – E[YB | R = 0, T = 0]}(1 – p) + {E[YB | R = 1, T = 0] – E[YA | R = 0, T = 1]}p
= (TOT + SB)*(1 – p) + (AT – NT)*p
• Here AT are the always-takers (R = 1 despite T = 0) and NT the never-takers (R = 0 despite T = 1) • If SB = 0 (no selection bias, i.e. among compliers, the counterfactual for the treatment group is the same), then this is a weighted average of the TOT and the AT/NT bias
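The weighting by p and 1 – p can be verified on a small table of cell means. The counts and mean outcomes below are hypothetical, chosen so that non-compliance is symmetric with p = 0.2; the sketch also computes the ITT comparison by assignment for contrast.

```python
# (T, R) cells: (number of people, mean observed outcome). Hypothetical values.
cells = {
    (1, 1): (40, 15.0),  # assigned and treated
    (0, 0): (40, 8.0),   # not assigned, not treated
    (0, 1): (10, 18.0),  # not assigned but treated (always-takers)
    (1, 0): (10, 6.0),   # assigned but untreated (never-takers)
}

def group_mean(index, value):
    """Mean outcome over cells whose key[index] == value (0 is T, 1 is R)."""
    rows = [(n, m) for key, (n, m) in cells.items() if key[index] == value]
    return sum(n * m for n, m in rows) / sum(n for n, _ in rows)

# Symmetric non-compliance: Pr[T=0 | R=1] = Pr[T=1 | R=0] = p.
p = cells[(0, 1)][0] / (cells[(0, 1)][0] + cells[(1, 1)][0])
p_other = cells[(1, 0)][0] / (cells[(1, 0)][0] + cells[(0, 0)][0])

# Compare by receipt: E[Y | R = 1] - E[Y | R = 0].
by_receipt = group_mean(1, 1) - group_mean(1, 0)

# Right-hand side of the decomposition.
complier_part = cells[(1, 1)][1] - cells[(0, 0)][1]
at_nt_part = cells[(0, 1)][1] - cells[(1, 0)][1]
decomposed = complier_part * (1 - p) + at_nt_part * p

# Compare by assignment: the ITT.
itt = group_mean(0, 1) - group_mean(0, 0)

print(round(p, 3), round(by_receipt, 3), round(decomposed, 3), round(itt, 3))
```

In this example the comparison by receipt (8.0) and the ITT (3.2) are far apart, which is why the two need to be kept distinct once compliance is imperfect.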

Roadmap of the course so far: Hypothetical counterfactual difference • Experiment • Perfect Compliance: TOT • Imperfect Compliance: ITT • Non-experimental • Fixed Differences between Groups: Fixed Effect (TOT) • Groups with Parallel Trends: Difference-in-Differences (TOT/ITT) • Groups with similar characteristics: Matching Methods (TOT)

What we’ve done so far… • Ways to define a ‘control group’ • Fixed Effect • Individuals within a group, on average the same • Attribute any within group difference to treatment • Difference-in-Differences • Assume: Fixed Differences over time • Attribute any change in trend to treatment • Propensity Score Matching • Assume: Treatment, conditional on observables, is as if randomly assigned • Attribute any difference in outcomes to treatment
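As a reminder of how one of these control-group constructions works in practice, here is a minimal two-group, two-period difference-in-differences calculation. The four group means are hypothetical, and the sketch takes the parallel-trends assumption as given rather than testing it.

```python
# Mean outcomes by group and period (hypothetical numbers).
means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 16.0,
    ("control", "pre"): 8.0,
    ("control", "post"): 11.0,
}

# Change over time within each group.
change_treated = means[("treated", "post")] - means[("treated", "pre")]
change_control = means[("control", "post")] - means[("control", "pre")]

# The control group's change stands in for the trend the treated group
# would have followed without treatment; the remainder is attributed
# to the treatment.
did = change_treated - change_control

print(change_treated, change_control, did)  # 6.0 3.0 3.0
```

The fixed level gap between the groups (10 versus 8 before treatment) drops out of the estimate, which is the sense in which fixed differences are allowed.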

Next time… • Instrumental Variables • What Are They • What about LATE? • How to Estimate
