Evaluation Notes Bergen COURSE Spring, 2010. Petra Todd University of Pennsylvania Department of Economics. The Evaluation Problem. Will study econometric methods for evaluating effects of active labor market programs Employment, training and job search assistance programs
Evaluation Notes Bergen COURSE Spring, 2010 Petra Todd University of Pennsylvania Department of Economics
The Evaluation Problem • Will study econometric methods for evaluating effects of active labor market programs • Employment, training and job search assistance programs • School subsidy programs • Health interventions
Key questions • Do program participants benefit from the program? • Do program benefits exceed costs? • What is the social return to the program? • Would an alternative program yield greater impact at the same cost?
Goals • Understand the identifying assumptions needed to justify application of different estimators • Statistical assumptions • Behavioral assumptions • Assumptions with regard to heterogeneity in how people respond to a program intervention
Potential Outcomes • Y0 – outcome without treatment • Y1 – outcome with treatment • D=1 if receive treatment, else D=0 • Observed outcome Y=D Y1+(1-D) Y0 • Treatment Effect Δ= Y1-Y0 • Δ not directly observed for any individual, missing data problem
Parameters of Interest • Average impact of treatment on the treated (TT) E(Y1-Y0|D=1,X) • Average treatment effect (ATE) E(Y1-Y0|X) • Average effect of treatment on the untreated (UT) E(Y1-Y0|D=0,X) • ATE=Pr(D=1|X)TT+(1-Pr(D=1|X))UT
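The decomposition of ATE into TT and UT can be checked numerically. The sketch below is only an illustration: the data-generating process (normal outcomes, participation that rises with the individual gain) is an assumption made for the example and does not come from the notes, and X is dropped for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (assumed for illustration only).
y0 = rng.normal(0.0, 1.0, n)              # Y0: outcome without treatment
gain = 1.0 + rng.normal(0.0, 1.0, n)      # Δ = Y1 - Y0, varies across people
y1 = y0 + gain                            # Y1: outcome with treatment
# People with larger gains are more likely to participate.
d = (gain + rng.normal(0.0, 1.0, n) > 1.0).astype(int)

# Observed outcome: only one potential outcome is seen per person.
y = d * y1 + (1 - d) * y0

ate = gain.mean()               # E(Y1 - Y0)
tt = gain[d == 1].mean()        # E(Y1 - Y0 | D=1)
ut = gain[d == 0].mean()        # E(Y1 - Y0 | D=0)
p = d.mean()                    # Pr(D=1)

print("ATE:", ate)
print("Pr(D=1)*TT + (1-Pr(D=1))*UT:", p * tt + (1 - p) * ut)
print("TT:", tt, "UT:", ut)
```

The identity holds exactly in the sample, while TT, ATE and UT differ from one another here because participation was made to depend on the gain.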
Other parameters of interest • Proportion of people benefiting from the program Pr(Y1>Y0|D=1)=Pr(Δ>0|D=1) • Distribution of treatment effects F(Δ|D=1,X) • Selected quantile Inf {Δ:F(Δ|D=1,X)>q}
Model for potential outcomes with and without treatment • Model: Y1=Xβ1+U1 Y0=Xβ0+U0 E(U1|X)=E(U0|X)=0 • Observed outcome: Y=Y0+D(Y1-Y0) Y= Xβ0+D(Xβ1- Xβ0)+U0+D(U1-U0)
Distinction between TT and ATE • TT=E(Δ|D=1,X)=Xβ1- Xβ0+E(U1-U0|D=1,X) • ATE= E(Δ|X)=Xβ1- Xβ0 • TT depends on structural parameters as well as means of unobservables • Parameters are the same if either • (A1) U1=U0, or • (A2) E(U1-U0|D=1,X)=0 • Condition (A2) means that D is uninformative on U1-U0, i.e. there is ex post heterogeneity in gains, but it is not acted on ex ante
Three Commonly Made Assumptions from least to most general • Coefficient on D is fixed (given X) and is the same for everyone (most restrictive) • U1=U0 • Y=Xβ+Dα(X)+U • E(Y1-Y0|X,D)= α(X)
Coefficient on D is random given X, but U1-U0 does not help predict participation in the program Pr(D=1| U1-U0 ,X)=Pr(D=1|X) which implies E(U1-U0 |D=1,X)= E(U1-U0 |X) • Coefficient on D is random given X and U1-U0 helps predict program participation (least restrictive) E(U1-U0 |D=1,X)≠E(U1-U0 |X)
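The three cases can be compared in a small simulation. This is a sketch under assumed normal errors and a fixed Xβ1-Xβ0 of 1.0; the function name `tt_minus_ate` and all numerical values are hypothetical and chosen only to make the contrast visible.

```python
import numpy as np

def tt_minus_ate(case, n=200_000, seed=0):
    """Return TT - ATE under one of the three assumptions (illustrative)."""
    rng = np.random.default_rng(seed)
    u0 = rng.normal(size=n)
    if case == 1:
        u1 = u0                          # U1 = U0: common effect for everyone
    else:
        u1 = u0 + rng.normal(size=n)     # heterogeneous gains U1 - U0
    v = rng.normal(size=n)               # participation shock unrelated to gains
    if case == 3:
        d = (u1 - u0) + v > 0            # U1 - U0 helps predict participation
    else:
        d = v > 0                        # participation ignores U1 - U0
    gain = 1.0 + (u1 - u0)               # Xβ1 - Xβ0 held fixed at 1.0
    return gain[d].mean() - gain.mean()

gaps = {case: tt_minus_ate(case) for case in (1, 2, 3)}
for case, gap in gaps.items():
    print(f"case {case}: TT - ATE = {gap:.3f}")
```

In cases 1 and 2 the gap is zero up to sampling noise, so TT and ATE coincide; in case 3 participants have systematically larger gains and TT exceeds ATE.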
How Can Randomization Solve the Evaluation Problem? • Comparison group selected using a randomization device to randomly exclude some fraction of program applicants from the program • Main advantage – increases comparability between program participants and nonparticipants • Have same distribution of observables and of unobservables • Satisfy program eligibility criteria
What problems can arise in social experiments? • Randomization bias – occurs when introducing randomization changes the way the program operates • Greater recruitment needs may lead to change in acceptance standards • Individuals may decide not to apply if they know they will be subject to randomization
Contamination bias – occurs when control group members seek alternative forms of treatment • Ethical considerations – there may be opposition to the experiment and some sites may refuse to participate, which poses a threat to external validity • Dropout – some of the treatment group members may drop out before completing the program • Sample attrition – may have differential attrition between the treatment and control groups
At what stage should randomization be applied? • Randomization after acceptance into the program • Randomization of eligibility • Let R=1 if randomized in (treatment group), • R=0 if randomized out (control group) • Let Y1* and Y0* denote outcomes observed under the experiment • Let D*=1 denote someone who applies to the program and is subject to randomization
From treatment group, get E(Y1*|X,D*=1,R=1) • From control group, get E(Y0*|X,D*=1,R=0) • No randomization bias and random assignment imply E(Y1*|X,D*=1,R=1)=E(Y1|X,D=1) E(Y0*|X,D*=1,R=0)=E(Y0|X,D=1) • Thus, the experiment gives TT=E(Y1-Y0|X,D=1)
How does program dropout affect experiments? • Can define treatment as “intent-to-treat” or “offer of treatment,” in which case dropout not a problem • If dropout occurs prior to receiving the program (i.e. dropouts do not get treatment), then could treat it like randomization on eligibility.
Randomization on eligibility • Let e=1 if eligible, e=0 if not eligible • Let D=1 denote would-be participants if program were made available E(Y|X,e=1)=Pr(D=1|X,e=1)E(Y1|X,e=1,D=1) + Pr(D=0|X,e=1)E(Y0|X,e=1,D=0) E(Y|X,e=0)=Pr(D=1|X,e=0)E(Y0|X,e=0,D=1) + Pr(D=0|X,e=0)E(Y0|X,e=0,D=0)
Because eligibility is randomized, Pr(D=1|X,e=1)=Pr(D=1|X,e=0) Pr(D=0|X,e=1)=Pr(D=0|X,e=0) E(Y0|X,e,D=1)= E(Y0|X,D=1) E(Y0|X,e,D=0)= E(Y0|X,D=0) E(Y1|X,e,D=1)= E(Y1|X,D=1) • Thus, difference in previous two equations gives Pr(D=1|X,e=1){E(Y1|X,D=1)-E(Y0|X,D=1)} • Dividing by Pr(D=1|X,e=1), which is observed among eligibles, recovers TT=E(Y1-Y0|X,D=1)
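The argument for randomization on eligibility can be illustrated with simulated data. The sketch below assumes a hypothetical population in which would-be participants select on their gains and eligibility is assigned by a fair coin; X is omitted and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Hypothetical population (assumed for illustration only).
gain = 2.0 + rng.normal(size=n)                       # Y1 - Y0
y0 = rng.normal(size=n)
y1 = y0 + gain
would_participate = gain + rng.normal(size=n) > 2.0   # D, selection on gains
eligible = rng.random(n) < 0.5                        # e, randomized

# Only eligible would-be participants actually receive the program.
treated = would_participate & eligible
y = np.where(treated, y1, y0)

# Difference in mean outcomes by eligibility status ...
diff = y[eligible].mean() - y[~eligible].mean()
# ... scaled by the participation rate, which is observable among eligibles.
p_hat = would_participate[eligible].mean()
tt_hat = diff / p_hat

tt_true = gain[would_participate].mean()
print("estimated TT:", tt_hat)
print("true TT:     ", tt_true)
print("ATE:         ", gain.mean())
```

The scaled difference tracks TT rather than ATE, because nonparticipants contribute the same Y0 mean in both eligibility groups and cancel out.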
What about control group contamination? • Not necessarily a problem if willing to define benchmark state as being excluded from the program
What about sample attrition? • Attrition is a problem that is common to both experimental and nonexperimental studies • Attrition occurs when some people are not followed in the data (maybe due to nonresponse) • If attrition is nonrandom with respect to treatment, then attrition requires the use of nonexperimental evaluation methods
Traditional (Simple) Regression Estimators • Cross-section • Before-after • Difference-in-differences
“Ashenfelter’s Dip” • [Figure: mean Y plotted over time for participants (D=1) and nonparticipants (D=0); time axis marked at T=0]
Drawbacks and Advantages of before-after approach • Drawbacks • Identification breaks down in the presence of time-specific intercepts • Can be sensitive to choice of time periods because of the Ashenfelter's Dip pattern • Advantage • Minimal data requirements - only requires data on participants
Advantages of difference-in-differences approach • Allows for time-specific intercepts that are common across groups • Consistent under fixed effect error structure – therefore allows for time-invariant unobservables to affect participation decisions and program outcomes
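The cross-section, before-after and difference-in-differences estimators can be compared on simulated two-period data. This is a sketch under an assumed fixed-effect error structure with a common time-specific intercept; the true effect `alpha`, the time shift `time_effect` and the participation rule are hypothetical, and no pre-program dip is modeled.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha = 1.5          # true common treatment effect (assumed)
time_effect = 0.8    # time-specific intercept, common across groups (assumed)

fe = rng.normal(size=n)                  # time-invariant unobservable
d = fe + rng.normal(size=n) > 0          # participation depends on it

y_pre = fe + rng.normal(size=n)                                # before program
y_post = fe + time_effect + alpha * d + rng.normal(size=n)     # after program

# Cross-section: participants vs nonparticipants, post-program period only.
cross_section = y_post[d].mean() - y_post[~d].mean()
# Before-after: change in participants' outcomes only.
before_after = (y_post[d] - y_pre[d]).mean()
# Difference-in-differences: participants' change minus nonparticipants' change.
did = before_after - (y_post[~d] - y_pre[~d]).mean()

print("true effect:   ", alpha)
print("cross-section: ", cross_section)
print("before-after:  ", before_after)
print("diff-in-diffs: ", did)
```

Under these assumptions the cross-section estimate absorbs the fixed-effect difference between groups, the before-after estimate absorbs the time-specific intercept, and only the difference-in-differences estimate stays close to the true effect.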
Matching Estimators • Assume have access to data on treated and untreated individuals (D=1 and D=0) • Assume also have access to a set of X variables whose distribution is not affected by D F(X|D,YP)=F(X|YP) where YP=(Y0,Y1) “potential outcomes”
Matching estimators pair treated individuals with observably similar untreated individuals • Usually assumed that (Y0,Y1) ╨ D | X (M-1) or Pr(D=1|X, Y0,Y1) = Pr(D=1|X) and 0<Pr(D=1|X)<1 (M-2) • To justify this assumption, individuals cannot select into the program based on anticipated treatment impact
Assumption (M-1) implies F(Y0|D=1,X)=F(Y0|D=0,X)=F(Y0|X) F(Y1|D=1,X)=F(Y1|D=0,X)=F(Y1|X) also E(Y0|D=1,X)=E(Y0|D=0,X)=E(Y0|X) E(Y1|D=1,X)=E(Y1|D=0,X)=E(Y1|X) • Under assumptions that justify matching, can estimate TT, ATE, and UT
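Let n denote number of observations in the treatment group • A typical matching estimator for TT takes the form: TT_hat = (1/n) Σ_{i∈{Di=1}} [Y1i − E_hat(Y0i|Di=1,Xi)] • where E_hat(Y0i|Di=1,Xi) is an estimator for the matched no-treatment outcome • Recall that (M-1) implies E(Y0|D=1,X)=E(Y0|D=0,X)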
How does matching compare to a randomized experiment? • Distribution of observables will by construction be the same in the matched control group as in the treatment group • However, distribution of unobservables not necessarily balanced across groups • Experiment has full support (M-2), but with matching there can be a failure of the common support condition (when matches cannot be found)
Even though matching methods assume E(Y1-Y0|D=1,X)=E(Y1-Y0|X) Could still potentially have E(Y1-Y0|D=1)≠E(Y1-Y0) E(Δ|D=1)=∫E(Δ|D=1,X)f(X|D=1)dX E(Δ)=∫E(Δ|X)f(X)dX
If interest centers on TT, (M-1) can be replaced by weaker assumption E(Y0|X,D=1)=E(Y0|X,D=0)=E(Y0|X) • The weaker assumption allows selection into the program to depend on Y1 and allows E(Y1-Y0|X,D)≠E(Y1-Y0|X) • Only require Pr(D=1|X,Y0,Y1)=Pr(D=1|X,Y1)
Practical problems in Matching • Problems • How to construct match when X is of high dimension • How to choose set of X variables • What to do if Pr(D=1|X)=1 for some X (violation of common support condition (M-2))
Rosenbaum and Rubin (1983) Theorem • Provide a solution to the problem of constructing a match when X is of high dimension • Show that (Y0,Y1) ╨ D | X Implies (Y0,Y1) ╨ D | Pr(D=1|X) • Reduces the matching problem to a univariate problem, provided Pr(D=1|X) can be parametrically estimated • Pr(D=1|X) is known as the propensity score
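Proof of RR theorem • Let P(X)=Pr(D=1|X) • E(D|Y0,P(X))=E(E(D|Y0,X)|Y0,P(X)) = E(P(X)|Y0,P(X)) = P(X) • The first equality holds by iterated expectations, because X is finer than P(X) • The second equality holds because (M-1) gives E(D|Y0,X)=E(D|X)=P(X) • Hence Pr(D=1|Y0,P(X)) does not depend on Y0; the same argument applies with Y1 in place of Y0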
Matching can be implemented in two steps • Step 1: estimate a model for program participation, estimate the propensity score P(Xi) for each person • Step 2: Select matches based on the estimated propensity score
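The two steps can be sketched as follows. Everything specific here is an assumption made for illustration: the logit participation model, the Newton-Raphson fitting routine `fit_logit`, the simulated data with a common effect of 2.0, and the use of single nearest-neighbor matching with replacement in step 2.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Fit a logit model by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)                        # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X   # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical data: selection on observables only, common effect of 2.0.
rng = np.random.default_rng(3)
n = 4_000
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0, -1.0])
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
y = x[:, 0] - x[:, 1] + rng.normal(size=n) + 2.0 * d

# Step 1: estimate the participation model and the propensity score P(Xi).
beta_hat = fit_logit(X, d)
pscore = 1.0 / (1.0 + np.exp(-X @ beta_hat))

# Step 2: match each treated person to the control with the closest score.
treated = np.flatnonzero(d == 1)
controls = np.flatnonzero(d == 0)
dist = np.abs(pscore[treated][:, None] - pscore[controls][None, :])
match = controls[dist.argmin(axis=1)]

tt_hat = (y[treated] - y[match]).mean()
naive = y[d == 1].mean() - y[d == 0].mean()
print("matched TT estimate:", tt_hat)
print("naive difference:   ", naive)
```

In this setup the unmatched comparison of means is biased upward because participants have higher X-driven outcomes, while the matched estimate should lie much closer to the assumed effect of 2.0.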
Ways of constructing matched outcomes • Define a neighborhood C(Pi) for each person i ∈{Di=1} • Neighbors are persons in {Dj=0} for whom Pj∈ C(Pi) • Set of persons matched to i is Ai={j∈{Dj=0} such that Pj∈ C(Pi)}
Nearest Neighbor Matching • C(Pi)=min_j || Pi-Pj ||, j∈{Dj=0} => Ai is a singleton set • Caliper matching • Matches only made if || Pi-Pj ||<ε for some prespecified tolerance ε (tries to avoid bad matches)
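A minimal sketch of nearest neighbor and caliper matching on given propensity scores is shown below. The function name `nearest_neighbor_match`, the convention of returning -1 for an unmatched person, and the toy scores are all assumptions made for the example.

```python
import numpy as np

def nearest_neighbor_match(p_treated, p_control, caliper=None):
    """For each treated score, return the index of the closest control score.

    If a caliper is given, a match is kept only when the distance is
    strictly below it; unmatched treated persons get index -1.
    """
    dist = np.abs(p_treated[:, None] - p_control[None, :])
    j = dist.argmin(axis=1)
    if caliper is not None:
        best = dist[np.arange(len(p_treated)), j]
        j = np.where(best < caliper, j, -1)
    return j

# Toy propensity scores (hypothetical values).
p_treated = np.array([0.30, 0.56, 0.90])
p_control = np.array([0.10, 0.32, 0.50, 0.60])

plain = nearest_neighbor_match(p_treated, p_control)
with_caliper = nearest_neighbor_match(p_treated, p_control, caliper=0.1)
print(plain.tolist())         # matched control index for each treated person
print(with_caliper.tolist())  # -1 marks a treated person left unmatched
```

Without a caliper the treated person with score 0.90 is matched to the control at 0.60, a distance of 0.30; with a caliper of 0.1 that match is rejected, which is the sense in which the caliper avoids bad matches at the cost of dropping some treated observations.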
Kernel Matching • Estimate matched outcomes by nonparametric regression
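One way to sketch kernel matching is a Nadaraya-Watson regression of control outcomes on control propensity scores, evaluated at each treated person's score. The Gaussian kernel, the bandwidth of 0.05, the function name `kernel_matched_outcome` and the simulated control data are assumptions made for illustration; the notes do not specify them.

```python
import numpy as np

def kernel_matched_outcome(p_treated, p_control, y_control, bandwidth):
    """Kernel-weighted average of control outcomes at each treated score."""
    u = (p_treated[:, None] - p_control[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)                  # Gaussian kernel (assumed choice)
    w = k / k.sum(axis=1, keepdims=True)       # weights sum to one per treated person
    return w @ y_control

# Hypothetical control sample whose mean outcome is 3 * P.
rng = np.random.default_rng(4)
p_control = rng.uniform(0.05, 0.95, 5_000)
y_control = 3.0 * p_control + rng.normal(scale=0.5, size=5_000)

p_treated = np.array([0.3, 0.5, 0.7])
y0_hat = kernel_matched_outcome(p_treated, p_control, y_control, bandwidth=0.05)
print("matched no-treatment outcomes:", y0_hat)
print("target values 3 * P:          ", 3.0 * p_treated)
```

Unlike nearest neighbor matching, every control contributes to each matched outcome, with weight declining in the distance between propensity scores; the bandwidth governs the usual bias-variance tradeoff.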