Evaluation methods for measuring the impact of social protection programs. Joost de Laat, Menahem Prywes, Shafique Jamal, The World Bank. Objectives: Understand the principles of the difference-in-differences method of project evaluation and the weaknesses of the method.
Evaluation methods for measuring the impact of social protection programs Joost de Laat, Menahem Prywes, Shafique Jamal The World Bank
Objectives: Understand: • Principles of the difference-in-differences method of project evaluation and weaknesses of the method. • Principles of the randomized controlled trial (RCT) method. • Limits and weaknesses of the randomized controlled trial method. • Principles of the regression discontinuity design (RDD) method.
How donors evaluated development projects. • Donors often couldn’t evaluate development projects, and especially health projects, convincingly because: • No one bothered to collect baseline data. • No one tracked the treatment (beneficiary) group over time. • Sometimes donors collected this information and then measured changes in the treatment group over time. • However, it remained unclear whether this performance was better or worse than that of comparator (control) groups. • Sometimes donors applied the difference-in-differences method. • This compares the change in results for the treatment group with the change in results for a control group. • But it’s often unclear whether the comparator group really is comparable to the treatment group. • Parliaments and donors increasingly demand credible evaluations!
Difference in differences methodology-1 Source: Prashant Bharadwaj
Difference in differences methodology-2 Source: Prashant Bharadwaj
Difference in differences methodology-3 Source: Prashant Bharadwaj
Difference in differences methodology: simple numerical example. Source: Prashant Bharadwaj
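The numbers on the original slide are not available here, so the following is a minimal sketch of the arithmetic with hypothetical group means. The dictionary `means` and the function `diff_in_diff` are illustrative names, not part of the original presentation.

```python
# Hypothetical mean outcomes (for example, a school enrollment rate in percent).
# These values are illustrative assumptions, not figures from the slides.
means = {
    ("treatment", "before"): 60.0,
    ("treatment", "after"): 75.0,
    ("control", "before"): 58.0,
    ("control", "after"): 65.0,
}


def diff_in_diff(m):
    """Change in the treatment group minus change in the control group."""
    change_treatment = m[("treatment", "after")] - m[("treatment", "before")]
    change_control = m[("control", "after")] - m[("control", "before")]
    return change_treatment - change_control


did_estimate = diff_in_diff(means)
print(f"Difference-in-differences estimate: {did_estimate:.1f} percentage points")
```

The estimate is only credible if the control group would have followed the same trend as the treatment group in the absence of the program, which is the weakness the deck returns to.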
In contrast to the difference-in-differences method, randomized controlled trials seek to make valid comparisons between outcomes for treatment and control groups. • Randomization establishes a control group that is, on average, statistically identical to the intervention group. This produces unbiased estimates of impact. • Randomization reduces selection bias, for example: • Undercoverage: some parts of the population are under-represented in the sample. • Self-selection: people who agree to participate in the trial have special characteristics, e.g. strong opinions on an issue. • Nonresponse bias: participants who do not respond may have particular views or other characteristics.
Unit of Randomization • Choose according to type of program • Individual/Household • School/Health Clinic/catchment area • Block/Village/Community • Ward/District/Region As a rule of thumb, randomize at the smallest viable unit of implementation. • Keep in mind • Need “sufficiently large” number of units to detect minimum desired impact: Power. • Spillovers/contamination • Operational and survey costs
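The "sufficiently large number of units" point can be made concrete with a standard textbook sample-size approximation for comparing two means. This is a rough sketch, not a method taken from the slides: the function names, the effect size, the cluster size of 30, and the intra-cluster correlation of 0.05 are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mde, sd, alpha=0.05, power=0.80):
    """Approximate units per arm needed to detect a difference in means of
    size `mde` (minimum detectable effect) with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / mde) ** 2)


def design_effect(cluster_size, icc):
    """Inflation factor when randomizing clusters (villages, schools)
    instead of individuals; `icc` is the intra-cluster correlation."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical inputs: detect an effect of 0.2 standard deviations.
n_individual = sample_size_per_arm(mde=0.2, sd=1.0)
deff = design_effect(cluster_size=30, icc=0.05)
n_clustered = math.ceil(n_individual * deff)
clusters_per_arm = math.ceil(n_clustered / 30)

print(f"Individuals per arm if randomizing individuals: {n_individual}")
print(f"Individuals per arm if randomizing clusters of 30: {n_clustered}")
print(f"Clusters per arm: {clusters_per_arm}")
```

The design effect illustrates the trade-off in the rule of thumb: randomizing at a larger unit limits spillovers but requires more observations for the same power.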
Example: Randomized Assignment • Mexico Progresa Conditional Cash Transfer program • Unit of randomization: Community • 506 communities in the evaluation sample • Randomized phase-in • 320 treatment communities (14,446 households): • First transfers in April 1998. • 186 comparison communities (9,630 households): • First transfers in November 1999.
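A community-level random assignment of this size could be sketched as follows. The slides do not describe the actual procedure used for Progresa, so the function `assign_communities`, the community identifiers, and the seed are hypothetical; only the counts (506, 320, 186) come from the slide.

```python
import random


def assign_communities(community_ids, n_treatment, seed=0):
    """Randomly pick `n_treatment` communities for the first phase;
    the remaining communities form the comparison group."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(community_ids)
    rng.shuffle(shuffled)
    treatment = set(shuffled[:n_treatment])
    return {
        cid: ("treatment" if cid in treatment else "comparison")
        for cid in community_ids
    }


# Hypothetical identifiers for the 506 communities in the evaluation sample.
community_ids = [f"community_{i:03d}" for i in range(1, 507)]
assignment = assign_communities(community_ids, n_treatment=320, seed=1998)

n_treat = sum(1 for group in assignment.values() if group == "treatment")
n_comp = sum(1 for group in assignment.values() if group == "comparison")
print(f"Treatment communities: {n_treat}, comparison communities: {n_comp}")
```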
Example: Randomized Assignment [Figure: timeline with T=0 and T=1 on the time axis, 320 treatment communities, 186 comparison communities, and the comparison period marked.]
Example: Randomized Assignment • How do we know we have good clones? • In the absence of Progresa, treatment and comparison communities should be identical. • Let’s compare their characteristics at baseline (T=0).
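The baseline comparison can be sketched as a difference in means with a large-sample test, flagged with two stars at the 5% level as in the notes on the following slides. The data below are simulated, the function name `difference_in_means` is an assumption, and the sketch ignores the clustering of households within communities, which a real analysis of a community-randomized design would need to account for.

```python
import math
import random
from statistics import NormalDist, mean, variance


def difference_in_means(treated, comparison):
    """Difference in means, its standard error, a two-sided p-value from a
    normal approximation, and '**' if significant at the 5% level."""
    diff = mean(treated) - mean(comparison)
    se = math.sqrt(
        variance(treated) / len(treated) + variance(comparison) / len(comparison)
    )
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    stars = "**" if p_value < 0.05 else ""
    return diff, se, p_value, stars


# Simulated baseline characteristic (e.g. household size), drawn from the
# same distribution in both groups, as randomization would imply.
rng = random.Random(0)
treated = [rng.gauss(5.5, 2.0) for _ in range(1000)]
comparison = [rng.gauss(5.5, 2.0) for _ in range(1000)]

diff, se, p_value, stars = difference_in_means(treated, comparison)
print(f"Baseline difference: {diff:.3f}{stars} (s.e. {se:.3f}, p = {p_value:.3f})")
```

With many characteristics tested, a few starred differences are expected by chance even under valid randomization, so balance is judged across the whole table rather than on a single row.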
Example: Balance at Baseline Note: If the effect is statistically significant at the 5% significance level, we label the estimated impact with 2 stars (**).
Example: Randomized Assignment Note: If the effect is statistically significant at the 5% significance level, we label the estimated impact with 2 stars (**).
Keep in Mind! Randomized Assignment • With large enough samples, randomized assignment produces 2 statistically equivalent groups: randomized beneficiaries and a randomized comparison group. We have identified the perfect clone. • Feasible for prospective evaluations with over-subscription/excess demand. • Most pilots and new programs fall into this category.
Limits on randomized controlled trials • Out of sample generalization: Results from these trials are internally valid but cannot be generalized (extrapolated) out of sample. An inference of general validity of a result would require an internally consistent theory of causation and repeated randomized controlled trials in different countries, demographic rules, and natural environments. • Results are comparisons of averages. Therefore the results of a randomized controlled trial may not be valid for making policies for sub-groups or for individual households and people, especially if the policymaker has additional information.
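The point about averages can be illustrated with a small arithmetic sketch. The subgroup names, shares, and outcomes below are hypothetical and chosen only to show how a positive average effect can coexist with a negative effect for one subgroup.

```python
# Hypothetical subgroup results:
# name -> (share of sample, mean outcome treated, mean outcome comparison)
subgroups = {
    "girls": (0.5, 80.0, 70.0),
    "boys": (0.5, 68.0, 70.0),
}


def average_effect(groups):
    """Sample-share-weighted average of the subgroup treatment effects."""
    return sum(share * (treated - comp) for share, treated, comp in groups.values())


overall = average_effect(subgroups)
print(f"Average effect: {overall:+.1f}")
for name, (share, treated, comp) in subgroups.items():
    print(f"  {name}: {treated - comp:+.1f}")
```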
Risks of bias in the randomized controlled trial methodology • Self-selection out of the control group. Randomized controlled trials in the social sciences are not double-blind, unlike pharmaceutical trials. The people who are not receiving the treatment (for example, tutoring or nutritional supplements) may decide to obtain it on their own, biasing the results. • Replacement of drop-outs may lead to bias.
Limits to use of randomized controlled trials • Randomized controlled trials are expensive. They can cost anywhere from $150,000 to several million dollars. A $500,000 cost is typical. This means the method cannot be applied to all development projects. • Many development projects do not address units that can be randomized. For instance, states or provinces/oblasts cannot be meaningfully randomized. • Ethical rules are unclear. In medical research, participation in a randomized controlled trial requires informed consent. There are no general rules for economic development projects. US universities and some developing countries have ethical rules.
Subtle conflicts of interest and biases can prejudice all evaluation studies, whatever the methodology. • Sponsors’ conflict of interest. Donors, governments, project units, and NGOs prefer to report positive findings because this helps to sustain their business and jobs. Sometimes, project units resist or even refuse payment to contractors who deliver negative evaluation reports. • Contractors’ conflict of interest. The contractors who carry out the evaluation studies may be influenced by their clients’ preferences. • Confirmation bias. Donors, governments, project units, and NGOs often believe that outcomes are positive and tend to perceive positive outcomes. Also, officials, managers, and development experts come to identify personally with the projects. Their psychological bias is, ‘I mean well, therefore the project is successful.’ • Publication bias. Scholarly journals prefer to publish positive results and generally neglect negative results (‘no effect’ is not newsworthy). This may induce bias in academic work.
Economic and ethical questions • When should donors and governments insist on application of the randomized controlled trial methodology and when is this inappropriate? • When is it unethical to use the randomized controlled trial methodology in a development context?
Regression Discontinuity Design • Many social programs select beneficiaries using an index or score: • Anti-poverty programs: targeted to households below a given poverty index/income. • Pensions: targeted to population above a certain age. • Education: scholarships targeted to students with high scores on a standardized test. • Agriculture: fertilizer program targeted to small farms (less than a given number of hectares).
Regression Discontinuity Design Example: Effect of social assistance program on nutrition • Goal: Reduce vulnerability and improve nutrition of poor families. • Method: Households with a poverty score ≤50 are poor; households with a poverty score >50 are not poor. • Intervention: Poor households receive social assistance transfers.
Regression Discontinuity Design - Baseline [Figure: baseline, with regions labeled “Not eligible” and “Eligible”.]
Regression Discontinuity Design • We have a continuous eligibility index with a defined cut-off: • Households with a score ≤ cut-off are eligible. • Households with a score > cut-off are not eligible. • Or vice versa. • Intuitive explanation of the method: • Units just above the cut-off point are very similar to units just below it, which makes for a good comparison. • Compare outcomes Y for units just above and below the cut-off point. • For a discontinuity design, you need: • A continuous eligibility index. • A clearly defined eligibility cut-off.
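The comparison just above and below the cut-off can be sketched with a separate straight-line fit on each side, evaluated at the cut-off. This is one common way to implement the idea, not necessarily the estimator the authors used. The data are simulated with a known jump of 5, and the names `fit_line`, `rdd_estimate`, the bandwidth of 10, and all numeric values other than the cut-off of 50 are assumptions.

```python
import random
from statistics import mean

CUTOFF = 50.0    # poverty score cut-off from the example: score <= 50 is eligible
TRUE_JUMP = 5.0  # hypothetical program effect built into the simulated data


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rdd_estimate(scores, outcomes, cutoff, bandwidth):
    """Fit a line on each side of the cut-off using only units within
    `bandwidth` of it, and return the gap between the two fitted values
    at the cut-off (eligible side minus non-eligible side)."""
    pairs = list(zip(scores, outcomes))
    eligible = [(s, y) for s, y in pairs if cutoff - bandwidth <= s <= cutoff]
    not_eligible = [(s, y) for s, y in pairs if cutoff < s <= cutoff + bandwidth]
    a_e, b_e = fit_line(*zip(*eligible))
    a_n, b_n = fit_line(*zip(*not_eligible))
    return (a_e + b_e * cutoff) - (a_n + b_n * cutoff)


# Simulated households: the outcome rises smoothly with the score and
# eligible households receive an extra TRUE_JUMP.
rng = random.Random(0)
scores = [rng.uniform(0, 100) for _ in range(5000)]
outcomes = [
    20 + 0.1 * s + (TRUE_JUMP if s <= CUTOFF else 0.0) + rng.gauss(0, 1)
    for s in scores
]

estimate = rdd_estimate(scores, outcomes, cutoff=CUTOFF, bandwidth=10)
print(f"Estimated effect at the cut-off: {estimate:.2f} (simulated truth: {TRUE_JUMP})")
```

The estimate applies to households near the cut-off only, and the choice of bandwidth trades off bias (wider window) against noise (narrower window).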
THANK YOU! Questions? Next: Tajikistan Example