600 likes | 775 Views
ENHANCING POLICY EFFECTIVENESS: THE ROLE OF IMPACT EVALUATION. UNDERSTANDING IMPACT EVALUATION. æ高政ç–效率:影å“评估的作用. 了解影å“评估. Lecture 1 演讲 1 Steps in impact evaluation design å½±å“评估的设计æ¥éª¤. What is impact? 什么是影å“.
E N D
ENHANCING POLICY EFFECTIVENESS: THE ROLE OF IMPACT EVALUATION UNDERSTANDING IMPACT EVALUATION 提高政策效率:影响评估的作用 了解影响评估
Lecture 1 演讲1 Steps in impact evaluation design 影响评估的设计步骤
What is impact?什么是影响 Impact = the outcome with the intervention compared to what it would have been in the absence of the intervention 影响 = 接受干预的主体与它在未接受干预的情况下在结果上的差别 At the heart of it is the idea of a attribution – and attribution implies a counterfactual 它的核心是归因问题—— 而归因问题需要借助反事实分析
When to do an impact evaluation 什么时候进行影响评估 Pilot programs 试点项目 Innovative programs 创新项目 Representative or important programs 代表性项目或重点项目
Steps in evaluation design 评估设计步骤 Clearly state objectives and outcomes 明确列出目标与预期结果 Map out the causal chain 制定出因果关系链 Identify evaluation questions 确定评估问题 Identify best available method for different evaluation questions 针对不同的评估问题选择最好的评估方法
The principles原理 Map out the causal chain (programme theory): see figure from BINP example 制定出因果关系图(项目理论): 参见孟加拉国综合营养计划(BINP)示例中的图 Understand context: Bangladesh is not Tamil Nadu 弄清项目环境: 孟加拉语不是泰米尔语 Anticipate heterogeneity: more malnourished children; different implementing agencies 预期的异质性: 更加营养不良的儿童; 不同的执行主体 Rigorous evaluation of impact using an appropriate counterfactual: PSM versus simple comparison 运用恰当的反事实进行严格的影响评估:倾向得分匹配 (PSM) 与简单对比 Rigorous factual analysis: targeting, KP gap, CNPs 严格的实例分析: 瞄准, KP缺口, CNPs Use mixed methods: informed by anthropology, focus groups, own field visits 综合使用各类方法: 包括人类学、焦点小组访谈和实地考察等
Program design (theory of change) mothers counselling change behavior in child nutrition C1 C2 B1 B2 D1 D2 D3 A E targeting participants improved nutrition outcome no leakage / substitute children supplemental feeding sufficient qnty / qlty
Common questions in TBIE 基于理论的影响评估(TBIE)中普遍使用的问题
Common questions in TBIE 基于理论的影响评估(TBIE)中普遍使用的问题
Exercise练习 Identify the objective(s) for your intervention and main outcome(s) (select measurable indicators) 确定干预目的和主要成果 (选择可测量的指标) Map out theory of change underlying your project 描绘出你的项目中的变化理论 What are the related evaluation questions? 相关的评估问题有哪些?
Lecture 2 演讲 2 Counterfactuals and comparison groups 反事实分析和对照组
The attribution problem归因问题:factual and counterfactual事实与反事实 有项目 项目影响 没有项目 Impact varies over time 影响随着时间推移发生变化
What has been the impact of the French revolution? 法国革命的影响是什么? “It is too early to say” Zhou Enlai “现在评价还言之过早” 周恩来
So where does the counterfactual come from? • 那么事实是从哪里来的呢? • Most usual is to use a comparison group of similar people / households / schools / firms… • 最常见的做法是选择相似人群、住户、学校或者公司作为对照组
What do we need to measure impact? Girl’s secondary enrolment 我们需要什么信息来衡量影响呢? 以女童初中入学为例 The majority of evaluations have just this information … which means we can say absolutely nothing about impact 大多数评估只有这些信息… 这意味着我们根本不能评估影响
Before versus after single difference comparisonBefore versus after = 92 – 40 = 52项目组前后期的比较干预前后的单差分= 92 – 40 = 52 “scholarships have led to rising schooling of young girls in the project villages” “在实施奖学金项目的村庄,女童的就学率升高” This ‘before versus after’ approach is outcome monitoring, which has become popular recently. Outcome monitoring has its place, but it is not impact evaluation 这种干预前后对照法是为了成果监测,近年来被广泛采用。这种成果监测法虽然不可或缺,但它不是影响评估。
Rates of completion of elementary male and female students in all rural China’s poor areas 1993年与2008年中国农村贫困地区完成小学学业的男女比例 Share of rural children占农村儿童的份额 1993 1993 2008 2008 男生 女生
Post-treatment control comparisonSingle difference = 92 – 84 = 8处理组与控制组事后数据的单差分= 92 – 84 = 8 But we don’t know if they were similar before… though there are ways of doing this (statistical matching = quasi-experimental approaches) 但是我们并不知道他们在基期的情况是否相似… 虽然可以采用很多方法 (统计匹配 = 准实验方法)
Double difference =(92-40)-(84-26) = 52-58 = -6双差分 =(92-40)-(84-26) = 52-58 = -6 Conclusion: Longitudinal (panel) data, with a comparison group, allow for the strongest impact evaluation design (though still need matching). SO WE NEED BASELINE DATA FROM PROJECT AND COMPARISON AREAS 结论: 纵向(面板)数据加上对照组可以更好地进行影响评估设计. 所以我们需要项目组和对照组的基线数据。
Main points so far主要观点 • Analysis of impact implies a counterfactual comparison • 影响分析意味着反事实分析 • Outcome monitoring is a factual analysis, and so cannot tell us about impact • 成果监测是事实分析,并不是影响评估 • The counterfactual is most commonly determined by using a comparison group • 反事实分析通常需要选择对照组 If you are going to do impact evaluation you need a credible counterfactual using a control group - VERY PREFERABLY WITH BASELINE DATA 在影响评估中,你需要对照组做可信的反事实分析,这就要求对照组最好也有基线数据
Exercise练习 • Using hypothetical outcome data for the before/after, comparison/treatment matrix calculate the • 运用假定的对照组和处理组在干预前后的数据矩阵进行以下计算 • Ex-post single difference • 处理组与对照组在干预后的比较 (单差分) • Before versus after (single difference) • 干预前与干预之后的比较 (单差分) • Double difference • 双差分 Impact estimates影响估算 Then it should probably be tea break around now
Problems in implementing rigorous impact evaluation: selecting a comparison group实施严谨的影响评估面临的难题:选择对照组 Contagion: other interventions 污染: 其它干预 Spill over effects: control affected by intervention 溢出效应: 对照组(控制组)会受到干预措施的影响 Selection bias: beneficiaries are different 选择偏误: 受益者不同 Ethical and political considerations 出于民族与政治上的考虑
Lecture 3 演讲3 Selection bias and experimental designs (randomized control trials, RCTs) 选择偏误和试验设计 (随机控制试验, RCTs)
The problem of selection bias选择偏误问题 • Program participants are not chosen at random, but selected through • 项目参与者不是随机挑选的,而是 • Program placement 项目安排 • Self selection 自选择 • This is a problem if the correlates of selection are also correlated with the outcomes of interest, since those participating would do better (or worse) than others regardless of the intervention • 如果影响选择的因素与我们感兴趣的结果变量相关,那么变为出现选择偏误。因为不论是否有这项干预,那些参与项目的人可能都会比其他人做得更好(或者更不好)
Selection bias from program placement项目安排导致的选择偏误 A program of school improvements is targeted at the poorest schools 改善学校项目的目标群体是最为贫困的学校 Since these schools are in poorer areas it is likely that students have home and parental characteristics are associated with lower learning outcomes (e.g. illiteracy, no electricity, child labour) 由于这些学校都处于更为贫困的地区,这些学生的家庭特征和父母特征都与更差的学习结果相关 (例如:文盲、不通电、童工等) Hence learning outcomes in project schools will be lower than the average for other schools 因此这些项目学校的学业成绩要低于其他学校的平均水平 The comparison group has to be drawn from a group of schools in similarly deprived areas 对照组就必须在类似的贫困地区的学校中选择
Selection bias from self-selection由自选择导致的样本选择偏误 A community fund is available for community-identified projects 社区基金往往只为社区范围内的项目提供经费 An intended outcome is to build social capital for future community development activities 该项目旨在为今后的社会发展活动建设社会资本 But those communities with higher degrees of cohesion and social organization (i.e. social capital) are more likely to be able to make proposals for financing 但那些凝聚力强、社会组织完善的社区更可能提交项目建议书寻求资助 Hence social capital is higher amongst beneficiary communities than non-beneficiaries regardless of the intervention, so a comparison between these two groups will overstate program impact 因此无论是否有社区基金的干预,那些受益的社区都比那些没有受益的社区拥有更好的社会资本,所以两者之间的对比结果高估了项目的影响
Examples of selection bias自选择的例子 Hospital delivery in Bangladesh (0.115 vs 0.067) 孟加拉的住院分娩案例(0.115 vs 0.067) Secondary education and teenage pregnancy in Zambia赞比亚的初中教育与青少年怀孕的案例 Male circumcision and HIV/AIDS in Africa 非洲男性包皮环切与艾滋病
Main point主要观点 There is ‘selection’ in who benefits from nearly all interventions. So need to get a control group which has the same characteristics as those selected for the intervention. 如果选择那些几乎能从所有干预中受益的人群接受干预,那么需要找到一个与这些被选择进来接受干预的人群特征相同的控制组.
Dealing with selection bias如何处理选择偏误 • Need to use experimental or quasi-experimental methods to cope with this; this is what has been meant by rigorous impact evaluation • 需要用试验法或准试验法处理这个偏误; 这就是严格影响评估所要求的 • Experimental (randomized control trials = RCTs, commonly used in agricultural research and medical trials, but are more widely applicable) • 试验法 (随机控制试验 = RCTs, 通常被应用于农业研究和医学实验,但其适用范围可以更广) • Quasi-experimental • 准试验方法 • Propensity score matching倾向得分匹配法 • Regression discontinuity断点回归法 • Pipeline approach流水线方法 • Regressions (including instrumental variables)回归法 (包含工具变量)
Randomization (RCTs)随机控制试验 (RCTs) • Randomization addresses the problem of selection bias by the random allocation of the treatment • 随机选择通过随机分配处理解决了选择偏误的问题 • Randomization may not be at the same level as the unit of intervention • 随机选择可能并不是在干预对象的同一水平进行 • Randomize across schools but measure individual learning outcomes • 随机选择学校,但是度量的是个人的学习成绩 • Randomize across sub-districts but measure village-level outcomes • 随机选择地区,但度量的是村级水平的结果变量
Randomization (RCTs)随机控制试验 (RCTs) • The less units over which you randomize the higher your standard errors • 随机选择的对象越少(对象层级越高),你的标准误就越大 • But you need to randomize across a ‘reasonable number’ of units • 但你需要对“合理数量”的单位进行随机选择 • At least 30 for simple randomized design (though possible imbalance considered a problem for n < 200) • 简单随机设计至少需要30个样本 (尽管认为n < 200时可能有失衡的问题) • Can be as few as 10 for matched pair randomization • 对于配对的随机试验样本可以少到10个
Issues in randomization随机选择中的问题 Randomize across eligible population not whole population 随机选择的对象是符合要求的群体,而不是整个总体 Can randomize across the pipeline 随机选择可以贯穿整个选择过程 Is no less unethical than any other method with a control group (perhaps more ethical), and any intervention which is not immediately universal in coverage has an untreated population to act as a potential control group 随机控制试验并不会比其他使用控制组的方法少面临一些道德问题 (或许还要更多), 任何不打算立即全面实施的干预都需要一个没有接受干预的总体作为控制组 No more costly than other survey-based approaches 随机控制试验不会比其他以调查为基础的方法花费更多
Conducting an RCT进行随机控制试验 Has to be an ex-ante design 必须有一个事前的设计 Has to be politically feasible, and confidence that program managers will maintain integrity of the design 必须政治上可行,相信项目经理将会维护项目设计的完整性 Perform power calculation to determine sample size (and therefore cost) 通过评估效力计算确定样本规模(继而确定相应的成本)
Conducting an RCT进行随机控制试验 • Adopt strict randomization protocol • 采用严格的随机抽样程序 • Maintain information on how randomization done, refusals and ‘cross-overs’ • 及时收集关于随机抽样是如何进行的、干预的拒绝者,以及交叉干预(cross-overs)的信息 • A, B and A+B designs (factorial designs) • A, B 和 A+B 设计 (因素分析) • Collect baseline data to 收集基期数据进行: • Test quality of the match 检验匹配的质量 • Conduct difference in difference analysis 进行双差分分析
Exercise练习 Does your intervention suffer from selection bias? Why? 你的干预是否存在选择偏误? 为什么? Is randomization an option for your intervention? 你在干预过程中是采用随机试验法吗? At what level would you randomize? 你是在哪个层次上进行随机选择的?
Lecture 4 演讲 4 Quasi-experimental designs 准试验设计
Quasi-experimental approaches准试验方法 • Possible methods可能使用的方法 • Propensity score matching 倾向得分匹配法 • Regression discontinuity 断点回归法 • Advantage: can be done ex post, and when random assignment not possible • 优点:可以在事后进行,并在随机分配不可行时使用 • Disadvantage: cannot be assured of absence of selection bias • 缺点:无法保证不存在选择偏误
Propensity score matching倾向得分匹配法 Need someone with all the same age, education, religion etc. 需要年龄、教育和宗教信仰等特征一样的人 But, matching on a single number calculated as a weighted average of these characteristics gives the same result and matching individually on every characteristic – this is the basis of propensity score matching 但是,倾向得分匹配是基于一个单个的数字进行的,这个数字是对这些特征进行加权平均得到的。该方法与基于每个单个特征进行匹配得出的结果是一样的,后者是倾向得分匹配法的基础。 The weights are given by the ‘participation equation’, that is a probit equation of whether a person participates in the project or not 其中,加权的权重由“参与方程”给出,“参与方程”为一个人是否参与了这个项目的Probit模型
Propensity score matching倾向得分匹配法:what you need你需要什么 Can be based on ex post single difference, though double difference is better 尽管双差分更好,但是能够基于事后的单个差分进行 Need common survey for treatment and potential control, or survey with common sections for matching variables and outcomes 需要处理组和潜在的控制组或者调查有同样的调查模块,即包含共同的可以匹配的特征变量和结果变量
Propensity score matching:example of matching倾向性得分匹配法示例:water supply in Nepal尼泊尔水供应案例
Regression discontinuity: an example – agricultural input supply program断点回归示例 – 农业投入供应项目 纯收入 土地持有(公顷)
Naïve impact estimates “天真”的影响评估 Total = income(treatment) – income(control) = 9.6 总影响= 收入(处理组) – 收入(控制组) = 9.6 Agricultural h/h only = 7.7 农业h/h 仅为 = 7.7 But there is a clear link between net income and land holdings 但是纯收入和土地持有量之间有明显的联系 And it turns out that the program targeted those households with at least 1.5 ha of land (you can see this in graph) 结果是,项目是针对那些至少拥有1.5公顷土地的农户开展的 So selection bias is a real issue, as the treatment group would have been better off in absence of program, so single difference estimate is upward bias 所以,选择偏误现实存在,当处理组即便在没有项目的情况下处境也会改善时,单差分就会高估项目的影响
Regression discontinuity断点回归法 Where there is a ‘threshold allocation rule’ for program participation, then we can estimate impact by comparing outcomes for those just above and below the threshold (as these groups are very similar) 当某项项目在确定参与条件时设置了一个“门槛分配原则”,我们可以通过比较处于这个门槛里的和门槛外的两组(因为这两组非常相似)人群来评估项目的影响 We can do that by estimating a regression with a dummy for the threshold value (and possibly also a slope dummy) – see graph 我们可以根据这个门槛值定义一个二元变量,然后对这个二元变量进行回归 (可能也要有一个倾斜度虚变量) In our case the impact estimate is 4.5, which is much less than that from the naïve estimates (less than half) 在我们的例子中,估计的影响是4.5,远低于前面得出的估计值(几乎小一半)
Discussion讨论 • What possible quasi-experimental designs could you use for your intervention? • 你会在你的干预计划中采取什么样的准试验设计? • What is “better” … RCT or quasiexperiment? • Easier • More accurate • More feasible
Lecture 5 演讲 5 Data collection and mixed methods 数据收集和混合法的运用
Overview on data collection数据收集总结 • Baseline, midterm and endline • 基线、中期和后期 • Treatment and comparison • 干预组和对照组 • Process data • 数据加工 • Capture contagion and spillovers • 捕捉样本污染与溢出效应 • Quant and qual • 定量与定性 • Different levels (e.g. facility data, worker data) – link the data • 不同层次 (例如. 机构数据,工人数据) – 链接这些数据 • Multiple data sources • 多个数据来源
Program design (theory of change) mothers counselling change behavior in child nutrition C1 C2 B1 B2 D1 D2 D3 A E targeting participants improved nutrition outcome no leakage / substitute children supplemental feeding sufficient qnty / qlty