Programme Evaluation for Policy Analysis

Programme Evaluation for Policy Analysis Mike Brewer, 4 October 2011 www.pepa.ac.uk

Outline • Who we are • Overview and aims • The 5 projects • Training and capacity building

Who we are: PI and co-Is • Richard Blundell, UCL & IFS • Mike Brewer, University of Essex & IFS • Andrew Chesher, UCL & IFS • Monica Costa Dias, IFS • Thomas Crossley, Cambridge & IFS • Lorraine Dearden, IoE & IFS • Hamish Low, Cambridge & IFS • Costas Meghir, Yale & IFS • Imran Rasul, UCL & IFS • Adam Rosen, UCL • Barbara Sianesi, IFS • DWP is a “partner”

Programme Evaluation for Policy Analysis: overview • PEPA is about ways to do, and ways to get the most out of, “programme evaluation”: “estimating the causal impact of” “government policies” (although the methods can often be generalised)

Programme Evaluation for Policy Analysis: overview • PEPA is about ways to do, and ways to get the most out of, “programme evaluation” • Aims • To stimulate a step change in the conduct of programme evaluation in the United Kingdom (and around the world) • To maximise the value of programme evaluation by improving the design of evaluations, and improving the way that such evaluations add to the knowledge base • Beneficiaries • those who do programme evaluation • those who commission, design and make decisions based on the results of evaluations • those interested in impact of labour market, education and health policies

More on our aims: three challenges for programme evaluation • We know the outcomes for participants on a training programme. But what was the counterfactual? • Given the counterfactual, we can estimate the programme’s impact. But how certain are we? • Given that the evaluation has been done, how can we get the most value from it? • How can we generalise what we learn from this evaluation to other training programmes? • How should we synthesise the lessons learned from multiple studies of different training programmes?

PEPA: overview

1. Making the most of RCTs: reassessing ERA (Sianesi & Lise) • The Employment, Retention and Advancement demonstration (2003-2007) • first large-scale RCT in social policy in UK (over 16,000 people) • has been evaluated experimentally (Hendra et al., 2011) • Aim: maximise the value of the ERA experiment • Improve the design of non-experimental evaluations • Improve the way such evaluations add to the knowledge base • “Gold standard” randomisation is still rare • costly, impractical or politically infeasible → Project 1A • lack of external validity and ex ante analysis → Project 1B

1a. Lessons for non-experimental methods (Sianesi) • Non-experimental evaluation methods have been assessed against an experimental benchmark in a small number of US studies in the 1970s and 1980s • Exploit a recent and UK-based randomised experiment to learn about – and possibly improve upon – the performance of non-experimental methods routinely used in UK evaluations • pilot-control areas • individual matching • difference-in-differences • The experimental estimates will be compared against the best alternative that can be devised with the available data

1b. A reassessment of the ERA (Lise) • Can experimental data be combined with behavioural models of labour market behaviour to lead to better ex ante evaluations? • Methodology • take a typical search and matching model, and calibrate it to match the data on ERA comparison group • simulate ERA policy within the model • check if simulated outcomes match observed data for ERA participants • Experimental variation allows testing of theoretical model • If simulated outcomes match ERA participants’ outcomes, then: • can use simulations to evaluate ex ante alternative ERA policies • can see how estimate of policy impact changes once interactions with wider labour market are taken into account

2. Improving inference for policy evaluation (Crossley, Brewer, Hernandez, Ham) • Critical to characterise uncertainty of estimates (and thus perform inference correctly) • This can be hard when • data have a multi-level structure, and where there is serial correlation in the treatment and in group-level shocks • the estimated policy impacts are complex and discontinuous functions of estimated parameters • Similarly, can be hard to perform power calculations in all but the simplest RCTs • Aims • Review, disseminate and (hopefully) develop techniques • Provide resources • Substantive applications: impact of labour market or welfare-to-work programmes

2a. Inference and power in Diff-in-Diff (Crossley, Brewer, Hernandez) • A common evaluation technique is to use diff-in-diff over areas and time • Serially-correlated errors and group-level structure of data mean naïve inference often incorrect (standard errors “too small”; Bertrand et al. 2004) • But most solutions work only for “large” number of groups, and literature evolving much faster than practice • Aims • Demonstrate the problems for inference caused by serially-correlated and multi-level data, and the practicality and relevance of a range of suggested solutions, providing resources where appropriate • Develop new tools for inference • randomisation/permutation tests • serial correlation in the non-linear DiD
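As an illustration of the randomisation/permutation idea with few groups, here is a minimal sketch in Python. It is not the project's own tool: the function names (`did_estimate`, `permutation_pvalue`), the simulated panel and all parameter values are assumptions made for the example. It collapses the data to group-period means, computes a simple diff-in-diff, and compares it with the estimates obtained under every possible reassignment of treatment across groups.

```python
import itertools
import numpy as np

def did_estimate(y, treated, post):
    """Diff-in-diff on a (groups x periods) array of group-period means."""
    # Post-minus-pre change in the mean outcome, one value per group.
    change = y[:, post].mean(axis=1) - y[:, ~post].mean(axis=1)
    return change[treated].mean() - change[~treated].mean()

def permutation_pvalue(y, treated, post):
    """Two-sided p-value from enumerating all reassignments of treatment to groups."""
    n_groups, n_treated = y.shape[0], int(treated.sum())
    observed = did_estimate(y, treated, post)
    placebo = []
    for combo in itertools.combinations(range(n_groups), n_treated):
        fake = np.zeros(n_groups, dtype=bool)
        fake[list(combo)] = True
        placebo.append(did_estimate(y, fake, post))
    placebo = np.abs(np.array(placebo))
    # Share of placebo assignments at least as extreme as the actual one.
    return observed, float(np.mean(placebo >= abs(observed) - 1e-12))

# Hypothetical panel: 10 groups, 8 periods, AR(1) group-level shocks,
# 3 groups treated from period 4 onwards with a true effect of 2.0.
rng = np.random.default_rng(0)
n_groups, n_periods, rho = 10, 8, 0.6
shocks = np.zeros((n_groups, n_periods))
shocks[:, 0] = rng.normal(size=n_groups)
for t in range(1, n_periods):
    shocks[:, t] = rho * shocks[:, t - 1] + rng.normal(size=n_groups)
treated = np.arange(n_groups) < 3
post = np.arange(n_periods) >= 4
y = shocks + 2.0 * np.outer(treated, post)

estimate, p_value = permutation_pvalue(y, treated, post)
print(f"DiD estimate: {estimate:.2f}, permutation p-value: {p_value:.3f}")
```

Because whole groups are reassigned, the serial correlation within each group is preserved in every placebo draw. The smallest attainable p-value is one over the number of possible assignments (here 1/120), which is one reason such tests have limited power when very few groups are treated.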

2a. Inference and power in Diff-in-Diff (Crossley, Brewer, Hernandez) • Flip side to inference is a power calculation • Will produce resources to carry out power calculations for non-experimental designs. • difference-in-differences • instrumental variables • regression discontinuity • Power calculations will reflect: • Cluster effects: observations from different agents are not independent from each other • Monte Carlo methods to deal with a reduced number of clusters • Different patterns of time-series correlation
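The kind of Monte Carlo power calculation described here can be sketched as follows. This is an assumption-laden illustration, not the resource the project will produce: the data-generating process (AR(1) group-level shocks with unit-variance innovations), the function names and the design values are all hypothetical. The critical value is itself simulated under a zero effect, so the test has roughly the right size even with few clusters.

```python
import numpy as np

def simulate_t_stat(rng, effect, n_groups, n_treated, n_periods, n_post, rho):
    """Simulate one panel and return the t-statistic of the DiD estimate."""
    # Group-level shocks follow an AR(1) process over time.
    shocks = np.zeros((n_groups, n_periods))
    shocks[:, 0] = rng.normal(size=n_groups)
    for t in range(1, n_periods):
        shocks[:, t] = rho * shocks[:, t - 1] + rng.normal(size=n_groups)
    treated = np.arange(n_groups) < n_treated
    post = np.arange(n_periods) >= n_periods - n_post
    y = shocks + effect * np.outer(treated, post)
    # Collapse to one post-minus-pre change per group, then compare group means.
    change = y[:, post].mean(axis=1) - y[:, ~post].mean(axis=1)
    a, b = change[treated], change[~treated]
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

def did_power(effect, n_sims=2000, alpha=0.05, seed=0, **design):
    """Share of simulated panels in which a true effect of this size is detected."""
    rng = np.random.default_rng(seed)
    # Critical value taken from the simulated null distribution of |t|.
    null = np.abs([simulate_t_stat(rng, 0.0, **design) for _ in range(n_sims)])
    critical = np.quantile(null, 1 - alpha)
    alt = np.abs([simulate_t_stat(rng, effect, **design) for _ in range(n_sims)])
    return float(np.mean(alt > critical))

# Hypothetical design: 12 areas, 4 treated, 8 periods of which the last 4 are post-policy.
design = dict(n_groups=12, n_treated=4, n_periods=8, n_post=4, rho=0.5)
power = did_power(effect=1.0, n_sims=500, **design)
print(f"Simulated power: {power:.2f}")
```

Varying `rho`, `n_groups` or `n_treated` in `design` shows how power responds to the pattern of time-series correlation and to the number of clusters.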

2b. Inference in duration analysis (Brewer, Ham) • Duration/survivor or transition models are natural tools for programme evaluation when outcomes of interest are spells or transitions • Estimated policy impacts often complex, discontinuous functions of the estimated parameters of a statistical model • Will establish how best to use event history models to provide policy-makers with • estimates of the impact of a policy on the hazard rate • expected time spent in various states • correct confidence intervals around both of these • Will build on Eberwein, Ham and Lalonde (2002), Ham and Woutersen (2009) and Ham, Li and Sheppard (2010)
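To make concrete why the policy impact is a non-linear function of the estimated parameters, here is a deliberately simple sketch. It assumes a constant per-period exit hazard in discrete time, right-censoring at a fixed horizon and a plain percentile bootstrap; it does not implement the methods in the papers cited above, and all names and numbers are hypothetical.

```python
import numpy as np

def hazard(durations, exited):
    """Constant per-period exit hazard: number of exits over periods at risk."""
    return exited.sum() / durations.sum()

def expected_duration(h, horizon):
    """Expected periods spent in the state within the horizon: sum of survival probabilities."""
    return float(np.sum((1.0 - h) ** np.arange(horizon)))

def impact_with_ci(dur_c, exit_c, dur_t, exit_t, horizon, n_boot=999, seed=0):
    """Treated-minus-control expected duration with a percentile bootstrap interval."""
    rng = np.random.default_rng(seed)

    def impact(ic, it):
        return (expected_duration(hazard(dur_t[it], exit_t[it]), horizon)
                - expected_duration(hazard(dur_c[ic], exit_c[ic]), horizon))

    point = impact(np.arange(dur_c.size), np.arange(dur_t.size))
    # Resample spells with replacement, separately within each arm.
    draws = [impact(rng.integers(0, dur_c.size, dur_c.size),
                    rng.integers(0, dur_t.size, dur_t.size))
             for _ in range(n_boot)]
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return point, float(lower), float(upper)

# Hypothetical spells: the policy doubles the monthly exit hazard from 0.10 to 0.20;
# spells are observed for at most 24 months.
rng = np.random.default_rng(1)
horizon = 24
raw_c = rng.geometric(0.10, size=2000)
raw_t = rng.geometric(0.20, size=2000)
dur_c, exit_c = np.minimum(raw_c, horizon), raw_c <= horizon
dur_t, exit_t = np.minimum(raw_t, horizon), raw_t <= horizon

point, lower, upper = impact_with_ci(dur_c, exit_c, dur_t, exit_t, horizon)
print(f"Impact on expected months in state: {point:.2f} [{lower:.2f}, {upper:.2f}]")
```

With richer hazard models (duration dependence, unobserved heterogeneity, multiple states) the mapping from parameters to expected time in each state becomes much less smooth, which is where simple approaches like this one can break down.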

3. Control functions in policy evaluation (Blundell, Costa Dias, Rosen, Chesher, Kitagawa) • Choice among alternative evaluation methods is driven by three concerns • Question to be answered • Type and quality of data available • Assignment rule (the mechanism that allocates individuals to the programme) • This project focuses on the last • Idea • The ideal assignment rule comes from an RCT • But if we know something about the assignment rule, then the control function approach allows us to account for/correct for the endogenous selection into treatment

3. The control function approach: example • Interested in the impact of university education on subsequent labour market earnings (the “returns to university education”) • Unobservable determinants of earnings, e.g. underlying ability, will be correlated with the decision to attend university, so a simple regression will provide a biased view of the returns to university • By modelling key features of the decision to attend university – the “assignment rule” to university – the control function approach can correctly recover the average return to university among those who took up a place

3. The control function approach: example (continued) • These key features will ideally be factors that determine assignment to university but do not determine directly final earnings in the labour market • Family socio-economic background, level of university fees, distance to university, availability of university places (if rationed) • If we can write down an equation modelling the way these factors determine university attendance, we can construct an index (or ‘control function’) that can then be included in the earnings regression along with the indicator for attending university. • Extension of the ‘Heckman’ selection approach that controls for the endogenous selection into treatment
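The construction described above can be written out for the textbook case. The notation is an assumption for illustration, not taken from the slides: $Y_i$ is earnings, $D_i$ indicates university attendance, $Z_i$ collects the assignment factors, the return $\beta$ is taken to be the same for everyone, and the unobservables are jointly normal.

```latex
\begin{align*}
Y_i &= \alpha + \beta D_i + u_i, \\
D_i &= \mathbf{1}\{ Z_i'\gamma + v_i > 0 \}, \qquad
(u_i, v_i) \mid Z_i \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_u^2 & \rho\sigma_u \\ \rho\sigma_u & 1 \end{pmatrix} \right), \\
E[u_i \mid D_i, Z_i] &= \rho\sigma_u \, \lambda_i, \qquad
\lambda_i = D_i \, \frac{\phi(Z_i'\gamma)}{\Phi(Z_i'\gamma)}
          - (1 - D_i) \, \frac{\phi(Z_i'\gamma)}{1 - \Phi(Z_i'\gamma)}, \\
E[Y_i \mid D_i, Z_i] &= \alpha + \beta D_i + \rho\sigma_u \, \lambda_i .
\end{align*}
```

Under these assumptions, $\gamma$ can be estimated by a probit of $D_i$ on $Z_i$, and $\lambda_i$ (the control function) is then added as a regressor in the earnings equation. A simple regression omits $\lambda_i$, so its estimate of $\beta$ is biased whenever $\rho \neq 0$.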

3. The control function approach: our research • Research questions: • Under what circumstances does the use of a control function compare favourably to matching and instrumental variables? What are the key trade-offs? • How does a control function approach map into a behavioural model? What can a control function approach tell us about structural parameters of interest? • Can we weaken the control function approach by incorporating partial knowledge of the assignment rule to produce bounds? • Will study various education and labour market policies

4. Dynamic behavioural models for policy evaluation (Low, Dias, Shaw, Meghir, Pistaferri) • Classical ex post empirical evaluation methods often fail to explain the nature of the estimated effect • Cannot disentangle impact of programme on incentives from how incentives affect individual decisions • Cannot account for dynamic responses (anticipation or changes now affect decisions in future) • Studies often rely on different sets of behavioural assumptions • Difficult to understand, as not explicitly stated • Complicates task of synthesising information from different studies • Cannot be used for counterfactual analysis • Results are specific to the policy, time and environment

4. Dynamic behavioural models for policy evaluation • Aim: to address these weaknesses using a structural (dynamic behavioural) approach • Explicitly formalises incentives and decisions • But relies on heavy set of (explicit) behavioural assumptions • Will study ways to make minimal and transparent assumptions • Use quasi-experimental data to estimate and validate models of behaviour • Explore the use of optimality conditions - independent of the full structure of the model - to estimate some parameters • Use robust estimates of bounds on treatment effects to bound structural parameters

4. Dynamic behavioural models for policy evaluation: applications • Impact of welfare time-limits • Develop dynamic model to study how time-limits in welfare eligibility may affect claiming decisions at different stages of life • Use the US programme, “Temporary Assistance for Needy Families”, as the empirical application • Our model will replicate, and then generalise, previous empirical results • Impact of welfare-to-work on education • Use structural behavioural model of education and labour supply choices to evaluate how future welfare-to-work programmes affect the ex ante value of education • Use evaluation studies to validate the behavioural assumptions • Use partial identification to provide bounds for structural parameters

5. Social networks and program evaluation (Rasul, Fitzsimons, Hernandez, Malde) • To understand individuals’ or households’ behaviour, must recognize that individuals are embedded within social networks • In developing countries, networks play various roles: • substitute for missing markets • key source of insurance and other resources to their members • Will seek to understand how networks interplay with policy interventions • Will combine developments in theories of network formation and behavior within networks with empirical methods for program evaluation with social interactions

5. Social networks and program evaluation: example of Progresa • Progresa is a village-level intervention in rural Mexico. Previous research has shown that: • 1 in 5 households are “isolated” (none of their extended family resides within the same village) • On some margins, only non-isolated households responded to Progresa • Was it because poor families needed assistance and encouragement to join the programme? • Or was it because of the nature of the Progresa intervention, part of which was to encourage teenage girls to stay in school?

5. Social networks and program evaluation • Substantive research questions • How are the benefits of program interventions dissipated within communities once social networks are accounted for? • How do such spillovers (from beneficiary to non-beneficiary households) affect the cost-benefit analysis of programs, and how we think about targeting? • Why and how are social networks formed? (Can investigate this by studying particular interventions) • Methodological research questions • How best to measure whether and how households are socially tied (blood ties, resource flows)?

PEPA: research questions

Training and capacity building • Mixture of courses, masterclasses, workshops and resources (how-to manuals, software) • All projects have their own TCB programme • Plus core TCB offering in general programme evaluation skills • 4 “standard” courses/year and 1 “advanced” course/year • 1 course/year for those designing or commissioning evaluations

PEPA: training and capacity building

PEPA management and administration team • Director • Now until October 2012: Mike Brewer • April 2012 thereafter: Lorraine Dearden • Co-director: Monica Costa Dias • Administrator: Kylie Groves • IT: Andrew Reynolds • DWP are partner organisation, with hope that this eases access to their data. In practice, very reliant on key contact (Mike Daly)
