Performance Management Systems and Evaluation: Towards a Mutually Reinforcing Relationship

Jacob Alex Klerman (Abt Associates)
APPAM/HSE Conference “Improving the Quality of Public Services”
Moscow, June 2011


Performance Management
• The need is clear: “What gets measured gets done”
• If you know what you want done, you need to manage against it
• To manage against it, you need to measure it
Evaluation
• But what do you want done? Are you sure?
• That’s the role of rigorous impact evaluation
• Dirty little secret: much of what we do (much of what seems “plausible”) has minimal impact, or even hurts

Rigorous Impact Evaluation Is Crucial
• Everyone wants better program outcomes
• We might even be willing to spend more if we could prove better outcomes
• Proving “better outcomes” requires rigorous impact evaluation
• Many apparently plausible programs (and program innovations) don’t work
• Naive evaluation methods give the wrong answer
• Rigorous impact evaluation is challenging
  • It requires large samples
  • And the smaller the projected incremental impact, the larger the required samples
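The claim that smaller projected impacts require larger samples follows from the standard sample-size formula for comparing two means. A minimal sketch, assuming individual randomization, two equal-size arms, a two-sided test at 5% significance with 80% power, and impacts expressed in standard-deviation units; these parameter values are illustrative assumptions, not figures from the presentation.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Cases needed in EACH arm to detect `effect_size` (in standard-deviation
    units) with a two-sided test, assuming individual randomization, equal
    arm sizes, and equal outcome variance in both arms."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile needed for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for d in (0.20, 0.10, 0.05):
        # Required n scales with 1 / effect_size**2, so halving the
        # effect size roughly quadruples the sample.
        print(f"effect = {d:.2f} SD -> {n_per_arm(d):,} cases per arm")
```

Because the effect size enters squared in the denominator, evaluating a change expected to have half the impact needs about four times as many cases.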

Outline
• Current Practice
• A Better Way
• Closing Thoughts


Current Evaluation Practice Isn’t Very Useful
• It asks the wrong question: Does the program “work”?
  • i.e., Should we shut the program down?
  • But big programs address major social problems
  • The programs aren’t going away
• The right question is often: How can we make the program better?
  • Which program model works better?
  • Would some minor (and affordable) change in program design help?
  • For which subgroups does our program work? Target the program at them

The Realities of Sample Size and Cost
• Answering the up/down evaluation question requires (relatively) small samples
  • For a training program, perhaps 500-2,000 cases
• Answering practitioners’ questions requires much larger samples
  • For a training program, perhaps 10,000+ cases
• At current evaluation costs ($1,000+ per case), we can’t afford to answer practitioners’ questions
  • Especially if the change in outcomes will be small at best
• And that’s a big problem, because CQI/kaizen suggests that major improvement often comes from many small improvements
To answer practitioners’ questions, we’re going to need to get the cost way down
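The cost figures on this slide can be made concrete with a back-of-the-envelope calculation. This sketch uses only the slide’s numbers and assumes exactly $1,000 per case (the slide’s lower bound, “$1,000+”), so real budgets would be higher.

```python
# Back-of-the-envelope evaluation budgets, using only this slide's figures.
# Assumption: exactly $1,000 per case, the slide's lower bound ("$1,000+").
COST_PER_CASE = 1_000  # dollars


def evaluation_cost(n_cases: int, cost_per_case: float = COST_PER_CASE) -> float:
    """Total data-collection cost for a study with n_cases cases."""
    return n_cases * cost_per_case


# Up/down question: roughly 500-2,000 cases.
up_down_range = (evaluation_cost(500), evaluation_cost(2_000))

# Practitioners' questions: 10,000+ cases.
practitioner_cost = evaluation_cost(10_000)

# Per-case cost at which a 10,000-case study would cost no more than
# today's 2,000-case study.
affordable_per_case = evaluation_cost(2_000) / 10_000

if __name__ == "__main__":
    print(f"Up/down study:       ${up_down_range[0]:,.0f} to ${up_down_range[1]:,.0f}")
    print(f"Practitioner study:  at least ${practitioner_cost:,.0f}")
    print(f"Break-even per case: ${affordable_per_case:,.0f}")
```

On these assumptions, a practitioner-scale study costs at least $10 million, and per-case costs would have to fall to about $200 for it to fit within the budget of today’s largest up/down study.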

Steps in a Current Evaluation
• Negotiate access to sites, including convincing them to deny service to some applicants
  • <time-consuming and expensive>
• Detailed process analysis at each site
  • <expensive>
• Detailed survey follow-up
  • <expensive for each case; and there are a lot of cases>
Is there another way? Sometimes, yes …

At the Back End: Leverage Ongoing Performance Management Systems
• We have just argued that collecting information on outcomes drives costs
• Performance measurement systems already collect information on outcomes
  • Presumably on the key outcomes
• So, when we can measure outcomes through the performance measurement system
  • Costs will be much, much lower
  • Allowing large samples
  • A key requirement for evaluating incremental changes
• This will only work when both treatment and control cases are “in the system” (e.g., incremental changes)
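To illustrate what measuring outcomes “through the performance measurement system” could look like, the sketch below joins the evaluation’s random assignment to outcome records the system already holds and computes a treatment-control difference, with no survey step. The record layout (`PmsRecord`, `site_id`, `employed_at_exit`) and the function name are hypothetical; an actual system’s fields will differ. It returns only a point estimate; a real analysis would also need standard errors that account for site-level randomization.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PmsRecord:
    """One case as it might appear in a performance management system.
    All field names are hypothetical."""
    case_id: str
    site_id: str
    employed_at_exit: bool  # an outcome the PMS is assumed to track already


def impact_from_pms(records, arm_by_site):
    """Difference in mean outcomes (treatment minus control), computed
    directly from existing PMS records rather than from survey follow-up.

    arm_by_site maps site_id -> "treatment" or "control"; it comes from the
    evaluation's randomization, not from the PMS itself."""
    outcomes = {"treatment": [], "control": []}
    for r in records:
        arm = arm_by_site.get(r.site_id)
        if arm is None:
            continue  # site is not part of the evaluation
        outcomes[arm].append(float(r.employed_at_exit))
    if not outcomes["treatment"] or not outcomes["control"]:
        raise ValueError("both arms need at least one record")
    return mean(outcomes["treatment"]) - mean(outcomes["control"])


if __name__ == "__main__":
    records = [
        PmsRecord("c1", "A", True),
        PmsRecord("c2", "A", True),
        PmsRecord("c3", "B", False),
        PmsRecord("c4", "B", True),
    ]
    print(impact_from_pms(records, {"A": "treatment", "B": "control"}))
```

The only new data the evaluation contributes here is the assignment map; every outcome value is one the system was already collecting, which is where the cost savings come from.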

At the Front End: A Learning Organization
• Currently research is “top down”
  • Someone outside the system decides to evaluate X
  • Then, the evaluator tries to convince sites to adopt X, and to deny all services to a control group
• The alternative is “bottom up”
  • Ask sites to suggest policy/program changes
  • Form a committee: site representatives, central program staff, substance experts, evaluation experts
  • Ask them to make a short list from among the suggestions
  • Ask sites to volunteer to implement the selected suggestions
  • Randomize at the site level: the control condition is “current practice”, not “no service” (no one is denied service)
Cutting time and costs
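The site-level randomization step can be sketched in a few lines. This is a minimal illustration, assuming simple (unstratified) randomization of the volunteer sites into two equal groups; the function name and site labels are hypothetical. Stratifying or pairing similar sites before randomizing is a common refinement that the slides do not specify.

```python
import random


def randomize_sites(volunteer_sites, seed):
    """Assign volunteer sites to the selected change ("treatment") or to
    current practice ("control"), half each (control gets the extra site
    if the count is odd). No one is denied service: control sites simply
    keep their current practice."""
    sites = sorted(volunteer_sites)  # fixed order so the seed fully determines the draw
    rng = random.Random(seed)        # recording the seed makes the assignment auditable
    rng.shuffle(sites)
    half = len(sites) // 2
    return {site: ("treatment" if i < half else "control")
            for i, site in enumerate(sites)}


if __name__ == "__main__":
    volunteers = [f"site{i:02d}" for i in range(60)]  # hypothetical site IDs
    assignment = randomize_sites(volunteers, seed=2011)
    print(sum(arm == "treatment" for arm in assignment.values()), "treatment sites")
```

Because the unit of randomization is the site, the central organization only needs a list of volunteer sites, not consent to withhold services from individual applicants.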

In Summary
And when your costs drop sharply, CQI becomes feasible; i.e., you can evaluate small changes

A True Learning Organization
• Performance measurement is an ongoing task
• CQI/Continuous Quality Improvement, i.e.,
  • Proposing small changes to SOP/Standard Operating Procedures
  • Rigorously evaluating those small changes
  • Adopting those that can be shown to “help”
• … should also be an ongoing task
• The key insight of “kaizen” is that improved outcomes arise from the accumulation of many such small changes
Data already collected as part of Performance Management Systems makes such CQI feasible
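One way to make “adopting those that can be shown to help” operational is a simple decision rule applied in each CQI cycle. The sketch below assumes a change counts as shown to help when the two-sided 95% confidence interval for its estimated impact lies entirely above zero; that rule, the function names, and the candidate changes are illustrative assumptions, not the presentation’s procedure.

```python
from statistics import NormalDist


def shown_to_help(estimate: float, std_error: float, alpha: float = 0.05) -> bool:
    """True if the two-sided (1 - alpha) confidence interval for the
    change's impact lies entirely above zero."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error > 0


def cqi_round(candidates):
    """One CQI cycle: return the proposed SOP changes to adopt.

    `candidates` maps a (hypothetical) change name to its
    (estimated impact, standard error) from a rigorous evaluation."""
    return [name for name, (est, se) in candidates.items()
            if shown_to_help(est, se)]


if __name__ == "__main__":
    proposals = {
        "earlier follow-up call": (0.03, 0.01),  # hypothetical estimates
        "shorter intake form": (0.01, 0.01),
    }
    print(cqi_round(proposals))
```

Running such a round repeatedly, each time on new site-proposed changes, is the ongoing loop the slide describes; the large samples available from performance management data are what keep the standard errors small enough for modest improvements to pass.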

When Will This Work?
• Site-level randomization needs many (50-200) relatively similar sites
  • Sites can be small (15-100 cases)
• The central organization controls resources
  • It is much easier to get volunteers when volunteering is the only way to get more resources
We’re looking for test cases. Any volunteers?
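Why many sites matter more than large sites can be seen from the design effect for cluster-randomized designs, 1 + (m − 1)ρ, where m is the number of cases per site and ρ is the intraclass correlation of outcomes within a site. This sketch assumes equal-size sites and a hypothetical ρ of 0.05; the true value depends on the program and outcome and is not given in the slides.

```python
def design_effect(cases_per_site: float, icc: float) -> float:
    """Design effect for equal-size clusters: 1 + (m - 1) * rho."""
    return 1 + (cases_per_site - 1) * icc


def effective_sample_size(n_sites: int, cases_per_site: int, icc: float) -> float:
    """Number of individually randomized cases that would give the same
    precision as randomizing n_sites sites of cases_per_site cases each."""
    total = n_sites * cases_per_site
    return total / design_effect(cases_per_site, icc)


if __name__ == "__main__":
    ICC = 0.05  # hypothetical intraclass correlation; not from the slides
    # Same 5,000 total cases, split into more and smaller sites.
    for n_sites, m in ((50, 100), (100, 50), (200, 25)):
        eff = effective_sample_size(n_sites, m, ICC)
        print(f"{n_sites:>3} sites x {m:>3} cases -> effective n = {eff:,.0f}")
```

Holding total cases fixed, spreading them across more, smaller sites shrinks the design effect and raises the effective sample size, which is consistent with needing 50-200 sites while allowing each one to be small.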
