Theory-based evaluations: Articulate and test a policy or a programme theory
Community of Practice on Counterfactual Impact Evaluation (CoP-CIE), Riga (LV), 8-9 June 2017
Prof. Lena Tsipouri, National and Kapodistrian University of Athens
Outline
• Some semantics
• The Basics of TBE (how to open the Black Box)
• What do we need to construct the theory (behind the evaluation)
• Formulating evaluation questions
• Selecting methodologies (and the methodologies–data loop)
• Organise the results
• Why we need a combination
• The case of ALMPs' evaluations
Some semantics of impact evaluations (terminology is a mess and can be confusing)
• Impact evaluation (TB or otherwise)
• Programme theory
• Policy theory
• Theory of change
• Intervention logic
• Outputs/outcomes; results/impacts
• Participatory evaluation rejects evaluation against objectives set by outsiders in favour of local narratives and judgments of 'success' (WB)
• Realist evaluation: mechanics + context
Some semantics of impact evaluations (terminology can be confusing)
• Outcome-oriented approaches usually examine correlations (black box), while TBE tries to open the black box and unpack the causal relations
• A good programme (policy/measure) theory describes the causal links between the programme (policy/measure) and its intermediate and ultimate outcomes, and tests whether or not they hold as intended (what works for whom and under which circumstances). It is (expected to be) an explicit theory or set of assumptions
• A causal model is a condition sine qua non (there is no theory without it)
• However, a causal model without appropriate quantitative analysis risks remaining descriptive statistics that jump to conclusions
The Basics: how to open the black box
• The “mathematics” (theory/conceptual approach) → mechanics
• The institutional set-up (delivery mechanism), e.g. effective project selection, time to contract, bureaucracy for payments (practice/empirical approach) → context
Realist evaluation: mechanics + context
The Basics
A programme theory ideally consists of two components:
• A theory of change (the central processes or drivers by which change comes about for individuals, groups, or communities) and
• A theory of action (how programmes or other interventions are constructed to activate these theories of change).
A TBE then allows us to systematically distinguish between
• theory failure (wrong assumptions in the intervention logic)
• implementation failure (inappropriate implementation)
The Basics
• The main function of TBE is to shed light on / clarify (potential) causalities
• There is a danger when constructing the programme theory: theories can be taken for granted (implicitly or explicitly), although they are not generic at all
• Theory means being careful about what is an assumption and what is a hypothesis
• When examined in different circumstances one may find recursive causality (i.e. feedback or reinforcing loops), disproportionate or 'tipping point' relationships and emergent outcomes (in theory quantitative evaluations can take care of that, but how many variables can they test?)
The Basics (Distinguish assumptions from hypotheses!!)
Unfortunately, in SSH there are no theories/rules carved in stone (theories evolve because we get better data, better tools or changing conditions).
A causal relationship that has been shown to exist by previous research is used as an assumption for policy design.
But this assumption may not be valid in the place and time of the intervention; in this case the intervention may (but may also not) fail. If it fails, the assumption proves wrong: it turns into a rejected hypothesis. But sometimes, even if the correlation proves correct, the assumption may still be wrong.
Assumptions from different theories can be used to organise a programme theory and check which best fits the (envisaged) evidence to be evaluated.
The Basics
If the theory is not explicit (which is more often the case than one would hope) the evaluator has to elicit it (ex post): identify the assumptions and convert them into hypotheses (e.g. a bridge is built, assuming that it will reduce the cost and time of access; did the use of the bridge reduce cost and time? Was the toll appropriate? Was maintenance sufficient? Were other conditions favourable? What were the unintended consequences, e.g. tax foregone from small restaurants on the previous route?)
The Basics: theoretical problems
Theory-based evaluations would be easy if the world were less complex; but it is not, because:
• Many inputs may (or may not) contribute to the same impact
• Some of them may be more important than others, or even indispensable (leading to over-determination)
• Impact heterogeneity can cause problems in attempts to summarise the meaning of the studies
The Basics: Problems in the real world
• The problem with TBE is that the theory is (too often) implicit, simplified, subjective. We think we have a theory, but we do not
• But even if it is explicit and well documented it may have flaws
• Politicians and stakeholders have clear ideas on what they want to do; these may be underpinned by theory (enlightened politicians/FDR's New Deal) or may not (uneducated, populist, corrupt politicians)
• Policy makers need to follow their political leaders' guidelines. If they are unlucky, their policies and programmes have no underlying intervention logic. They may rest on conventional wisdom (to be tested for applicability in the local circumstances)
• TBE can easily degenerate into simple descriptive statistics when the methodologies used are detached from the theory (jumping straight to step 4 to save time and hassle)
The linear model example
The linear model assumption/hypothesis
• Increasing basic research leads to higher competitiveness (extensively tested in the US and elsewhere)
• ESIFs were used in the late '90s to increase basic research. Competitiveness did not increase (in the case of LFRs the linear model was in fact a hypothesis that had to be rejected, not an assumption that could be carried over from the OECD evidence)
Example from employment interventions
Y = f(xi)   (Y = immediate hiring is an output or result; Y = long-term employment is an outcome or impact)
Example: Y = increase employment
• x1 = supporting start-ups (underlying theory: assumption that a pyramid of new companies will produce some rapid-growth ones that will ensure employment; hypotheses: there is a market failure because the banking system does not lend to companies with neither track record nor collateral; the larger the number of companies supported, the larger the bottom of the pyramid and the larger the top; alternative hypothesis: the higher the size of support, the higher the likelihood of success)
• x2 = supporting retraining (underlying theory: training improves employment and increases salaries; not when it offers high training-participation rewards – these attract people with rationales other than employment) (mechanics)
What do we need to construct our theory
• Clear objectives (intended impacts should be determined by programme objectives; you cannot evaluate an intervention and examine whether it produced an impact it was not aiming at)
• An intervention logic (theory of change)
In the real world the building blocks are not as clear as expected
What do we need to construct our theory
Objectives can be
• Clear or not
• Implicit or explicit
• Simple or multiple
• Competing (e.g. increase employment and competitiveness; growth and environmental protection)
• Complementary (to other interventions; e.g. export subsidies and infrastructure development; training programmes and wage subsidies)
Formulating evaluation questions
The questions are determined by the objectives and the theoretical model.
E.g. for a training programme:
• Did the intervention reduce short-term unemployment?
• Long-term?
• Did it increase competitiveness?
Techniques to elicit the theory
• concept mapping
• logic modelling
• system mapping
• problem and solution trees
• scenario building
• etc.
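To make "logic modelling" concrete, below is a minimal sketch (in Python, not from the presentation) of how an elicited programme theory – the bridge example from the slides above – can be written down as an explicit chain of causal links, each paired with the assumption behind it and the evaluation question that turns that assumption into a testable hypothesis. The class and field names are illustrative assumptions.

```python
# Illustrative sketch only: writing an elicited programme theory down so that
# every causal step carries an explicit assumption and a testable question.
from dataclasses import dataclass, field

@dataclass
class CausalLink:
    cause: str                # activity or intermediate outcome
    effect: str               # expected next step in the chain
    assumption: str           # theory borrowed from prior evidence
    evaluation_question: str  # how the assumption becomes a hypothesis

@dataclass
class ProgrammeTheory:
    intervention: str
    intended_impact: str
    links: list = field(default_factory=list)

bridge_theory = ProgrammeTheory(
    intervention="Build a bridge on the main access route",
    intended_impact="Lower cost and time of access",
    links=[
        CausalLink("Bridge is built", "Traffic shifts to the new route",
                   "Users choose the cheaper/faster route if the toll is acceptable",
                   "Did traffic shift? Was the toll appropriate?"),
        CausalLink("Traffic shifts to the new route", "Access cost and time fall",
                   "Maintenance keeps the bridge usable; other conditions are favourable",
                   "Did measured cost and travel time fall? Any unintended consequences?"),
    ],
)

for link in bridge_theory.links:
    print(f"{link.cause} -> {link.effect}: test '{link.evaluation_question}'")
```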
Selecting the appropriate methodologies (identify the challenges first)
• How clear are the purpose/objectives of the programmes? (MAs still confuse the jargon); no clear intervention logic
• How good is the theory? Controversies in social sciences are inevitably reflected in the theory basis (example: will the net-neutrality change suggested by President Trump improve prices and services in the US?)
• How good are our tools? Can statistics and econometrics solve the attribution problem? Select the appropriate methodology(-ies)
• How good are our data? Can they be improved, and at what cost?
Selecting methodologies
There is no silver bullet for each type of evaluation (often MAs ask us: what should I use for intervention X? Anyone who answers knowing only the name – or even the budget – of the intervention is not a serious evaluator).
The emphasis is on "selecting": there is an array of possibilities (but people tend to avoid making choices; with hindsight a choice may prove wrong! This does not mean it was not the best choice with the information available at the moment the choice was made).
Good ToR and good evaluators pick the right ones: using all possible methodologies is too expensive and increasingly redundant; choosing "with whom to go and whom to leave" distinguishes good from bad evaluators.
When is the appropriate moment to select the method, and by whom? (Before or after drafting the ToR? By the MA or by the evaluator? What should be "straitjackets" and how many degrees of freedom – call it rigidity/flexibility, or determination/fuzziness?)
Array of potential methodologies
Quantitative (experimental and non-experimental)
• Econometric models
• Macro-economic models
• Randomised control trials (RCTs); counterfactual (control groups = compare with peers, or historic projections = compare with one's self)
• Cost effectiveness
• Rate of return
Qualitative may be needed if random assignments are not possible or to test hypotheses
• interviews
• focus groups
• collection of administrative data
• questionnaires/surveys
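As a purely illustrative complement to the list above, here is a minimal sketch of the simplest counterfactual estimate named there – comparing mean outcomes of a treated group with those of a control group – using simulated data (all numbers are assumptions, not results from any programme).

```python
# Sketch of a counterfactual (control-group) estimate on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated post-programme employment status (1 = employed) under random assignment
control = rng.binomial(1, 0.35, size=1000)  # assumed employment rate without support
treated = rng.binomial(1, 0.42, size=1000)  # assumed employment rate with support

effect = treated.mean() - control.mean()             # average treatment effect estimate
t_stat, p_value = stats.ttest_ind(treated, control)  # simple significance test

print(f"Estimated impact: {effect:+.3f} (p = {p_value:.3f})")
# A significant difference attributes the average change to the programme,
# but says nothing about why it worked or for whom -- the black box that
# theory-based evaluation tries to open.
```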
Selecting methodologies*
There may be hints but no silver bullets for methodology selection
• Large n → experimental and quasi-experimental methods
• Small n → mainly qualitative
* White 2010
An endless loop between methodologies and data (Data ⇄ Methodologies)
Data (available and retrievable) • Reliability • Relevance • Cost
Constructing indicators • (ex ante) – Kahneman danger • (ex post) – Testing theory and improving future monitoring
Organise the results
After the results are compiled we need to check whether they are compatible with the theory (quantitative impact evaluations, plus asking participants and key informants).
• Confront (or juxtapose) different theories (example: Carvalho & White, 2004, with regard to social funds).
• Empirically test the programme theory by making use of primary or secondary data (triangulation), both qualitative and quantitative.
• Organise an iterative process of continuous refinement using stakeholder feedback and multiple data collection techniques and sources (in the realist tradition): Delphi.
• Make use of already published reviews and synthesis studies. These can play a pivotal role in marshalling existing evidence to deepen the power and validity of a TBE, to contribute to future knowledge building and to meet the information needs of stakeholders: meta-evaluation.
• Visualisation or mapping software can help in this task.
Organise the results of TBE
• Corroborate existing theories
• Complement/refine existing theories (conditions of success or failure, like thresholds, capacity utilisation, economic conditions, geographical limitations, etc.)
• Reject theories
Why do we need a combination? The danger of "jumping to conclusions"
• None of them works well alone
• Synergies of TBE – Quant – Qual
• Attribution = linking observed changes to the intervention
Why we need both (approaches and skills)
Quantitative impact evaluations have both strengths and weaknesses.
• They are strong in measuring the impacts/outcomes of programme interventions and ascertaining causality between programme and outcome. They are also strong in measuring the average outcomes of interventions for large sub-groups of participants.
• But they are weak (although some try to include a higher number of variables) in identifying why programmes work or do not work, whether they only work for some groups or in some locations and not in others, and how to make them work better.
Why we need a combination: problems in pure quant
• Check for diminishing returns
• Check for thresholds
• One-way or two-way causation (e.g. the best-performing schools are selected and also have the best outcomes)
• Statistical analysis is of stochastic relationships, meaning precisely that there are also some unknown elements ('the error term') which affect outcomes.
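The first two checks in the list above can be illustrated with a small simulation (illustrative assumptions only): when returns flatten beyond a threshold, a single pooled linear coefficient summarises the relationship badly, while splitting the sample at the threshold reveals the diminishing returns.

```python
# Sketch: diminishing returns hidden by a pooled linear fit (simulated data).
import numpy as np

rng = np.random.default_rng(0)
support = rng.uniform(0, 100, size=500)           # assumed size of support per firm
noise = rng.normal(0, 2, size=500)                # 'the error term'
outcome = 0.5 * np.minimum(support, 40) + noise   # effect flattens above 40

slope_all = np.polyfit(support, outcome, 1)[0]
below = support < 40
slope_below = np.polyfit(support[below], outcome[below], 1)[0]
slope_above = np.polyfit(support[~below], outcome[~below], 1)[0]

print(f"Pooled linear slope:       {slope_all:.2f}")
print(f"Slope below the threshold: {slope_below:.2f}")
print(f"Slope above the threshold: {slope_above:.2f}  (diminishing returns)")
```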
Why we need a combination: problems in pure theory-based
• Mistaking conventional wisdom for theory
• Using descriptive statistics instead of more complex techniques offering more information
Venturing an unorthodox simplification: Scandinavian countries have resilient, competitive economies accompanied by inclusive societies (compared to other advanced economies) because they have more mature evaluation systems.
Active Labour Market Policies*
• There is still a lack of systematic evaluation in Europe.
• The evaluation studies often arrive at contradictory results, making it difficult to combine them in a coherent way.
• ALMPs are complex programmes that work under different contextual conditions, and this makes them difficult to evaluate.
* Bredgaard
Active Labour Market Policies: Meta-analyses
• Such analyses, however, provide only very limited information on why and how the programmes worked, and they only summarise a narrow selection of the vast ALMP evaluation literature.
• The quantitative methods are weak in identifying why programmes work or do not work, whether they only work for some groups or in some locations and not in others, and how to make them work better.
Active Labour Market Policies: Methodologies used
Randomised control trials (RCTs) = comparing outcomes for a participant group with outcomes for a control group of non-participants. A specified period after the termination of a given programme, outcomes are measured as changes in employment, unemployment, income or wages. Impact evaluations then deduce, but do not necessarily demonstrate, that changes in outcomes are the causal effect of the programme. As a result, the content of programmes and their implementation tend to become a 'black box'.
Active Labour Market Policies (Methodologies)
Micro-econometric impact evaluation
• Randomised control trials (RCTs)
• Econometric impact evaluations
• Quasi-experiments
• Matching techniques
• Duration analysis
Aggregate impact analysis, with the aim of measuring the general effects of labour market policy on macroeconomic performance (such as aggregate employment, unemployment, and wages)
• All of these methods can be improved by combining them with economic evaluation methods, such as cost-benefit analysis.
• Macro-economic evaluations estimate the correlation between indicators of labour market performance in selected countries (e.g. the employment and unemployment rate) and different explanatory variables or policies (e.g. duration and generosity of unemployment benefit systems, and expenditure on active labour market policies) (for a review see European Commission 2006: chapter 3). This macroeconomic evidence on the impact of ALMPs on employment and unemployment rates typically relies on cross-country econometric analysis based on large panel data sets rather than individual programme evaluations.
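As an illustration of the matching techniques listed above, the following sketch matches each participant to the non-participant with the closest prior earnings before comparing outcomes; the data are simulated and every number is an assumption, not a result from any ALMP study.

```python
# Sketch of nearest-neighbour matching on one observed covariate (simulated data).
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Participation depends on prior earnings, so a naive comparison of means is biased.
prior_earnings = rng.normal(20_000, 5_000, size=n)
p_participate = 1 / (1 + np.exp((prior_earnings - 20_000) / 3_000))
participated = rng.random(n) < p_participate
outcome = 0.8 * prior_earnings + 1_500 * participated + rng.normal(0, 1_000, n)

treated_idx = np.where(participated)[0]
control_idx = np.where(~participated)[0]

# Match each participant to the control unit with the closest prior earnings.
matched_gaps = [
    outcome[i] - outcome[control_idx[np.argmin(np.abs(prior_earnings[control_idx]
                                                      - prior_earnings[i]))]]
    for i in treated_idx
]

naive = outcome[treated_idx].mean() - outcome[control_idx].mean()
print(f"Naive difference in means: {naive:,.0f}")
print(f"Matched estimate:          {np.mean(matched_gaps):,.0f}")
```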
Active Labour Market Policies (cont.)
• There are countless qualitative case studies, implementation studies and process evaluations of experiences with ALMPs. They reveal plenty of information on how employment programmes were introduced, implemented and evolved, how they work, how participants experience programmes, etc. But this type of qualitative and descriptive information cannot establish the causality of outcomes and impacts, and findings are often difficult to generalise beyond their specific context.
• This is problematic when we want to learn what type of intervention works for whom, under which circumstances.
Active Labour Market Policies (cont.)
We need contextual and qualitative data/information to answer the what/how question
• Most ALMP interventions are complex, which makes them difficult to evaluate.
• Often, the target groups face major barriers to (re)integration in the labour market;
• Techniques for improving participants' employability and employment opportunities are difficult to standardise;
• The interventions do not operate in isolation but in combination with other policies and programmes (industrial and macroeconomic policies) and under different structural and economic conditions;
• Outcomes are difficult to measure in the short term.
Active Labour Market Policies (cont.)
• Explicating the embedded and often hidden 'mechanisms' at work within programme interventions
• An additional and important argument in realistic evaluation and programme theory evaluation is that the 'context' of programmes matters.
• A moderator is a description of the context that activates the mechanism.
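To illustrate the moderator idea, here is a small simulated example (group names and effect sizes are assumptions): the same intervention raises employment only in one context, and estimating the effect separately by context is the simplest way to surface "what works for whom under which circumstances".

```python
# Sketch: a context variable (moderator) that activates the mechanism (simulated data).
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

treated = rng.random(n) < 0.5
urban = rng.random(n) < 0.5   # assumed context / moderator variable
# Assumed mechanism: the programme only raises employment where local labour demand exists.
employment = rng.binomial(1, 0.30 + 0.10 * (treated & urban))

for label, mask in [("urban", urban), ("rural", ~urban)]:
    effect = employment[treated & mask].mean() - employment[~treated & mask].mean()
    print(f"Estimated effect, {label} context: {effect:+.3f}")
```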
The case of ALMPs*
Quantitative impact/outcome evaluations (using variables like registered unemployment, average earnings, etc.) provide some evidence, e.g.:
• job search assistance programmes were most likely to produce a positive impact
• classroom and on-the-job training programmes had a relatively positive impact in the medium term (after two years), although in the short term these programmes often had an insignificant or negative impact
• (macro) impact of ALMPs on employment and unemployment rates: this approach typically relies on cross-country econometric analysis based on large panel data sets rather than individual programme evaluations
* based on Bredgaard