A stochastic dominance approach to program evaluation

And an application to child nutritional status in arid and semi-arid Kenya
Felix Naschold, University of Wyoming
Christopher B. Barrett, Cornell University
May 2012 seminar presentation, University of Sydney

Motivation
• Program evaluation methods
  • By design, they focus on the mean, e.g., the “average treatment effect” (ATE)
  • In practice, we are often interested in broader distributional impacts
  • Limited scope for doing this by splitting the sample
• Stochastic dominance
  • By design, looks at the entire distribution
  • Now commonly used in snapshot welfare comparisons
  • But not in program evaluation, e.g., “differences-in-differences”
• This paper merges the two ⇒ Diff-in-Diff (DD) evaluation using stochastic dominance (SD) to compare changes in distributions over time between intervention and control populations

Main Contributions
• Proposes a DD-based SD method for program evaluation
• First application to evaluating welfare changes over time
• Specific application to a new dataset on changes in child nutrition in the arid and semi-arid lands (ASAL) of Kenya
  • Unique, large dataset of 600,000+ observations collected by the Arid Lands Resource Management Project (ALRMP II) in Kenya
  • (One of the) first to use Z-scores of mid-upper arm circumference (MUAC)

Main Results
• Methodology
  • (Relatively) straightforward extension of SD to a dynamic context: static SD results carry over
  • Interpretation differs (as it is based on cdfs of changes)
  • Only feasible up to second-order SD
• Empirical results
  • Child malnutrition in Kenyan ASALs remains dire
  • No average treatment effect of ALRMP expenditures
  • Differential impact, with fewer negative changes in treatment sublocations
  • ALRMP a nutritional safety net?

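Program evaluation (PE) methods
• Fundamental problem of PE: we want to, but cannot, observe the same person's outcome in both the treatment and the control state
• Solution 1: make the treatment and control groups look the same (randomization)
  • Gives the average treatment effect as $\mathrm{ATE} = E[Y \mid D=1] - E[Y \mid D=0]$
• Solution 2: compare changes across the treatment and control groups (difference-in-differences)
  • Gives the average treatment effect as $\mathrm{ATE} = \left(E[Y_1 \mid D=1] - E[Y_0 \mid D=1]\right) - \left(E[Y_1 \mid D=0] - E[Y_0 \mid D=0]\right)$, where subscripts 0 and 1 denote before and after the intervention and $D=1$ marks the treatment group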
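The two estimators above can be illustrated in a few lines of Python. This is a minimal sketch on made-up numbers (the Z-score lists and function names are hypothetical, not ALRMP data): under randomization the ATE is a difference in group means, while the DD estimate subtracts the control group's change from the treatment group's change.

```python
from statistics import mean


def ate_randomized(y_treat, y_control):
    """ATE under randomization: difference in mean outcomes between groups."""
    return mean(y_treat) - mean(y_control)


def ate_diff_in_diff(treat_before, treat_after, control_before, control_after):
    """DD estimate: change in the treatment mean minus change in the control mean."""
    change_treat = mean(treat_after) - mean(treat_before)
    change_control = mean(control_after) - mean(control_before)
    return change_treat - change_control


# Hypothetical MUAC Z-scores, for illustration only
treat_before = [-1.5, -1.0, -0.5]      # mean -1.0
treat_after = [-1.25, -0.75, -0.25]    # mean -0.75
control_before = [-1.0, -0.75, -0.5]   # mean -0.75
control_after = [-1.0, -0.5, -0.375]   # mean -0.625

print(ate_randomized(treat_after, control_after))  # -0.125
print(ate_diff_in_diff(treat_before, treat_after, control_before, control_after))  # 0.125
```

Because these hypothetical control units start out better nourished, the post-period simple difference and the DD estimate even differ in sign. That is the baseline imbalance DD is meant to net out, under a common-trends assumption.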

New PE method based on SD
• Objective: to look beyond the ‘average treatment effect’
• Approach: SD compares entire distributions, not just their summary statistics
• Two advantages
  • Circumvents the (highly controversial) choice of a cut-off point, e.g., the poverty line or a MUAC Z-score cut-off
  • Unifies the analysis for broad classes of welfare indicators

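Stochastic dominance
[Figure: cumulative distribution functions $F_A(x)$ and $F_B(x)$; x-axis: MUAC Z-score, up to $x_{\max}$; y-axis: cumulative % of population]
• First order: A FOD B up to $x_{\max}$ iff $F_A(x) \le F_B(x)$ for all $x \le x_{\max}$
• $s$-th order: A $s$-th order dominates B up to $x_{\max}$ iff $F_A^{s}(x) \le F_B^{s}(x)$ for all $x \le x_{\max}$, where $F^{1}(x) = F(x)$ and $F^{s}(x) = \int_{-\infty}^{x} F^{s-1}(u)\,du$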
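The dominance conditions above can be checked descriptively on samples. The following is a minimal Python sketch that assumes empirical cdfs and a finite evaluation grid. The helper names are hypothetical, and no sampling error or formal test statistic is computed. It relies on the standard identity $F^{s}(x) = E[(x - X)_+^{s-1}]/(s-1)!$ for $s \ge 2$, which equals the $s$-times integrated cdf defined above.

```python
from math import factorial


def dominance_curve(sample, x, order):
    """Empirical s-th order integrated cdf F^s(x).

    order 1: share of observations at or below x.
    order s >= 2: mean of (x - v)^(s-1) over v <= x, divided by (s-1)!.
    """
    n = len(sample)
    if order == 1:
        return sum(1 for v in sample if v <= x) / n
    return sum((x - v) ** (order - 1) for v in sample if v <= x) / (n * factorial(order - 1))


def dominates(sample_a, sample_b, order, x_max, grid_points=200):
    """Descriptive check: F_A^s(x) <= F_B^s(x) on a grid up to x_max.

    Also requires a strict gap somewhere, so identical distributions do not
    'dominate' each other. The 1e-12 tolerance absorbs floating-point noise.
    """
    lo = min(min(sample_a), min(sample_b))
    grid = [lo + (x_max - lo) * k / grid_points for k in range(grid_points + 1)]
    diffs = [dominance_curve(sample_a, x, order) - dominance_curve(sample_b, x, order)
             for x in grid]
    return all(d <= 1e-12 for d in diffs) and any(d < -1e-12 for d in diffs)


a = [-1.0, -0.5, 0.0]  # hypothetical Z-scores, group A
b = [-2.0, -1.0, 0.0]  # hypothetical Z-scores, group B
print(dominates(a, b, order=1, x_max=0.0))  # True
print(dominates(b, a, order=1, x_max=0.0))  # False
```

Since first-order dominance implies second-order dominance, `dominates(a, b, order=2, x_max=0.0)` also holds here. In applied work, a grid check like this would be paired with a proper statistical test.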

SD and single differences
• These SD criteria
  • Apply directly to single-difference evaluation (across time OR across treatment and control groups)
  • Do not directly apply to DD
• Literature to date:
  • Single paper: Verme (2010) on single differences
  • SD entirely absent from the program evaluation literature (e.g., Handbook of Development Economics)

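Expanding SD to DD estimation: method
• Practical importance: evaluates beyond-mean effects in non-experimental data
• Let $\Delta_i = x_{i1} - x_{i0}$ denote the change in the welfare indicator of unit $i$ between baseline (0) and follow-up (1), and let $G$ denote the set of probability density functions of $\Delta$, with $g_A, g_B \in G$ for intervention group A and control group B
• The respective cdfs of changes are $G_A(\Delta)$ and $G_B(\Delta)$
• Then A FOD B iff $G_A(\Delta) \le G_B(\Delta)$ for all $\Delta \le \Delta_{\max}$
• A $s$-th order dominates B iff $G_A^{s}(\Delta) \le G_B^{s}(\Delta)$ for all $\Delta \le \Delta_{\max}$, where $G^{1} = G$ and $G^{s}(\Delta) = \int_{-\infty}^{\Delta} G^{s-1}(u)\,du$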
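Because SD-DD applies the same dominance conditions to the distribution of changes, the check reduces to computing $\Delta$ unit by unit and comparing $G_A$ with $G_B$. This is a minimal sketch, assuming a balanced set of units (e.g., sublocations) observed at baseline and follow-up. The function names and data are hypothetical, and the comparison is descriptive only, with no inference.

```python
from math import factorial


def dominance_curve(sample, x, order):
    """Empirical s-th order integrated cdf: share at or below x if s == 1,
    else mean of (x - v)^(s-1) over v <= x, divided by (s-1)!."""
    n = len(sample)
    if order == 1:
        return sum(1 for v in sample if v <= x) / n
    return sum((x - v) ** (order - 1) for v in sample if v <= x) / (n * factorial(order - 1))


def changes(before, after):
    """Delta_i = x_i1 - x_i0 for each unit observed in both periods."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same units")
    return [x1 - x0 for x0, x1 in zip(before, after)]


def sd_dd(before_a, after_a, before_b, after_b, order, delta_max, grid_points=200):
    """True if G_A^s <= G_B^s on a grid up to delta_max, strictly somewhere."""
    delta_a = changes(before_a, after_a)
    delta_b = changes(before_b, after_b)
    lo = min(delta_a + delta_b)
    grid = [lo + (delta_max - lo) * k / grid_points for k in range(grid_points + 1)]
    diffs = [dominance_curve(delta_a, d, order) - dominance_curve(delta_b, d, order)
             for d in grid]
    return all(g <= 1e-12 for g in diffs) and any(g < -1e-12 for g in diffs)


# Hypothetical sublocation median Z-scores at baseline and follow-up
before_a, after_a = [-1.0, -1.5, -0.5], [-1.25, -1.5, -0.25]  # changes: -0.25, 0, 0.25
before_b, after_b = [-1.0, -1.5, -0.5], [-1.5, -1.5, -0.25]   # changes: -0.5, 0, 0.25
print(sd_dd(before_a, after_a, before_b, after_b, order=1, delta_max=0.25))  # True
```

In this toy example, both groups share the same best change. The intervention group simply avoids the worst deterioration, which is the pattern the slides describe as fewer negative changes.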

Expanding SD to DD: interpretation differences
1. The cut-off point is in terms of changes, not levels. The cdf orders changes from most negative to most positive ⇒ ‘initial poverty blind’ or ‘initial malnutrition blind’. (Partial) remedy: run the analysis on the subset of ever-poor/always-poor units
2. Interpretation of dominance orders
• FOD: differences in the distributions of changes between intervention and control sublocations
• SOD: degree of concentration of these changes at the lower end of the distributions
• TOD: additional weight to the lower end of the distribution. Is there any value in doing this for welfare changes irrespective of absolute welfare? Probably not.

Setting and data
• Arid and semi-arid districts in Kenya
  • Characterized by pastoralism
  • Highest poverty incidence in Kenya, high infant mortality, and malnutrition levels above emergency thresholds
• Data
  • From the Arid Lands Resource Management Project (ALRMP) Phase II
  • 28 districts, 128 sublocations, June 2005 to August 2009, 602,000 child observations
  • Welfare indicator: MUAC Z-scores
• Severe malnutrition in 2005/06:
  • Median child MUAC Z-score: -1.22/-1.12 (intervention/control)
  • 10 percent of children had Z-scores below -2.31/-2.14 (I/C)
  • 25 percent of children had Z-scores below -1.80/-1.67 (I/C)

The pseudo panel
• Sublocation-specific pseudo panel, 2005/06 to 2008/09
• Why a pseudo panel?
  • Inconsistent child identifiers
  • MUAC data not available for all children in all months
  • Graduation out of and birth into the sample
• How?
  • 14 summary statistics, each the annual mean of monthly sublocation-specific statistics: mean, percentiles and ‘poverty measures’
  • Focus on malnourished children
  • Thus, the present analysis uses the median MUAC Z-score of children with z ≤ 0
  • Sublocations classified as control or intervention according to project investment
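The aggregation steps above can be sketched in Python for one of the summary statistics. The sketch rests on some assumptions. The record layout (sublocation, year, month, Z-score) and the function name are hypothetical, and `year` stands in for the survey year (e.g., 2005/06). The sketch keeps children with z ≤ 0, takes the monthly median per sublocation, and then averages those medians within each year.

```python
from collections import defaultdict
from statistics import mean, median


def pseudo_panel(records):
    """Build {(sublocation, year): annual mean of monthly medians of z, children with z <= 0}.

    records: iterable of (sublocation, year, month, muac_z) tuples (hypothetical layout).
    A sublocation-month with no child at z <= 0 simply contributes no monthly median.
    """
    monthly = defaultdict(list)
    for subloc, year, month, z in records:
        if z <= 0:  # keep only malnourished (z <= 0) children
            monthly[(subloc, year, month)].append(z)
    yearly = defaultdict(list)
    for (subloc, year, _month), zs in monthly.items():
        yearly[(subloc, year)].append(median(zs))
    return {key: mean(medians) for key, medians in yearly.items()}


records = [
    ("A", 2006, 1, -1.0), ("A", 2006, 1, -2.0), ("A", 2006, 1, 0.5),
    ("A", 2006, 2, -0.5), ("B", 2006, 1, -1.5),
]
print(pseudo_panel(records))  # {('A', 2006): -1.0, ('B', 2006): -1.5}
```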

Results: DD regression
• Pseudo-panel regression model, where D is the intervention dummy (the variable of interest), NDVI is a control for agro-meteorological conditions, and L are district fixed effects controlling for unobservables within major jurisdictions
• No statistically significant average program impact

DD regression: pseudo-panel results
Robust p-values in parentheses; *** p<0.01, ** p<0.05, * p<0.1. District dummy variables included.

SD results
Three steps:
• Steps 1 & 2: simple differences
  • SD within control and within treatment over time: no difference in trends; both improved slightly
  • SD of control vs. treatment at the beginning and at the end: control sublocations dominate in most cases; intervention never dominates
• Step 3: SD on Diff-in-Diff (the focus of today's results)

Expanding SD to DD: controlling for covariates
• In regression Diff-in-Diff: simply add (linear) controls
• In SD-DD, a two-step method is needed
  1. Regress the outcome variable on the covariates
  2. Use the residuals (the unexplained variation) in SD-DD
• In the application below, the first stage controls for agro-meteorological conditions (as reflected in the remotely sensed vegetation measure, NDVI)
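The first stage of the two-step method can be sketched with a single covariate. This is a minimal Python sketch, assuming one regressor (NDVI) fitted by ordinary least squares with an intercept. It omits the district fixed effects of the regression model, and the values are hypothetical. The residuals, rather than the raw Z-scores, would then enter the SD-DD comparison.

```python
from statistics import mean


def ols_residuals(y, x):
    """Residuals from y = a + b * x + e fitted by OLS with one covariate."""
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope on the covariate
    a = y_bar - b * x_bar    # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


ndvi = [0.1, 0.2, 0.3, 0.4]            # hypothetical vegetation index values
z_median = [-1.4, -1.3, -1.2, -1.0]    # hypothetical sublocation median Z-scores
resid = ols_residuals(z_median, ndvi)  # drought-adjusted outcome for the second step
print([round(r, 3) for r in resid])
```

By construction, the residuals sum to zero and are uncorrelated with NDVI, so the second-step comparison works only with variation not explained by vegetation conditions.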

For (drought-adjusted) median MUAC Z-scores: below z = 0.2, intervention sites FOD control sites, although not at the 5% significance level. ALRMP interventions appear moderately effective in preventing worsening nutritional status among children.

Similar results at other quantile breaks

Conclusions
• Existing program evaluation approaches focus on estimating the average treatment effect. In some cases, that is not the impact statistic of interest.
• This paper introduces a new SD-based method to evaluate impact across the entire distribution for non-experimental data
• Results show the practical importance of looking beyond averages
  • Standard Diff-in-Diff regressions: no impact at the mean
  • SD-DD: intervention locations had fewer negative changes, and of smaller magnitude, especially at the median and below
  • ALRMP II may have functioned as a nutritional safety net (though this is only a correlation; causality cannot be established)

Thank you for your time, interest and comments

SD, poverty & social welfare orderings (1)
1. SD and poverty orderings
• Let $\mathrm{SD}_s$ denote stochastic dominance of order $s$ and $P_\alpha$ stand for the poverty ordering (‘has less poverty’)
• Let $\alpha = s - 1$
• Then A $P_\alpha$ B iff A $\mathrm{SD}_s$ B
• SD and poverty orderings are nested:
  • A $\mathrm{SD}_1$ B ⇒ A $\mathrm{SD}_2$ B ⇒ A $\mathrm{SD}_3$ B
  • A $P_1$ B ⇒ A $P_2$ B ⇒ A $P_3$ B
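The link between poverty orderings and SD orders can be checked numerically. For an empirical distribution, the $s$-th order dominance curve at a threshold $z$ equals an unnormalized FGT-type deficit measure, $\frac{1}{(s-1)!}E[(z - X)_+^{\alpha}]$ with $\alpha = s - 1$. Comparing this measure for every poverty line up to $z_{\max}$ is therefore the same as testing SD of order $s$. The sketch below, a hypothetical illustration, verifies this for $s = 2$ by integrating the empirical cdf directly. It uses the unnormalized measure because the usual FGT normalization by $z$ breaks down for Z-score thresholds at or below zero; that choice is an assumption of the sketch.

```python
def integrated_cdf(sample, z):
    """Integral of the empirical cdf from -infinity to z, computed segment by segment."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        if x >= z:
            break
        # On [xs[i], next data point) the empirical cdf equals (i + 1) / n
        upper = min(xs[i + 1], z) if i + 1 < n else z
        total += (i + 1) / n * (upper - x)
    return total


def fgt_unnormalized(sample, z, alpha):
    """Mean of (z - x)^alpha over x <= z; alpha = 0 gives the headcount share F(z)."""
    return sum((z - x) ** alpha for x in sample if x <= z) / len(sample)


z_scores = [-2.0, -1.0, 0.5]  # hypothetical Z-scores
for z in (-1.5, 0.0, 1.0):
    # Second-order dominance curve vs. poverty-gap-type measure (alpha = s - 1 = 1)
    print(z, integrated_cdf(z_scores, z), fgt_unnormalized(z_scores, z, alpha=1))
```

The two columns agree at every threshold. Hence "less poverty for all lines up to $z_{\max}$" under $\alpha = 1$ and second-order dominance up to $z_{\max}$ are one condition written two ways.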

SD, poverty & social welfare orderings (2)
2. Poverty and welfare orderings (Foster and Shorrocks 1988)
• Let U(F) be the class of symmetric utilitarian welfare functions
• Then A $P_\alpha$ B iff A $U_\alpha$ B
• Examples:
  • $U_1$ represents the monotonic utilitarian welfare functions, with $u' > 0$. Less malnutrition is better, regardless of who benefits.
  • $U_2$ represents equality-preferring welfare functions, with $u'' < 0$. A mean-preserving progressive transfer increases $U_2$.
  • $U_3$ represents transfer-sensitive social welfare functions, with $u''' > 0$. A transfer is valued more the lower in the distribution it takes place.
• Bottom line: for welfare levels, tests up to third order make sense
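The $U_2$ example can be illustrated numerically. This is a minimal sketch, assuming the utility function $u(x) = -e^{-x}$ and made-up Z-scores. The utility function is a hypothetical choice with $u' > 0$, $u'' < 0$ and $u''' > 0$, so it belongs to all three classes. In the example, a mean-preserving transfer of 0.5 from the best-off to the worst-off child raises average utility.

```python
from math import exp
from statistics import mean


def utility(x):
    """Hypothetical u(x) = -exp(-x): increasing, concave, positive third derivative."""
    return -exp(-x)


def utilitarian_welfare(sample):
    """Symmetric utilitarian welfare: average utility over the population."""
    return mean(utility(x) for x in sample)


before = [-2.0, -1.0, 0.0, 1.0]  # hypothetical Z-scores
after = [-1.5, -1.0, 0.0, 0.5]   # 0.5 moved from the top child to the bottom child

print(mean(before) == mean(after))                               # True: mean preserved
print(utilitarian_welfare(after) > utilitarian_welfare(before))  # True: welfare rises
```

The transfer leaves children's ranks unchanged, so it is a progressive (Pigou-Dalton) transfer. Any strictly concave $u$ would rank the two distributions the same way.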

The data (2) – extent of malnutrition

DD regression 2: individual MUAC Z-score regression
• Tests program impact with a much larger data set
• Still no statistically significant average program impact

Results: DD regression, individual data
Robust p-values in parentheses; *** p<0.01, ** p<0.05, * p<0.1. District dummy variables included.

Full SD results
