Learn how to predict new comparable poverty estimates in Serbia by adapting poverty mapping methodology, addressing survey differences, and reconciling consumption definitions.
Establishing Comparable Poverty Estimates in Serbia (and elsewhere…) Jill Luoto January 25, 2007 Western Balkans Poverty Analysis Course: World Bank
Goals • Introduce an adaptation of the poverty mapping methodology that enables the prediction of new poverty estimates that are strictly comparable when otherwise incomparable welfare estimates exist • Present brief summary of findings for Serbia • Lead everyone in an exercise using the PovMap software on Serbian data
The Problem • Estimating the evolution of poverty in Serbia over recent years is complicated by a change in official surveys • Living Standards Measurement Survey (LSMS) implemented in 2002 and 2003 • Household Budget Survey (HBS) implemented 2003-2006 • The two survey instruments have different consumption modules. Some of the differences include: • LSMS included a list of item codes for consumption goods • HBS utilized open diary format • Different recall periods: 1 week in LSMS, 2 weeks in HBS • Different imputation procedures for housing rents and other expenditure items • All in all, many differences in the way consumption and resulting poverty were estimated across surveys
Different Consumption Definitions Lead to… • Incomparable Poverty Estimates • Lanjouw and Lanjouw (2001) offer real world examples where only slight changes in the definition of the consumption aggregate affect resulting poverty estimates dramatically • For Serbia, the different consumption modules between LSMS and HBS have caused policymakers to generally consider their respective poverty estimates not to be comparable • This leaves open the question as to what happened to poverty in Serbia between 2003 and 2005
Possible Solution: • Adaptation of the poverty-mapping methodology that aims to reconcile comparability of consumption definitions across surveys • Other components of LSMS and HBS collect similar information • Geographic information • Household demographics • Asset ownership • Education and Labor Information • Instead of imputing consumption definition from a survey into a census across space, impute from survey to survey across time • Necessarily ensures an identical definition of consumption across data sources • Implicit assumption that the relationship between consumption and its correlates remains stable over time
Methodology, In Brief • Establish the completely comparable components between surveys • Estimate a model of consumption in one survey using as explanatory variables only those correlates of consumption that are comparably defined across surveys • Take the point estimates from that model of consumption and impute them into the other survey to estimate new consumption figures using same set of explanatory variables • Derive new estimate of poverty using predicted consumption figures
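The four steps above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the PovMap implementation: the variable names, coefficients, and poverty line are all hypothetical, and the prediction here omits the error terms that the later simulation stage adds back.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients for y on X (X already contains a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def headcount(consumption, poverty_line):
    """Share of households whose consumption falls below the poverty line."""
    return float(np.mean(consumption < poverty_line))

rng = np.random.default_rng(0)

def make_survey(n, asset_share):
    """Synthetic stand-in for a survey: constant, household size, asset dummy."""
    hh_size = rng.integers(1, 7, size=n)
    owns_asset = (rng.random(n) < asset_share).astype(float)
    return np.column_stack([np.ones(n), hh_size, owns_asset])

# Step 1-2: model log consumption in the base survey on comparable variables.
X_base = make_survey(500, asset_share=0.4)
log_c_base = X_base @ np.array([9.0, -0.10, 0.30]) + rng.normal(0, 0.3, 500)
beta = fit_ols(X_base, log_c_base)

# Step 3: impute consumption into the other survey with the same variables.
X_target = make_survey(500, asset_share=0.6)
log_c_pred = X_target @ beta

# Step 4: derive a poverty estimate from the predicted consumption.
poverty_line = 5000.0  # hypothetical line, in the same units as consumption
predicted_headcount = headcount(np.exp(log_c_pred), poverty_line)
print(round(predicted_headcount, 3))
```

Because the fitted values alone have less spread than actual consumption, a headcount computed this way is only a first pass; the residual terms need to be simulated as well.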
Implementation • Gather all of the variables that collect similar information in LSMS and HBS (there are many…) • Generally 5 main categories for the types of information that are useful in describing a household’s welfare and commonly collected in surveys: • Geographic Information • Demographics • Education and Profession Variables • Asset Ownership/Wealth Indicators • Basic Health Information • Define new variable in each dataset that has same definition (and name) across datasets • Compare means, distributions of similar variables across surveys to ensure capturing same information
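Defining a variable with the same name and definition in each dataset usually comes down to mapping each survey's own codes onto one shared coding. A minimal sketch follows; the field names, codes, and categories are hypothetical and are not taken from the actual LSMS or HBS questionnaires.

```python
# Hypothetical survey-specific education codes mapped to one shared coding.
LSMS_EDUC = {1: "none_or_primary", 2: "none_or_primary", 3: "secondary", 4: "tertiary"}
HBS_EDUC = {"A": "none_or_primary", "B": "secondary", "C": "secondary", "D": "tertiary"}

def harmonize(records, source_field, mapping, target_field="educ_common"):
    """Return copies of the records with a commonly defined variable added."""
    return [{**r, target_field: mapping[r[source_field]]} for r in records]

lsms = harmonize([{"hid": 32601, "educ": 2}], "educ", LSMS_EDUC)
hbs = harmonize([{"hid": 10101, "school": "C"}], "school", HBS_EDUC)
print(lsms[0]["educ_common"], hbs[0]["educ_common"])
```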
Finding common variables across surveys… [Figure: excerpts from the HBS questionnaire and the LSMS questionnaire]
Restoring Comparability to Education Variables My Definition: My variable definition matches exactly between surveys
Importing data into PovMap • We will be using subsets of pre-made Stata datasets from LSMS 2003 and HBS 2005 that have been matched and have identical variable names • Go to: File → New Project → Name your project • Each dataset must have a hierarchical household-level identifying variable that can be truncated to identify the cluster • Example: HID=32601 → Cluster=326
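The truncation in the example can be written as integer division. The sketch below assumes the last two digits of the ID number the household within its cluster, which matches the example above but may not hold for every dataset.

```python
def cluster_id(hid: int, household_digits: int = 2) -> int:
    """Drop the trailing household digits from a hierarchical household ID."""
    return hid // 10 ** household_digits

print(cluster_id(32601))  # 326
```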
Stage 1: “Checker” Stage • Compare distributions of variables across datasets • If you think after this final stage of comparison that the variables are truly capturing the same information, “set” the variable to be included as a potential regressor • Since we’re imputing from one survey to another from a different year, it’s important to keep in mind that some variables are going to change over time, e.g., % owning cell phones
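Outside PovMap, the same kind of check can be approximated by comparing means across the two datasets. The sketch below uses a standardized difference in means; the toy data and the 0.25 threshold are assumptions for illustration, not values from the surveys or a rule used by PovMap.

```python
from statistics import mean, pstdev

def standardized_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled = ((pstdev(a) ** 2 + pstdev(b) ** 2) / 2) ** 0.5
    return (mean(b) - mean(a)) / pooled if pooled > 0 else 0.0

def flag_variables(base, target, threshold=0.25):
    """Mark variables whose means differ by more than the chosen threshold."""
    return {
        name: abs(standardized_difference(base[name], target[name])) > threshold
        for name in base
    }

# Hypothetical toy data, not actual survey values.
lsms = {"urban": [1, 0, 1, 1, 0, 1, 0, 0], "cell_phone": [0, 0, 1, 0, 0, 1, 0, 0]}
hbs = {"urban": [1, 0, 1, 0, 1, 1, 0, 0], "cell_phone": [1, 1, 1, 0, 1, 1, 0, 1]}

flags = flag_variables(lsms, hbs)
print(flags)
```

A flagged variable is not necessarily defined differently; as with cell phone ownership, it may simply have changed over time, so the flag is a prompt for judgment rather than an automatic exclusion.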
Summary Statistics for Comparably Defined Variables: Geographic Variables
You can also compare the entire distributions of similar variables across data sources
Stage 2: Building a Consumption Model • You've chosen all of the potential explanatory variables after all phases of screening (comparing surveys, comparing distributions) and now you move on to building your model of consumption • Categorical variables are translated into a sequence of dummies • Build models stepwise or “intuitively” choose explanatory variables using OLS • Aim for highest R² possible to best capture variation in household welfare levels • Simultaneity and Omitted Variables Bias are not important for our purposes, because the model is used only to predict consumption, not to interpret the coefficients causally
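As a rough illustration of two of these steps, the sketch below turns a categorical variable into dummies and computes the R² of an OLS fit. The data and category names are made up, and PovMap's own model-building tools are not reproduced here.

```python
import numpy as np

def to_dummies(values, drop_first=True):
    """Translate a categorical variable into 0/1 columns, one per category."""
    categories = sorted(set(values))
    if drop_first:
        # Omit one category so the dummies are not collinear with the constant.
        categories = categories[1:]
    columns = np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])
    return categories, columns

def r_squared(X, y):
    """In-sample R² of an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Hypothetical toy data: education level and log consumption for six households.
education = ["primary", "secondary", "tertiary", "secondary", "primary", "tertiary"]
log_c = np.array([8.0, 8.5, 9.1, 8.6, 8.1, 9.0])

cats, cols = to_dummies(education)
X = np.column_stack([np.ones(len(education)), cols])
r2 = r_squared(X, log_c)
print(cats, round(r2, 3))
```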
Estimate consumption on subset of variables comparably defined across surveys; aim for highest R² [Table: regression results from LSMS 2003]
Stage 3: Cluster Effects • Decompose the error term into a cluster effect and an idiosyncratic household effect: u_ch = η_c + ε_ch, where η_c is shared by all households in cluster c and ε_ch is specific to household h • This stage deals with modeling the cluster effect • Since disturbance terms are likely to be correlated within clusters (due to unobserved geographic and other factors beyond those already included as regressors), this stage accounts for this by estimating a cluster random effect • If you click on the "no locational effect" button, you take away this cluster effect from your estimation, which tends to produce underestimated standard errors
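One simple way to picture the decomposition is to take the cluster effect as the average regression residual within each cluster and the household effect as what is left over. This is only a sketch of the idea with made-up residuals; the estimator PovMap actually uses for the cluster effect may differ.

```python
from collections import defaultdict
from statistics import mean

def decompose_residuals(residuals, clusters):
    """Split residuals into cluster means and household deviations from them."""
    by_cluster = defaultdict(list)
    for u, c in zip(residuals, clusters):
        by_cluster[c].append(u)
    cluster_effect = {c: mean(us) for c, us in by_cluster.items()}
    household_effect = [u - cluster_effect[c] for u, c in zip(residuals, clusters)]
    return cluster_effect, household_effect

# Hypothetical residuals for four households in two clusters.
residuals = [0.30, 0.10, -0.20, -0.40]
clusters = [326, 326, 327, 327]

eta, eps = decompose_residuals(residuals, clusters)
print(eta, eps)
```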
Stage 4: Idiosyncratic Model • Here, still using the base survey data, you are trying to model the heteroskedasticity of the household idiosyncratic effect to allow it a more flexible form • This stage tries to model the variance in the household-specific error terms as functions of the included X variables and combinations of variables • Can use stepwise modeling or basic OLS or any other method to choose the explanatory variables that best explain variation in the household idiosyncratic effect • Generally very low R² values in this stage (it’s all unobserved variation); 0.01-0.02 is sufficient
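A sketch of what such a variance model can look like is given below. It regresses a logistic-type transform of the squared household residuals on a set of variables Z, which keeps predicted variances between 0 and a bound A. The transform, the 1.05 factor, and the synthetic data are assumptions for illustration, and the back-transformation is simplified; whether PovMap does exactly this is not confirmed here.

```python
import numpy as np

def fit_variance_model(eps, Z):
    """Regress ln(e² / (A - e²)) on Z, with A set just above the largest e²."""
    e2 = eps ** 2
    A = 1.05 * e2.max()  # assumed bound; keeps the log argument positive
    target = np.log(e2 / (A - e2))
    alpha, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return alpha, A

def predict_variance(alpha, A, Z):
    """Simplified back-transformation to a household-specific variance."""
    B = np.exp(Z @ alpha)
    return A * B / (1.0 + B)

# Synthetic residuals whose spread grows with a hypothetical variable z.
rng = np.random.default_rng(1)
n = 400
z = rng.random(n)
eps = rng.normal(0.0, 0.2 + 0.3 * z)
Z = np.column_stack([np.ones(n), z])

alpha, A = fit_variance_model(eps, Z)
sigma2 = predict_variance(alpha, A, Z)
print(sigma2.min(), sigma2.max(), A)
```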
Stage 5: Household Effects • Shows you a plot of the residuals from the model of the idiosyncratic household level error terms • The “Prediction Plot” generally shows that your predictions aren’t so great here • Empirical distribution of residuals can be compared to the normal and t-distributions (of varying degrees of freedom)
Stage 6: Simulation • Here, we need to simulate the residual terms (both the cluster effect and the household idiosyncratic effect) since they are necessarily unknown in the latter survey (or census) • Distributional forms: You can either impose a normal distribution or allow for a more flexible semi-parametric distributional form using information from the predicted residuals from base data • Choose the level of aggregation of your poverty estimates • Choose the poverty line, household size variable, and poverty indicators • Go!
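The simulation step can be sketched as repeated draws of the two error terms added to the predicted log consumption. The version below imposes normal distributions (one of the two options above), holds the regression coefficients fixed, and uses a single variance for all households; these simplifications, along with the function and parameter names, are assumptions of the sketch rather than a description of PovMap.

```python
import numpy as np

def simulate_headcount(xb, clusters, hh_size, sigma_eta, sigma_eps,
                       poverty_line, n_sims=200, seed=0):
    """Mean and spread of the person-weighted headcount across simulations."""
    rng = np.random.default_rng(seed)
    xb = np.asarray(xb, dtype=float)
    cluster_ids, cluster_index = np.unique(clusters, return_inverse=True)
    rates = np.empty(n_sims)
    for s in range(n_sims):
        eta = rng.normal(0.0, sigma_eta, size=cluster_ids.size)  # one draw per cluster
        eps = rng.normal(0.0, sigma_eps, size=xb.size)           # one draw per household
        log_c = xb + eta[cluster_index] + eps
        poor = (np.exp(log_c) < poverty_line).astype(float)
        rates[s] = np.average(poor, weights=hh_size)
    return rates.mean(), rates.std(ddof=1)

# Hypothetical predicted log consumption for four households in two clusters.
xb = np.log([100.0, 200.0, 300.0, 400.0])
clusters = [326, 326, 327, 327]
hh_size = [3, 1, 1, 1]

point, spread = simulate_headcount(xb, clusters, hh_size,
                                   sigma_eta=0.1, sigma_eps=0.3, poverty_line=250.0)
print(point, spread)
```

The spread across simulations gives a sense of the imprecision that comes from the unknown error terms; it understates total uncertainty here because the coefficients are not redrawn.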
Stage 7: Results • Compare your resulting estimates of poverty with the baseline estimates from first survey
Conclusions • For Serbia, this exercise suggests a gradual decline in poverty between 2003 and 2005 • Resulting poverty headcount estimate of 7.5% based on models of consumption from both LSMS 2002 and LSMS 2003 • Lower than official estimate of 9.1% for 2005 based on consumption module of HBS • Nearly 30% drop in poverty from LSMS 2003 headcount estimate of 10.5% (from 10.5% to 7.5%) if results are believed • This methodology can be used in a variety of settings to restore comparability of surveys to estimate evolution of poverty over time within a country or region • Download the PovMap software at: http://iresearch.worldbank.org/PovMap/index.htm