Creating a collection of standardized datasets on household consumption
Olivier Dupriez, World Bank, Development Data Group, odupriez@worldbank.org
6 June 2013
Initial objective
• Calculate poverty PPPs
• Had price data at basic heading level from the ICP; needed consumption shares “at the poverty line” for the same breakdown, to be used as weights
• See: A. Deaton and O. Dupriez, Purchasing power parity exchange rates for the global poor, American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011), and also Global Poverty and Global Price Indexes
Intermediary output – data files
• A collection of “standard” files
  • Individual level: age, sex
  • Household level: region, total expenditure (before and after fixing outliers), adult equivalents, household size, etc.
  • Household + product level:
    • Product code (original as in questionnaire, with labels) and COICOP code
    • Value purchased, home produced, received, total
    • Deflated (when available) / non-deflated
    • NO information on quantities
• Format/structure of the data files is standard; content not so much
Multiple uses and users
• Many potential applications
  • IFC “Business Opportunities at the Base of the Pyramid”
  • Micro-macro modeling
  • Poverty/inequality analysis
  • Assessment of reliability and relevance of surveys
    • E.g., list all items related to health with percentage of respondents, for each survey
    • E.g., list all categories not covered by questionnaires
  • And many more
Method
• Use household consumption/expenditure surveys
  • A VERY diverse set of surveys (HBS, LSMS, HIES, etc.)
  • Ex-post harmonization has limits
• Map all products and services to COICOP
  • From 6000+ items in the Brazil survey to fewer than 50 in other countries…
• Annualize values by product/service and household
• Fix outliers
• No attempt to fill gaps (no imputation of values for missing products/services)
• Generate the 3 standard files
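The annualization step can be illustrated with a minimal sketch. It assumes each reported value comes with a recall period expressed in days and that values are scaled to a 365-day year; the record layout, the function name `annualize` and the 365-day convention are illustrative assumptions, not taken from the survey-specific Stata programs.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365  # assumed convention for scaling to a year


def annualize(records):
    """Sum annualized values by (household, product).

    Each record is a tuple (household_id, product_code, value, recall_days).
    The layout is hypothetical; real questionnaires differ widely.
    """
    totals = defaultdict(float)
    for household_id, product_code, value, recall_days in records:
        if recall_days <= 0:
            raise ValueError("recall period must be a positive number of days")
        totals[(household_id, product_code)] += value * DAYS_PER_YEAR / recall_days
    return dict(totals)


records = [
    ("hh1", "rice", 10.0, 7),    # reported over the last 7 days
    ("hh1", "rice", 4.0, 7),     # second line for the same product
    ("hh1", "rent", 100.0, 30),  # reported over the last 30 days
]
annual = annualize(records)
```

Items such as durables or ceremonies would need item-specific treatment that this simple scaling does not capture.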
Principle – Full replicability
• One single Stata program per survey
  • Calls one “generic” program to detect and fix outliers
• Controlled vocabulary for file names, folder names
• Survey ID to link to on-line metadata catalog
Mapping to COICOP
• ICP/COICOP: 110 basic headings for household consumption
  • 105 are relevant for household surveys
• Situations:
  • Many to one (e.g., long list of vegetables)
  • One to one
  • One to many (lack of detail in questionnaire)
  • No data to one (questionnaire missed items)
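The four mapping situations can be made concrete with a small sketch. It assumes the mapping is stored as a dictionary from questionnaire item to the set of basic headings the item could belong to; that structure, the function name `classify_mapping` and the heading labels in the example are hypothetical and serve only as illustration.

```python
def classify_mapping(item_to_headings, all_headings):
    """Classify each questionnaire item and list uncovered basic headings.

    item_to_headings: dict mapping an item label to the set of basic
    headings it could belong to (hypothetical structure).
    all_headings: iterable of every basic heading that should be covered.
    """
    # Count how many single-heading items point at each heading.
    single_count = {}
    for headings in item_to_headings.values():
        if len(headings) == 1:
            (heading,) = headings
            single_count[heading] = single_count.get(heading, 0) + 1

    cases = {}
    covered = set()
    for item, headings in item_to_headings.items():
        covered.update(headings)
        if not headings:
            cases[item] = "unmapped"
        elif len(headings) > 1:
            cases[item] = "one to many"  # item too coarse for one heading
        else:
            (heading,) = headings
            cases[item] = "many to one" if single_count[heading] > 1 else "one to one"

    # "No data to one": headings that no questionnaire item covers.
    missing = sorted(set(all_headings) - covered)
    return cases, missing


# Illustrative labels only, not actual COICOP codes.
mapping = {
    "tomatoes": {"vegetables"},
    "onions": {"vegetables"},
    "rice": {"rice"},
    "meat (all kinds)": {"beef", "poultry"},
}
cases, missing = classify_mapping(mapping, ["vegetables", "rice", "beef", "poultry", "pork"])
```

In this example the two vegetable items are "many to one", rice is "one to one", the generic meat item is "one to many", and pork is reported as missing.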
Grouped categories
• One to many: items in questionnaires are not always detailed enough to be mapped to one single COICOP basic heading
Missing categories
• No questionnaire found to cover all 105 categories of products and services
• On average, N basic headings missing
• Sometimes for known reasons (e.g., pork in Muslim countries)
• But questionnaire design needs improvement in all countries
Splitting grouped categories
• Used breakdown from national accounts to split grouped categories (data obtained from ICP)
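A proportional split is one plausible reading of this step. The sketch below assumes a grouped value is allocated to its candidate basic headings in proportion to the national accounts expenditure on each heading; the function name `split_grouped`, the figures and the choice to split each reported value directly are assumptions made for illustration.

```python
def split_grouped(value, headings, sna_values):
    """Allocate a grouped value across basic headings using SNA proportions.

    value: amount reported for the grouped questionnaire item.
    headings: basic headings the item covers.
    sna_values: dict heading -> national accounts expenditure (hypothetical data).
    """
    total = sum(sna_values[h] for h in headings)
    if total <= 0:
        raise ValueError("national accounts total must be positive to derive shares")
    return {h: value * sna_values[h] / total for h in headings}


# A generic "meat" item worth 120, split with illustrative SNA figures.
parts = split_grouped(120.0, ["beef", "poultry"], {"beef": 300.0, "poultry": 100.0})
```

Here beef receives three quarters of the value and poultry one quarter, and the parts add up to the original amount.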
Correlation between SNA and surveys
• From almost perfect (very few cases) to very low (many countries)
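The slide does not say which correlation measure was used. As a sketch under that caveat, a Pearson correlation between survey-based and SNA-based expenditure shares across basic headings could be computed as follows; the function name and the share vectors are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical shares for three basic headings, in the same order.
survey_shares = [0.50, 0.30, 0.20]
sna_shares = [0.40, 0.35, 0.25]
r = pearson(survey_shares, sna_shares)
```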
Annualization challenges
• Some problematic items:
  • Durables (use value/expenditure)
  • Imputed rents
  • Out-of-pocket health expenditure
  • Ceremonies, etc.
  • Food away from home
• Validation: compare with official estimates when available, and with PovCal aggregates
  • Results never match exactly
Detecting and fixing outliers
• Top outliers only
• Tried multiple options
• Based on per capita or per household values, depending on item
• Threshold: 75th percentile + 5 times interquartile range
• Replace with maximum valid value (zero values not included in calculations)
• If outlier for multiple items, consider “rich” household and do not fix
• Would deserve a specific research project
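The rule above can be sketched in a few lines. This is not the generic Stata program; it is a minimal illustration under several assumptions: quartiles are computed with Python's `statistics.quantiles` using the inclusive method (Stata's percentile definition may differ), "multiple items" is read as two or more, and values are taken as already expressed per capita or per household as appropriate. The names `upper_fence`, `fix_outliers` and `rich_threshold` are hypothetical.

```python
import statistics
from collections import Counter


def upper_fence(values, k=5.0):
    """Return P75 + k * IQR computed on non-zero values, or None if too few."""
    positive = [v for v in values if v > 0]  # zeros excluded from calculations
    if len(positive) < 2:
        return None
    q1, _, q3 = statistics.quantiles(positive, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def fix_outliers(data, k=5.0, rich_threshold=2):
    """Cap top outliers at the maximum valid value, item by item.

    data: dict item -> dict household -> value (hypothetical layout).
    Households flagged for rich_threshold or more items are left untouched.
    Returns a new dict; the input is not modified.
    """
    flagged = {}        # item -> (outlier households, replacement value)
    counts = Counter()  # household -> number of items flagged
    for item, by_household in data.items():
        fence = upper_fence(by_household.values(), k)
        if fence is None:
            continue
        outliers = {hh for hh, v in by_household.items() if v > fence}
        if not outliers:
            continue
        max_valid = max(v for v in by_household.values() if 0 < v <= fence)
        flagged[item] = (outliers, max_valid)
        counts.update(outliers)

    fixed = {item: dict(by_household) for item, by_household in data.items()}
    for item, (outliers, max_valid) in flagged.items():
        for hh in outliers:
            if counts[hh] < rich_threshold:  # "rich" households are not fixed
                fixed[item][hh] = max_valid
    return fixed


data = {
    "rice": {"h1": 10, "h2": 11, "h3": 11, "h4": 12, "h5": 12,
             "h6": 13, "h7": 14, "h8": 500, "h9": 0},
    "meat": {"h1": 20, "h2": 22, "h3": 22, "h4": 24, "h5": 24,
             "h6": 26, "h7": 28, "h8": 25, "h9": 900},
    "fish": {"h1": 5, "h2": 6, "h3": 6, "h4": 7, "h5": 7,
             "h6": 8, "h7": 9, "h8": 6, "h9": 400},
}
fixed = fix_outliers(data)
```

In this made-up data, household h8 is an outlier for rice only, so its value is capped at 14, the largest value below the threshold. Household h9 is an outlier for both meat and fish, so it is treated as "rich" and left unchanged.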
Outliers fixing – Significant impact
• Example: change in Ginis
  http://datavizint.worldbank.org/t/DECDG/views/GiniAnalyses/Ginis?:embed=y&:display_count=no
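To see why capping a single top value can move the Gini index, here is a minimal unweighted sketch. It ignores survey weights and household size, which any real estimate would use, and the expenditure figures are made up.

```python
def gini(values):
    """Unweighted Gini index of non-negative values (no survey weights)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


before = gini([10, 12, 14, 500])  # one extreme value left in
after = gini([10, 12, 14, 14])    # same value capped at the maximum valid value
```

With these four values the index drops sharply once the extreme value is capped, which is the kind of effect the linked visualization reports across surveys.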
Past and future
• 160 datasets “standardized” – 90+ low- and middle-income countries
• Many more survey datasets available at WB; could expand and update the collection if resources are available
• Conduct in-depth research work on outliers and formulate recommendations to countries
• Feedback to countries on issues in questionnaire design
• Dissemination of microdata?