Creating a collection of standardized datasets on household consumption

Olivier Dupriez, World Bank, Development Data Group (odupriez@worldbank.org), 6 June 2013


Presentation Transcript


  1. Creating a collection of standardized datasets on household consumption
     Olivier Dupriez, World Bank, Development Data Group (odupriez@worldbank.org), 6 June 2013

  2. Initial objective
     • Calculate poverty PPPs (purchasing power parities)
     • We had price data at the basic-heading level from the ICP (International Comparison Program); we needed consumption shares “at the poverty line” for the same breakdown, to be used as weights.
     • See: A. Deaton and O. Dupriez, “Purchasing power parity exchange rates for the global poor”, American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011); see also “Global Poverty and Global Price Indexes”.
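The slide does not say which index formula turns these shares into a PPP. As an illustration only, assuming a Törnqvist-style bilateral index (one common choice, not necessarily the one used in the paper), poverty-line budget shares weight the basic-heading price relatives as in this sketch. The function name, heading labels and prices are hypothetical.

```python
import math

def tornqvist_ppp(prices_c, prices_b, shares_c, shares_b):
    """Törnqvist-style price index of country c relative to base country b.

    All four arguments are dicts keyed by basic heading. Prices are in local
    currency; shares are budget shares at the poverty line (hypothetical
    inputs). Only headings present in all four dicts are used, and the
    averaged shares are renormalized to sum to 1 over those headings.
    """
    common = set(prices_c) & set(prices_b) & set(shares_c) & set(shares_b)
    if not common:
        raise ValueError("no basic headings in common")
    # One weight per heading: the average of the two countries' shares.
    weights = {h: 0.5 * (shares_c[h] + shares_b[h]) for h in common}
    total = sum(weights.values())
    # Share-weighted geometric mean of the price relatives p_c / p_b.
    log_index = sum(
        (w / total) * math.log(prices_c[h] / prices_b[h]) for h, w in weights.items()
    )
    return math.exp(log_index)
```

With equal shares of 0.5, prices two and four times the base country's give an index of 2^1.5 ≈ 2.83, the geometric mean of the two price relatives.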

  3. Intermediate output – data files
     • A collection of “standard” files:
        • Individual level: age, sex
        • Household level: region, total expenditure (before and after fixing outliers), adult equivalents, household size, etc.
        • Household + product level:
           • Product code (original, as in the questionnaire, with labels) and COICOP code
           • Value purchased, home-produced, received, and total
           • Deflated (when available) and non-deflated
           • NO information on quantities
     • The format/structure of the data files is standard; the content, not so much.
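A minimal sketch of the three standard files as Python record types. The field names are hypothetical (the slide lists the contents, not the variable names), and `total` is assumed to be the sum of the purchased, home-produced and received values. As on the slide, there are no quantity fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Individual:
    hhid: str  # household identifier (hypothetical name)
    pid: int   # person number within the household
    age: int
    sex: str

@dataclass
class Household:
    hhid: str
    region: str
    hhsize: int
    adult_equivalents: float
    total_exp_raw: float    # total expenditure before fixing outliers
    total_exp_fixed: float  # total expenditure after fixing outliers

@dataclass
class HouseholdProduct:
    hhid: str
    product_code: str   # original code, as in the questionnaire
    product_label: str
    coicop: str         # COICOP code
    purchased: float
    home_produced: float
    received: float
    deflated_total: Optional[float] = None  # only when a deflator is available
    # No quantity fields: the standard files carry values only.

    @property
    def total(self) -> float:
        # Assumed to be the sum of the three acquisition modes.
        return self.purchased + self.home_produced + self.received
```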

  4. Multiple uses and users
     • Many potential applications:
        • IFC “Business Opportunities at the Base of the Pyramid”
        • Micro-macro modeling
        • Poverty/inequality analysis
        • Assessment of the reliability and relevance of surveys, e.g.:
           • List all health-related items, with the percentage of respondents, for each survey
           • List all categories not covered by questionnaires
     • And many more
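The first survey-assessment example can be sketched as follows: for each survey, the percentage of households reporting a positive value for each health item. This assumes health items are identified by COICOP division 06 (Health) and that records arrive as hypothetical `(survey_id, hhid, coicop, value)` tuples. The denominator here is every household that appears in the records; in practice it would come from the household-level file.

```python
from collections import defaultdict

def pct_reporting(records, prefix="06"):
    """Percent of households reporting a positive value, per survey and item.

    records: iterable of (survey_id, hhid, coicop, value) tuples.
    Only COICOP codes starting with `prefix` (default "06", Health) are reported.
    """
    households = defaultdict(set)  # survey -> every household seen in the records
    reporters = defaultdict(lambda: defaultdict(set))  # survey -> code -> households with value > 0
    for survey, hhid, coicop, value in records:
        households[survey].add(hhid)
        if coicop.startswith(prefix) and value > 0:
            reporters[survey][coicop].add(hhid)
    return {
        survey: {
            code: 100.0 * len(hh) / len(households[survey])
            for code, hh in sorted(codes.items())
        }
        for survey, codes in reporters.items()
    }
```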

  5. Method
     • Use household consumption/expenditure surveys
        • A VERY diverse set of surveys (HBS, LSMS, HIES, etc.)
        • Ex-post harmonization has limits
     • Map all products and services to COICOP
        • From 6,000+ items in the Brazil survey to fewer than 50 in other countries…
     • Annualize values by product/service and household
     • Fix outliers
     • No attempt to fill gaps (no imputation of values for missing products/services)
     • Generate the 3 standard files
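The annualization step depends on each questionnaire's recall periods, which the slides do not list. A sketch under that caveat: the recall-period codes and conversion factors below are hypothetical, and values for the same household and COICOP code are summed after conversion to annual amounts.

```python
from collections import defaultdict

# Hypothetical recall-period codes and the factors that convert them to a year.
ANNUAL_FACTORS = {"7d": 365 / 7, "1m": 12, "3m": 4, "12m": 1}

def annualize(value, recall):
    """Convert a value reported over a recall period into an annual value."""
    try:
        return value * ANNUAL_FACTORS[recall]
    except KeyError:
        raise ValueError(f"unknown recall period: {recall!r}") from None

def annualize_records(rows):
    """rows: (hhid, coicop, value, recall) tuples -> {(hhid, coicop): annual value}."""
    totals = defaultdict(float)
    for hhid, coicop, value, recall in rows:
        totals[(hhid, coicop)] += annualize(value, recall)
    return dict(totals)
```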

  6. Principle – Full replicability
     • One single Stata program per survey
        • Calls one “generic” program to detect and fix outliers
     • Controlled vocabulary for file names and folder names
     • Survey ID to link to the online metadata catalog

  7. Mapping to COICOP
     • ICP/COICOP: 110 basic headings for household consumption
        • 105 are relevant for household surveys
     • Situations:
        • Many to one (e.g., a long list of vegetables)
        • One to one
        • One to many (lack of detail in the questionnaire)
        • No data to one (questionnaire missed items)
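One possible way to sort a questionnaire-to-COICOP mapping table into the four situations on the slide: each questionnaire item is classified, and basic headings that no item maps to are listed as "no data". The item and heading names are hypothetical; a real table would use the 105 relevant ICP basic headings.

```python
from collections import defaultdict

def classify_mapping(pairs, relevant_headings):
    """pairs: (questionnaire_item, basic_heading) links from a mapping table."""
    items_per_heading = defaultdict(set)
    headings_per_item = defaultdict(set)
    for item, heading in pairs:
        items_per_heading[heading].add(item)
        headings_per_item[item].add(heading)

    result = {"many_to_one": set(), "one_to_one": set(), "one_to_many": set()}
    for item, headings in headings_per_item.items():
        if len(headings) > 1:
            # Item too coarse: it spans several basic headings.
            result["one_to_many"].add(item)
        else:
            (heading,) = headings
            # Several items feed the same heading, or exactly one does.
            if len(items_per_heading[heading]) > 1:
                result["many_to_one"].add(item)
            else:
                result["one_to_one"].add(item)
    # Headings that no questionnaire item maps to.
    result["no_data"] = set(relevant_headings) - set(items_per_heading)
    return result
```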

  8. Grouped categories
     • One to many: items in questionnaires are not always detailed enough to be mapped to a single COICOP basic heading

  9. Missing categories
     • No questionnaire was found that covers all 105 categories of products and services
     • On average, N basic headings are missing
        • Sometimes for known reasons (e.g., pork in Muslim countries)
        • But questionnaire design needs improvement in all countries

  10. Splitting grouped categories
     • Used the breakdown from national accounts (data obtained from the ICP) to split grouped categories
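The slide states only that the national accounts breakdown was used. Assuming a simple proportional split, a value recorded for a grouped item is allocated across the basic headings it covers in proportion to their national accounts values. Names and figures are hypothetical.

```python
def split_grouped(value, headings, na_values):
    """Allocate a grouped item's value across basic headings.

    value: survey value recorded for the grouped item.
    headings: basic headings the grouped item covers.
    na_values: {basic_heading: national accounts value} for the same country.
    """
    total = sum(na_values[h] for h in headings)
    if total <= 0:
        raise ValueError("national accounts give no basis for the split")
    # Each heading gets its national-accounts share of the grouped value.
    return {h: value * na_values[h] / total for h in headings}
```

For example, 90 recorded for "meat, all kinds" with national accounts values of 200 for beef and 100 for pork splits into 60 and 30.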

  11. Correlation between SNA and surveys
     • Ranges from almost perfect (very few cases) to very low (many countries)
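The slide does not say exactly what was correlated. A sketch assuming a Pearson correlation between survey and national-accounts (SNA) budget shares, computed over the basic headings both sources cover; the dictionaries and the function name are hypothetical.

```python
import math

def share_correlation(survey, sna):
    """Pearson correlation between survey and SNA budget shares.

    survey, sna: {basic_heading: expenditure}. Shares are computed over the
    headings present in both sources.
    """
    common = sorted(set(survey) & set(sna))
    if len(common) < 2:
        raise ValueError("need at least two common headings")
    s_total = sum(survey[h] for h in common)
    n_total = sum(sna[h] for h in common)
    x = [survey[h] / s_total for h in common]
    y = [sna[h] / n_total for h in common]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("shares do not vary across headings")
    return cov / (sx * sy)
```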

  12. Annualization challenges
     • Some problematic items:
        • Durables (use value vs. expenditure)
        • Imputed rents
        • Out-of-pocket health expenditure
        • Ceremonies, etc.
        • Food away from home
     • Validation: compare with official estimates when available, and with PovCal aggregates
        • The results never replicate these exactly

  13. Detecting and fixing outliers
     • Top outliers only
     • Tried multiple options
     • Based on per capita or per household values, depending on the item
     • Threshold: 75th percentile + 5 times the interquartile range
     • Replace with the maximum valid value (zero values not included in calculations)
     • If a household is an outlier for multiple items, consider it a “rich” household and do not fix
     • Would deserve a specific research project
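A sketch of the outlier rule as described on the slide, under several assumptions the slide leaves open: quartiles come from Python's `statistics.quantiles` with `method="inclusive"` (other quartile definitions, such as Stata's, can give slightly different cutoffs), the list of per-capita items is supplied by the caller, and a household counts as "rich" when it is flagged on at least `rich_threshold` items (a hypothetical parameter, default 2). The original programs were written in Stata; this is not a translation of them.

```python
import statistics
from collections import defaultdict

def detect_and_fix(data, hhsize, per_capita_items=frozenset(), k=5.0, rich_threshold=2):
    """Flag and cap top outliers item by item.

    data: {item: {hhid: annual value}}; hhsize: {hhid: household size}.
    Returns (fixed copy of data, set of households treated as "rich").
    """
    flagged = defaultdict(dict)  # hhid -> {item: replacement in household units}
    for item, values in data.items():
        per_capita = item in per_capita_items
        # Detection scale: per capita or per household, depending on the item.
        scale = {h: (hhsize[h] if per_capita else 1) for h in values}
        # Zero values are excluded from the calculations.
        x = {h: v / scale[h] for h, v in values.items() if v > 0}
        if len(x) < 2:
            continue
        q1, _, q3 = statistics.quantiles(x.values(), n=4, method="inclusive")
        cutoff = q3 + k * (q3 - q1)  # top outliers only
        max_valid = max(v for v in x.values() if v <= cutoff)
        for h, v in x.items():
            if v > cutoff:
                # Cap at the largest non-outlier value, back in household units.
                flagged[h][item] = max_valid * scale[h]
    # A household flagged on several items is treated as rich and left alone.
    rich = {h for h, items in flagged.items() if len(items) >= rich_threshold}
    fixed = {item: dict(values) for item, values in data.items()}
    for h, items in flagged.items():
        if h not in rich:
            for item, replacement in items.items():
                fixed[item][h] = replacement
    return fixed, rich
```

The function returns a modified copy rather than editing `data` in place, so the "before" totals remain available, matching the household file's total expenditure before and after fixing outliers.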

  14. Outlier fixing – significant impact
     • Example: change in Ginis: http://datavizint.worldbank.org/t/DECDG/views/GiniAnalyses/Ginis?:embed=y&:display_count=no
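To see how capping outliers can move inequality measures, a minimal unweighted Gini coefficient (the standard rank-based formula on sorted values) can be compared before and after fixing. Survey sampling weights are ignored here, which real estimates would not do, and the expenditure values are hypothetical.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-based formula on sorted data, with ranks i = 1..n.
    weighted = sum(i * v for i, v in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

before = [10, 12, 11, 13, 9, 10, 12, 1000]  # hypothetical per capita expenditure
after = [10, 12, 11, 13, 9, 10, 12, 13]     # same data with the top value capped
print(f"Gini before: {gini(before):.3f}, after: {gini(after):.3f}")
```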

  15. Past and future
     • 160 datasets “standardized” – 90+ low- and middle-income countries
     • Many more survey datasets are available at the WB; the collection could be expanded and updated if resources are available
     • Conduct in-depth research on outliers and formulate recommendations to countries
     • Feedback to countries on issues in questionnaire design
     • Dissemination of microdata?
