Creating a collection of standardized datasets on household consumption

Olivier Dupriez, World Bank, Development Data Group, odupriez@worldbank.org, 6 June 2013

Initial objective • Calculate poverty PPPs • Had price data at basic heading level from the ICP; needed consumption shares “at the poverty line” for the same breakdown, to be used as weights • See: A. Deaton and O. Dupriez, “Purchasing power parity exchange rates for the global poor”, American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011), and also Global Poverty and Global Price Indexes

Intermediary output – data files • A collection of “standard” files • Individual level: age, sex • Household level: region, total expenditure (before and after fixing outliers), adult equivalents, hhld size, etc • Household + product level: • Product code (original as in questionnaire, with labels) and COICOP code • Value purchased, home produced, received, total • Deflated (when available) / non deflated • NO information on quantities • Format/structure of the data files is standard; content not so much

Multiple uses and users • Many potential applications • IFC “Business Opportunities at the Base of the Pyramid” • Micro-macro modeling • Poverty/inequality analysis • Assessment of reliability and relevance of surveys • E.g., list all items related to health with percentage of respondents, for each survey • E.g., list all categories not covered by questionnaires • And many more
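As an illustration of the survey-assessment use case, the sketch below computes, for one survey, the percentage of households reporting each health-related item. The record layout (household ID, item label, category, value), the function name and the sample values are assumptions made for this example; they are not taken from the standardized files themselves.

```python
def share_reporting(records, n_households, categories_of_interest):
    """Percent of households with a positive value, per questionnaire item.

    records: iterable of (household_id, item_label, category, value).
    Only items whose category is in categories_of_interest are kept.
    """
    reporting = {}
    for hh_id, item, category, value in records:
        if category in categories_of_interest and value > 0:
            # A set ensures each household is counted once per item.
            reporting.setdefault(item, set()).add(hh_id)
    return {
        item: 100.0 * len(households) / n_households
        for item, households in reporting.items()
    }


# Made-up records for a survey of 4 households.
records = [
    ("hh1", "doctor fees", "health", 20.0),
    ("hh2", "doctor fees", "health", 5.0),
    ("hh1", "medicines", "health", 3.0),
    ("hh3", "rice", "food", 10.0),
    ("hh4", "medicines", "health", 0.0),  # zero value: not counted
]
shares = share_reporting(records, n_households=4, categories_of_interest={"health"})
```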

Method • Use household consumption/expenditure surveys • A VERY diverse set of surveys (HBS, LSMS, HIES, etc.) • Ex-post harmonization has limits • Map all products and services to COICOP • From 6000+ items in Brazil survey to fewer than 50 in other countries… • Annualize values by product/service and hhld • Fix outliers • No attempt to fill gaps (no imputation of values for missing products/services) • Generate the 3 standard files
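The annualization step can be sketched as follows. The recall periods and multipliers are assumptions chosen for illustration (each survey defines its own reference periods), and the record layout is hypothetical.

```python
# Hypothetical recall periods and annual multipliers (assumption:
# the actual factors depend on each survey's questionnaire).
ANNUAL_FACTORS = {
    "week": 52.0,
    "month": 12.0,
    "quarter": 4.0,
    "year": 1.0,
}


def annualize(value, recall_period):
    """Scale a reported value to a 12-month equivalent."""
    return value * ANNUAL_FACTORS[recall_period]


def annualize_records(records):
    """Sum annualized values by (household, product).

    records: iterable of (household_id, product, value, recall_period).
    """
    totals = {}
    for hh_id, product, value, recall in records:
        key = (hh_id, product)
        totals[key] = totals.get(key, 0.0) + annualize(value, recall)
    return totals


# Made-up records: the same product reported under two recall periods.
records = [
    ("hh1", "rice", 10.0, "week"),
    ("hh1", "rice", 5.0, "month"),
    ("hh1", "rent", 300.0, "month"),
]
annual = annualize_records(records)
```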

Principle – Full replicability • One single Stata program per survey • Calls one “generic” program to detect and fix outliers • Controlled vocabulary for file names, folder names • Survey ID to link to on-line metadata catalog

Mapping to COICOP • ICP/COICOP: 110 basic headings for household consumption • 105 are relevant for household surveys • Situations: • Many to one (e.g., long list of vegetables) • One to one • One to many (lack of detail in questionnaire) • No data to one (questionnaire missed items)
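The four mapping situations can be detected mechanically once each questionnaire item has been assigned to one or more basic headings. The sketch below is an illustration only: it uses plain labels rather than real COICOP codes, and the function name and input layout are assumptions.

```python
from collections import defaultdict


def classify_mapping(item_to_headings, relevant_headings):
    """Return {heading: situation} for every relevant basic heading.

    item_to_headings: {questionnaire item: [basic headings it covers]}.
    """
    items_per_heading = defaultdict(list)
    grouped = set()  # headings covered by an item that spans several headings
    for item, headings in item_to_headings.items():
        for heading in headings:
            items_per_heading[heading].append(item)
        if len(headings) > 1:
            grouped.update(headings)

    result = {}
    for heading in relevant_headings:
        items = items_per_heading.get(heading, [])
        if not items:
            result[heading] = "no data to one"
        elif heading in grouped:
            result[heading] = "one to many"
        elif len(items) > 1:
            result[heading] = "many to one"
        else:
            result[heading] = "one to one"
    return result


# Made-up questionnaire: labels stand in for basic heading codes.
mapping = {
    "tomatoes": ["vegetables"],
    "onions": ["vegetables"],
    "rice": ["rice"],
    "meat (all kinds)": ["beef", "poultry"],
}
relevant = ["vegetables", "rice", "beef", "poultry", "pork"]
situations = classify_mapping(mapping, relevant)
```

Headings flagged "no data to one" are the missing categories; headings flagged "one to many" are the grouped categories that need to be split.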

Grouped categories • One to many: items in questionnaires are not always detailed enough to be mapped to one single COICOP basic heading

Missing categories • No questionnaire found to cover all 105 categories of products and services • On average, N basic headings missing • Sometimes for known reasons (e.g., pork in Muslim countries) • But questionnaire design needs improvement in all countries

Splitting grouped categories • Used breakdown from national accounts to split grouped categories (data obtained from ICP)
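A minimal sketch of such a split, assuming the grouped value is allocated in proportion to the national accounts values of the basic headings it covers. The proportional rule, the function name and the numbers are assumptions for illustration; the slide does not spell out the exact procedure.

```python
def split_grouped(value, headings, sna_values):
    """Allocate a grouped survey value across basic headings.

    value: expenditure reported for the grouped questionnaire item.
    headings: basic headings the grouped item covers.
    sna_values: {heading: national accounts expenditure}.
    """
    total = sum(sna_values[h] for h in headings)
    if total <= 0:
        raise ValueError("national accounts values must sum to a positive number")
    return {h: value * sna_values[h] / total for h in headings}


# Made-up example: "meat (all kinds)" split between two headings
# with national accounts values in a 3:1 ratio.
parts = split_grouped(100.0, ["beef", "poultry"], {"beef": 30.0, "poultry": 10.0})
```

The split preserves the household's reported total, so aggregate expenditure is unchanged.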

Correlation between SNA and surveys • From almost perfect (very few cases) to very low (many countries)

Annualization challenges • Some problematic items: • Durables (use value/expenditure) • Imputed rents • Out of pocket health expenditure • Ceremonies, etc. • Food away from home • Validation: compare with official estimates when available, and with PovCal aggregates • These are never replicated exactly

Detecting and fixing outliers • Top outliers only • Tried multiple options • Based on per capita or per household depending on item • 75th percentile + 5 times interquartile range • Replace with maximum valid value (zero values not included in calculations) • If outlier for multiple items, consider “rich” household and do not fix • Would deserve a specific research project
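The top-outlier rule for a single item can be sketched as below. This is an illustration, not the generic Stata program itself: the quantile interpolation method is an assumption (the slide does not specify one), and the "rich household" exception for households flagged on multiple items is not implemented here.

```python
import statistics


def fix_top_outliers(values, multiplier=5.0):
    """Cap top outliers for one item.

    Threshold = 75th percentile + multiplier * interquartile range,
    computed on non-zero values only. Values above the threshold are
    replaced with the largest value that does not exceed it.
    """
    positive = [v for v in values if v > 0]
    if len(positive) < 2:
        return list(values)  # not enough data to compute quartiles

    # Assumption: "inclusive" interpolation; other methods shift the cut-off.
    q1, _, q3 = statistics.quantiles(positive, n=4, method="inclusive")
    threshold = q3 + multiplier * (q3 - q1)

    max_valid = max(v for v in positive if v <= threshold)
    return [max_valid if v > threshold else v for v in values]


# Made-up per capita values for one item; the zero is kept as is
# and excluded from the quartile calculation.
values = [0, 10, 12, 14, 16, 18, 20, 500]
fixed = fix_top_outliers(values)
```

In this example the quartiles of the non-zero values are 13 and 19, so the threshold is 19 + 5 × 6 = 49 and the value 500 is replaced with 20.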

Outliers fixing – Significant impact • Example: change in Ginis: http://datavizint.worldbank.org/t/DECDG/views/GiniAnalyses/Ginis?:embed=y&:display_count=no
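To see why capping a few top values can move the Gini index, the sketch below computes an unweighted Gini coefficient before and after a fix. The data are made up, and survey weights (which a real calculation would use) are ignored as a simplifying assumption.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and i running from 1 to n.
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * v for i, v in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Made-up per capita expenditures, before and after capping one outlier.
before = [10.0, 12.0, 14.0, 500.0]
after = [10.0, 12.0, 14.0, 14.0]
gini_before = gini(before)
gini_after = gini(after)
```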

Past and future • 160 datasets “standardized” – 90+ low and middle-income countries • Many more survey datasets available at WB; could expand and update the collection if resources are available • Conduct in-depth research work on outliers and formulate recommendations to countries • Feedback to countries on issues in questionnaire design • Dissemination of microdata?
