The Minnesota Data Harmonization Projects. Bill & Melinda Gates Foundation Seattle, Washington May 21, 2014 Elizabeth Boyle, Miriam King, Matthew Sobek Minnesota Population Center, University of Minnesota sobek@umn.edu. Integrated Public Use Microdata Series. Minnesota Population Center.
The Minnesota Data Harmonization Projects Bill & Melinda Gates Foundation Seattle, Washington May 21, 2014 Elizabeth Boyle, Miriam King, Matthew Sobek Minnesota Population Center, University of Minnesota sobek@umn.edu
Minnesota Population Center • We build data infrastructure for the research community and specialize in data harmonization. • World’s largest collection of individual population and health data, across 9 projects. • 50,000 registered users from over 100 countries. • Free
MPC Data Dissemination, 1993-2012 (gigabytes per week)
The Problem • Combining data from multiple sources is time-consuming • Discovery • Data management • It’s error-prone • Recoding data • Overlooked documentation • Hard to replicate results • Discourages comparative research
Outline • Harmonization methods • Dissemination system • International projects • Integrated DHS • Terra Populus • IPUMS-International
Terminology Harmonization (also called “integration”): Combining datasets collected at different times or places into a single, consistent data series. Metadata: Data about data; documentation in the broadest sense.
Microdata (example variables: Relation to head, Marital status, Occupation, Education)
Harmonization Methods • Metadata • Data • Dissemination
Systematize Metadata (record layout file, pdf)
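One way to picture systematized metadata: a record layout that once lived in a PDF becomes machine-readable entries (variable name, start column, width) that can drive the reading of fixed-width data. The sketch below is illustrative only. The `LayoutEntry` class, the variable names, the column positions, and the sample record are all invented for this example and are not taken from any MPC file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutEntry:
    name: str   # variable mnemonic (hypothetical)
    start: int  # 1-based starting column, as printed layouts usually state it
    width: int  # number of characters the variable occupies


# Hypothetical layout for a three-variable person record.
LAYOUT = [
    LayoutEntry("AGE", 1, 3),
    LayoutEntry("SEX", 4, 1),
    LayoutEntry("MARST", 5, 1),
]


def parse_record(line, layout):
    """Slice one fixed-width record into a dict of raw string codes."""
    return {
        entry.name: line[entry.start - 1 : entry.start - 1 + entry.width]
        for entry in layout
    }


record = parse_record("03428", LAYOUT)
print(record)
```

Keeping the layout as data rather than as hand-written slicing code means the same entries can also feed documentation pages and extract tools.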
Convert Questionnaires to Metadata (Mexico 2000) Water Access
XML-Tagged Questionnaire Text Bedrooms Rooms Water access
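Tagging questionnaire text makes it possible to pull out the wording behind any variable programmatically. A minimal sketch, assuming a made-up tag scheme (`<question var="...">`): the actual tags used by the project are not shown in the slides, and the question wording below is placeholder text, not the real Mexico 2000 questionnaire.

```python
import xml.etree.ElementTree as ET

# Placeholder questionnaire; tag names, attributes, and wording are assumptions.
QUESTIONNAIRE = """
<questionnaire sample="example">
  <question var="ROOMS">How many rooms does this dwelling have?</question>
  <question var="BEDROOMS">How many rooms are used for sleeping?</question>
  <question var="WATER">Does this dwelling have piped water?</question>
</questionnaire>
"""


def question_text_by_variable(xml_text):
    """Return a dict mapping each tagged variable to its question wording."""
    root = ET.fromstring(xml_text.strip())
    return {
        q.get("var"): (q.text or "").strip()
        for q in root.iter("question")
    }


texts = question_text_by_variable(QUESTIONNAIRE)
print(texts["WATER"])
```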
Data: Variable Harmonization (Marital Status, IPUMS-International)
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced/separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated
Translation Table: Input
Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced or separated
Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated
Translation Table: Harmonized and Input
Harmonized (Code Label): 1 0 0 Single; 2 0 0 Married or in union; 2 1 0 Married, formally; 2 1 1 Civil; 2 1 2 Religious; 2 1 3 Civil and religious; 2 1 4 Monogamous; 2 1 5 Polygamous; 2 2 0 Consensual union; 3 0 0 Divorced or separated; 3 1 0 Separated; 3 2 0 Divorced; 4 0 0 Widowed
Input, Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced or separated
Input, Mexico 1970: 1 = Married, civil & relig; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
Input, Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated
Translation Table: Harmonized and Input, aligned (Code Label: input codes by sample)
1 0 0 Single: Bangladesh 2011, 1 = Unmarried; Mexico 1970, 8 = Single; Kenya 1999, 1 = Never married
2 0 0 Married or in union: Bangladesh 2011, 2 = Married
2 1 0 Married, formally: (no input codes)
2 1 1 Civil: Mexico 1970, 2 = Married, civil
2 1 2 Religious: Mexico 1970, 3 = Married, religious
2 1 3 Civil and religious: Mexico 1970, 1 = Married, civil & relig
2 1 4 Monogamous: Kenya 1999, 2 = Monogamous
2 1 5 Polygamous: Kenya 1999, 3 = Polygamous
2 2 0 Consensual union: Mexico 1970, 4 = Consensual union
3 0 0 Divorced or separated: Bangladesh 2011, 4 = Divorced or separated
3 1 0 Separated: Mexico 1970, 7 = Separated; Kenya 1999, 6 = Separated
3 2 0 Divorced: Mexico 1970, 6 = Divorced; Kenya 1999, 5 = Divorced
4 0 0 Widowed: Bangladesh 2011, 3 = Widowed; Mexico 1970, 5 = Widowed; Kenya 1999, 4 = Widowed
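The translation table above can be sketched as a pair of lookup structures. The codes and labels come from the slide; writing each harmonized code as a joined three-digit string ("2 1 1" becomes "211"), the dictionary layout, and the function names `harmonize` and `general_code` are assumptions made for illustration, not the project's actual implementation.

```python
# Harmonized code -> label, as listed on the slide (digits joined into a string).
HARMONIZED_LABELS = {
    "100": "Single",
    "200": "Married or in union",
    "210": "Married, formally",
    "211": "Civil",
    "212": "Religious",
    "213": "Civil and religious",
    "214": "Monogamous",
    "215": "Polygamous",
    "220": "Consensual union",
    "300": "Divorced or separated",
    "310": "Separated",
    "320": "Divorced",
    "400": "Widowed",
}

# Sample -> {source code: harmonized code}, following the aligned table.
TRANSLATION = {
    "Bangladesh 2011": {1: "100", 2: "200", 3: "400", 4: "300"},
    "Mexico 1970": {1: "213", 2: "211", 3: "212", 4: "220",
                    5: "400", 6: "320", 7: "310", 8: "100"},
    "Kenya 1999": {1: "100", 2: "214", 3: "215",
                   4: "400", 5: "320", 6: "310"},
}


def harmonize(sample, source_code):
    """Translate one sample-specific code into its harmonized code."""
    return TRANSLATION[sample][source_code]


def general_code(detailed_code):
    """Collapse a detailed code to its first digit (1 = single, 2 = married
    or in union, 3 = divorced or separated, 4 = widowed)."""
    return detailed_code[0]


code = harmonize("Mexico 1970", 1)
print(code, HARMONIZED_LABELS[code])
print(general_code(harmonize("Bangladesh 2011", 4)))
```

The first digit is comparable across all three samples in this example, while the second and third digits keep detail that only some samples record, such as the civil versus religious distinction in Mexico 1970.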
Variables Page 238 censuses
Variable Codes (Marital status)
Variable Comparability Discussion (Marital status)
Questionnaire Text (Marital status, Cambodia)
Attached Characteristics Age of spouse Employment status of father Occupation of father
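Attached characteristics copy a value from a related household member onto a person's own record, for example a spouse's age. A minimal sketch, assuming each person record carries a pointer to the spouse's person number within the household; the field names `pernum`, `spouse_pernum`, and `age_sp` and the sample household are hypothetical.

```python
def attach_spouse_age(household):
    """Return copies of the person records with the spouse's age attached.

    Persons without a spouse pointer get None for the attached value.
    """
    by_pernum = {person["pernum"]: person for person in household}
    result = []
    for person in household:
        spouse = by_pernum.get(person.get("spouse_pernum"))
        result.append({**person, "age_sp": spouse["age"] if spouse else None})
    return result


# Invented three-person household: a couple and a child.
household = [
    {"pernum": 1, "age": 40, "spouse_pernum": 2},
    {"pernum": 2, "age": 38, "spouse_pernum": 1},
    {"pernum": 3, "age": 12, "spouse_pernum": None},
]

attached = attach_spouse_age(household)
for person in attached:
    print(person["pernum"], person["age_sp"])
```

The same pattern would apply to a father's occupation or employment status, given a pointer to the father's record instead of the spouse's.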
Demographic and Health Surveys • Foremost source of health information for the developing world • Funded by USAID • Since the 1980s, over 300 surveys in 90 countries • Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.
IDHS Project • 5-year NIH grant (end of year 2) • Focus on Africa, with India • Partnership with ICF-International and USAID
Why an Integrated DHS? Motivation: DHS is incredibly valuable, but it’s hard to capitalize on its full potential. Problem: • Data discovery • Dispersed documentation • Data management • Variable changes over time Not unique to DHS: endemic to any survey that’s persisted over decades.
DHS Research Process Example: Find data on female genital cutting (Survey Search Tool)