
The Minnesota Data Harmonization Projects

The Minnesota Data Harmonization Projects. Bill & Melinda Gates Foundation, Seattle, Washington, May 21, 2014. Elizabeth Boyle, Miriam King, Matthew Sobek, Minnesota Population Center, University of Minnesota (sobek@umn.edu). Integrated Public Use Microdata Series. Minnesota Population Center.

Presentation Transcript


  1. The Minnesota Data Harmonization Projects. Bill & Melinda Gates Foundation, Seattle, Washington, May 21, 2014. Elizabeth Boyle, Miriam King, Matthew Sobek, Minnesota Population Center, University of Minnesota, sobek@umn.edu

  2. Integrated Public Use Microdata Series

  3. Minnesota Population Center • We build data infrastructure for the research community, specializing in data harmonization. • World’s largest collection of individual-level population and health data, across 9 projects. • 50,000 registered users from over 100 countries. • Data are free.

  4. MPC Data Dissemination, 1993-2012 (chart: gigabytes per week)

  5. MPC Data Projects

  6. The Problem • Combining data from multiple sources is time-consuming: discovery, data management • It’s error-prone: recoding data, overlooked documentation • Results are hard to replicate • It discourages comparative research

  7. Outline • Harmonization methods • Dissemination system • International projects: Integrated DHS, Terra Populus, IPUMS-International

  8. Terminology • Harmonization (also called “integration”): combining datasets collected at different times or places into a single, consistent data series. • Metadata: data about data; documentation in the broadest sense.

  9. Microdata (figure: individual records with variables such as relation to head, marital status, education, and occupation)

  10. Summary Data

  11. Harmonization Methods • Metadata • Data • Dissemination

  12. Systematize Metadata (record layout file, pdf)

  13. MPC Data Dictionary

  14. Convert Questionnaires to Metadata (Mexico 2000, water access)

  15. Metadata: Questionnaire Text

  16. XML-Tagged Questionnaire Text (tagged items: bedrooms, rooms, water access)
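Once questionnaire text carries XML tags, passages can be collected per variable and displayed next to that variable's documentation. Below is a minimal Python sketch of that grouping step. The element name `q`, the `var` attribute, the variable names, and the placeholder wording are all hypothetical; the slide does not show the MPC's actual markup schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: each <q> element carries a var attribute naming the
# variable it documents. The wording is placeholder text, not census text.
TAGGED = """
<questionnaire sample="mx2000">
  <q var="ROOMS">Placeholder text for the rooms question</q>
  <q var="BEDROOMS">Placeholder text for the bedrooms question</q>
  <q var="WATER">Placeholder text for the water access question</q>
  <q var="WATER">Placeholder text for a water access response category</q>
</questionnaire>
"""

def text_by_variable(xml_text: str) -> dict[str, list[str]]:
    """Group tagged questionnaire passages by the variable they describe."""
    root = ET.fromstring(xml_text.strip())
    passages = defaultdict(list)
    for q in root.iter("q"):
        passages[q.get("var")].append((q.text or "").strip())
    return dict(passages)

if __name__ == "__main__":
    for var, texts in text_by_variable(TAGGED).items():
        print(var, texts)
```

Keying on a tag rather than on page position lets the same variable's wording be pulled from every questionnaire that uses the tag.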

  17. Data: Variable Harmonization, Marital Status (IPUMS-International)
      Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced/separated
      Mexico 1970: 1 = Married, civil & religious; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
      Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

  18. Translation Table: Input
      Bangladesh 2011: 1 = Unmarried; 2 = Married; 3 = Widowed; 4 = Divorced or separated
      Mexico 1970: 1 = Married, civil & religious; 2 = Married, civil; 3 = Married, religious; 4 = Consensual union; 5 = Widowed; 6 = Divorced; 7 = Separated; 8 = Single
      Kenya 1999: 1 = Never married; 2 = Monogamous; 3 = Polygamous; 4 = Widowed; 5 = Divorced; 6 = Separated

  19. Translation Table: Harmonized Codes Added (input columns Bangladesh 2011, Mexico 1970, and Kenya 1999 list the source codes from slide 18, not yet assigned)
      100 Single
      200 Married or in union
      210 Married, formally
      211 Civil
      212 Religious
      213 Civil and religious
      214 Monogamous
      215 Polygamous
      220 Consensual union
      300 Divorced or separated
      310 Separated
      320 Divorced
      400 Widowed
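The harmonized codes are nested: the first digit appears to mark the broad category (1 single, 2 married or in union, 3 divorced or separated, 4 widowed) and later digits add detail. Below is a minimal Python sketch that derives the coarser levels from a detailed code by zeroing trailing digits. It assumes this positional scheme holds for every code, as the table's layout suggests; the function names are hypothetical.

```python
# Harmonized marital-status labels, copied from the translation table.
LABELS = {
    100: "Single",
    200: "Married or in union",
    210: "Married, formally",
    211: "Civil",
    212: "Religious",
    213: "Civil and religious",
    214: "Monogamous",
    215: "Polygamous",
    220: "Consensual union",
    300: "Divorced or separated",
    310: "Separated",
    320: "Divorced",
    400: "Widowed",
}

def general(code: int) -> int:
    """Broadest level: keep only the first digit (e.g. 213 -> 200)."""
    return code // 100 * 100

def intermediate(code: int) -> int:
    """Middle level: keep the first two digits (e.g. 213 -> 210)."""
    return code // 10 * 10

def lineage(code: int) -> list[str]:
    """Labels from broadest to most detailed, skipping repeated levels."""
    levels = []
    for c in (general(code), intermediate(code), code):
        if c not in levels:
            levels.append(c)
    return [LABELS[c] for c in levels]

if __name__ == "__main__":
    print(lineage(213))  # ['Married or in union', 'Married, formally', 'Civil and religious']
    print(lineage(320))  # ['Divorced or separated', 'Divorced']
```

Because every detailed code rolls up to a shared broad code, samples recorded at very different levels of detail can still be compared at the broad level.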

  20. Translation Table: Completed (harmonized code and label, followed by the source codes assigned to it from Bangladesh 2011, Mexico 1970, and Kenya 1999)
      100 Single: Bangladesh 1 = Unmarried; Mexico 8 = Single; Kenya 1 = Never married
      200 Married or in union: Bangladesh 2 = Married
      210 Married, formally
      211 Civil: Mexico 2 = Married, civil
      212 Religious: Mexico 3 = Married, religious
      213 Civil and religious: Mexico 1 = Married, civil & religious
      214 Monogamous: Kenya 2 = Monogamous
      215 Polygamous: Kenya 3 = Polygamous
      220 Consensual union: Mexico 4 = Consensual union
      300 Divorced or separated: Bangladesh 4 = Divorced or separated
      310 Separated: Mexico 7 = Separated; Kenya 6 = Separated
      320 Divorced: Mexico 6 = Divorced; Kenya 5 = Divorced
      400 Widowed: Bangladesh 3 = Widowed; Mexico 5 = Widowed; Kenya 4 = Widowed
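Applying the completed table amounts to a lookup keyed on sample and source code. Below is a minimal Python sketch of that lookup, with a coverage check that flags any source code the table has not yet assigned. The sample identifiers (`bd2011`, `mx1970`, `ke1999`) and the function names are hypothetical; the code assignments are the ones in the table above.

```python
# Translation table: (sample, source code) -> harmonized code.
# Sample identifiers are hypothetical short names for this sketch.
TRANSLATION = {
    # Bangladesh 2011
    ("bd2011", 1): 100, ("bd2011", 2): 200, ("bd2011", 3): 400, ("bd2011", 4): 300,
    # Mexico 1970
    ("mx1970", 1): 213, ("mx1970", 2): 211, ("mx1970", 3): 212, ("mx1970", 4): 220,
    ("mx1970", 5): 400, ("mx1970", 6): 320, ("mx1970", 7): 310, ("mx1970", 8): 100,
    # Kenya 1999
    ("ke1999", 1): 100, ("ke1999", 2): 214, ("ke1999", 3): 215,
    ("ke1999", 4): 400, ("ke1999", 5): 320, ("ke1999", 6): 310,
}

# Valid source codes per sample, taken from the input table.
SOURCE_CODES = {"bd2011": range(1, 5), "mx1970": range(1, 9), "ke1999": range(1, 7)}

def unassigned():
    """List (sample, code) pairs that the translation table does not cover."""
    return [(s, c) for s, codes in SOURCE_CODES.items() for c in codes
            if (s, c) not in TRANSLATION]

def harmonize(sample, source_code):
    """Recode one source value; fail loudly on codes missing from the table."""
    try:
        return TRANSLATION[(sample, source_code)]
    except KeyError:
        raise ValueError(f"{sample}: source code {source_code} is not in the table") from None

if __name__ == "__main__":
    assert not unassigned(), unassigned()
    # Source code 2 means something different in each sample...
    detailed = [harmonize(s, 2) for s in ("bd2011", "mx1970", "ke1999")]
    print(detailed)  # [200, 211, 214]
    # ...yet all three fall under "Married or in union" at the broad level.
    print([code // 100 * 100 for code in detailed])  # [200, 200, 200]
```

Raising an error on an unknown code, rather than passing it through, keeps unmapped values from slipping silently into the harmonized series.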

  21. Data Dissemination System

  22. Data Dissemination System

  23. Variables Page

  24. Variables Page (238 censuses)

  25. Sample Filtering

  26. Variables Page – Filtered

  27. Variable Page: Marital Status

  28. Variable Codes (Marital status)

  29. Variable Codes (Marital status)

  30. Variable Codes (Marital status)

  31. Variable Page: Marital Status

  32. Variable Comparability Discussion (Marital status)

  33. Variable Page: Documentation

  34. Questionnaire Text

  35. Questionnaire Text (Marital status, Cambodia)

  36. Variables Page

  37. Extract Summary

  38. Case Selection

  39. Attached Characteristics (examples: age of spouse, employment status of father, occupation of father)
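An attached characteristic copies a variable from a related household member, such as a spouse or father, onto each person's own record. Below is a minimal Python sketch that assumes each person record stores the within-household line number of that person's spouse and father. The `Person` fields `spouse_line` and `father_line`, the `attach` function, and the sample household are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Person:
    household: int
    line: int                          # person number within the household
    age: int
    occupation: str
    spouse_line: Optional[int] = None  # hypothetical pointer to the spouse's line
    father_line: Optional[int] = None  # hypothetical pointer to the father's line

def attach(people: List[Person], pointer: str, attribute: str) -> List[Any]:
    """Copy `attribute` from the person named by each record's `pointer` field.

    Returns None where the pointer is empty or names nobody in the household.
    """
    by_key = {(p.household, p.line): p for p in people}
    attached = []
    for p in people:
        target_line = getattr(p, pointer)
        target = by_key.get((p.household, target_line)) if target_line is not None else None
        attached.append(getattr(target, attribute) if target is not None else None)
    return attached

if __name__ == "__main__":
    # Illustrative household with invented values: a couple and their child.
    people = [
        Person(household=1, line=1, age=45, occupation="Farmer", spouse_line=2),
        Person(household=1, line=2, age=41, occupation="Trader", spouse_line=1),
        Person(household=1, line=3, age=17, occupation="Student", father_line=1),
    ]
    print(attach(people, "spouse_line", "age"))         # [41, 45, None]
    print(attach(people, "father_line", "occupation"))  # [None, None, 'Farmer']
```

Indexing on (household, line) confines each lookup to the person's own household, so a pointer can never reach a person in a different household.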

  40. Extract Summary

  41. Download or Revise Extract

  42. On-line Analysis

  43. The International Projects

  44. Integrated DHS

  45. Demographic and Health Surveys • Foremost source of health information for the developing world • Funded by USAID • Since the 1980s: over 300 surveys in 90 countries • Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.

  46. IDHS Project • 5-year NIH grant (currently at the end of year 2) • Focus on Africa, plus India • Partnership with ICF International and USAID

  47. Why an Integrated DHS? Motivation: DHS is incredibly valuable, but it is hard to exploit its full potential. Problems: • Data discovery • Dispersed documentation • Data management • Variables that change over time. These problems are not unique to DHS; they are endemic to any survey that has persisted over decades.

  48. DHS Research Process. Example: finding data on female genital cutting (Survey Search Tool)
