
Robert McCaa and Steven Ruggles




  1. Building Historical Social Science Infrastructure: Data Integration Projects of the Minnesota Population Center. Robert McCaa and Steven Ruggles, Minnesota Population Center

  2. How to get data (once approved): https://www.ipums.org/international • 1. Access web-site, study documentation • 2. Make and submit extract • 3. Get email: extract ready • 4. Retrieve extract • 5. Decompress extract • 6. Analyze using stat. package (also SAS, STATA)
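Step 5 of the workflow above (decompressing the extract) can be sketched in Python. This is a minimal sketch that assumes the extract arrives as a gzip-compressed file; the slide does not name the compression format, and the function name `decompress_extract` and any file names passed to it are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_extract(gz_path: str, out_path: str) -> int:
    """Decompress a gzip file to out_path and return the bytes written.

    Assumption: the extract is gzip-compressed. Other formats would
    need a different standard-library module.
    """
    # Stream the data so large extracts are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path).stat().st_size
```

The decompressed file can then be read by whichever statistical package is used in step 6.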

  3. Outline • History of Public Use Census Microdata • IPUMS • IPUMS-International • NAPP • Differences among the projects • Data format • Harmonization • Administration, work processes, and legal constraints

  4. History of U.S. Public Use Census Microdata • The 1960 One-In-One-Thousand Public Use Sample • The 1970 Public Use Samples • DUALabs, Beresford, and the harmonized and expanded 1960 sample • The new historical samples: Preston, Winsborough, Ruggles • The 1980, 1990, and 2000 PUMS: incompatible

  5. 1991: eight census years, four investigators, six performance sites, seven record layouts • Table 1. Census files incorporated in the original version of IPUMS

  6. IPUMS • 1987-1992: SHRL Common format FORTRAN programs • Limitations: lost information, false cognates, poor documentation, expensive custom datasets • IPUMS was an attempt to do it right • Single harmonized database, comprehensive integrated documentation, no lost information • Beta release 1993, full public release 1995 • Internet dissemination • ftp in 1993, web-based interactive extraction in 1995

  7. Table 2. Current and Planned IPUMS-USA Data Files

  8. IPUMS-International • After 1960, most censuses around the world were tabulated by computer • McCaa decided that the IPUMS model should be applied to other countries • Began with a project for Colombia, then in 1999 an NSF Infrastructure grant to add six more countries • 2003-2005: three major new grants to increase the database to 50+ countries

  9. IPUMS-International Tasks • Inventory and preservation of data and documentation • Processing (standardizing format, correcting format errors, drawing samples, adding confidentiality protections, harmonizing codes, etc.) • Documentation (especially comparability) • Dissemination: obtain licenses that allow us to disseminate data for educational and scholarly usage, and set up a secure web-based dissemination system
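The "harmonizing codes" part of the processing task can be illustrated with a small crosswalk. This is only a sketch under stated assumptions: the sample names, source codes, and harmonized codes below are invented for illustration and are not actual IPUMS-International codes. It keeps the original code next to the harmonized one, in the spirit of the "no lost information" goal on the IPUMS slide.

```python
# Hypothetical harmonized code scheme (not actual IPUMS codes).
HARMONIZED_LABELS = {1: "single", 2: "married", 9: "unknown"}

# Hypothetical crosswalk: each sample maps its own codes to the shared scheme.
CROSSWALK = {
    "sample_a": {"S": 1, "M": 2},
    "sample_b": {"1": 2, "2": 1},  # same meanings, different source codes
}

UNKNOWN = 9


def harmonize(sample: str, source_code: str) -> tuple[int, str]:
    """Return (harmonized_code, original_code) for one value.

    Unmapped source codes fall back to UNKNOWN, and the original code is
    always returned alongside so no source information is discarded.
    """
    harmonized = CROSSWALK[sample].get(source_code, UNKNOWN)
    return harmonized, source_code
```

Returning both values is one possible design; a real system might instead store the source code in a separate variable.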

  10. Table 3. Current IPUMS-International Samples

  11. IPUMS-International, August 2005 • dark green = disseminating • medium green = harmonizing • light green = negotiating • 55% of the world's population • Mollweide projection

  12. Table 4. Status of IPUMS-International Countries

  13. North Atlantic Population Project • IMAG 1999: LDS data for Britain, Canada, U.S. • Minneapolis 2000: meetings to define scope of a harmonization project • Added Norway and Iceland • Adopted decentralized structure with coding work carried out at seven sites, coordination and programming at Minneapolis • 2003-2005: preliminary datasets for all countries released • 2006-2009: planned expansions (funding pending)

  14. Table 5a. Phase I NAPP datasets

  15. Table 5b. Phase II NAPP datasets

  16. Differences • Data Format Problems • Harmonization • Project administration and work process • Ownership and dissemination restrictions

  17. Merging the databases • Current compatibility and incompatibilities • Two formats • Integration of web access tools

  18. Thank you. • http://ipums.org/usa • https://ipums.org/international • http://nappdata.org
