Building Historical Social Science Infrastructure: Data Integration Projects of the Minnesota Population Center Steven Ruggles Minnesota Population Center
Outline • History of Public Use Census Microdata • IPUMS • IPUMS-International • NAPP • Differences among the projects • Data format • Harmonization • Administration, work processes, and legal constraints
History of U.S. Public Use Census Microdata • The 1960 One-In-One-Thousand Public Use Sample • The 1970 Public Use Samples • DUALabs, Beresford, and the harmonized and expanded 1960 sample • The new historical samples: Preston, Winsborough, Ruggles • The 1980, 1990, and 2000 PUMS: incompatible
1991: eight census years, four investigators, six performance sites, seven record layouts Table 1. Census files incorporated in the original version of IPUMS
IPUMS • 1987-1992: SHRL Common format FORTRAN programs • Limitations: lost information, false cognates, poor documentation, expensive custom datasets • IPUMS was an attempt to do it right • Single harmonized database, comprehensive integrated documentation, no lost information • Beta release 1993, full public release 1995 • Internet dissemination • ftp in 1993, web-based interactive extraction in 1995
Table 2. Current and Planned IPUMS-USA Data Files
IPUMS-International • After 1960, most censuses around the world were tabulated by computer • McCaa decided that the IPUMS model should be applied to other countries • Began with a project for Colombia, then in 1999 an NSF Infrastructure grant to add six more countries • 2003-2005: three major new grants to increase database to 50+ countries
IPUMS-International Tasks • Inventory and preservation of data and documentation • Processing (standardizing format, correcting format errors, drawing samples, adding confidentiality protections, harmonizing codes, etc.) • Documentation (especially comparability) • Dissemination: obtain licenses that allow us to disseminate data for educational and scholarly use, and set up a secure web-based dissemination system
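The "harmonizing codes" step in the processing task can be pictured with a minimal sketch. This is not the project's actual software: the sample names, source codes, harmonized codes, and the `harmonize` helper are all hypothetical, invented only to show the idea of mapping each census's own coding scheme onto one shared scheme while keeping the original value so that no information is lost.

```python
# Hypothetical crosswalks: every sample name, code, and label here is
# invented for illustration and does not come from any real census file.
CROSSWALKS = {
    # source code -> harmonized code, one mapping per sample
    "country_a_1990": {1: 10, 2: 20, 3: 30},
    "country_b_1991": {"S": 10, "M": 20, "W": 30, "D": 40},
}

HARMONIZED_LABELS = {
    10: "single",
    20: "married",
    30: "widowed",
    40: "divorced",
    99: "unknown",
}

UNKNOWN = 99


def harmonize(sample, source_code):
    """Return (harmonized_code, original_code).

    The original code is carried along next to the harmonized one,
    so detail from the source file is never discarded.
    """
    harmonized = CROSSWALKS[sample].get(source_code, UNKNOWN)
    return harmonized, source_code


records = [
    ("country_a_1990", 2),
    ("country_b_1991", "D"),
    ("country_a_1990", 7),  # a code with no crosswalk entry
]
harmonized_records = [harmonize(sample, code) for sample, code in records]

for harmonized, original in harmonized_records:
    print(original, "->", harmonized, HARMONIZED_LABELS[harmonized])
```

Keeping the source code beside the harmonized code is one simple way to avoid the "lost information" limitation noted for the earlier common-format approach; a real system would also need to document where categories are only approximately comparable.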
North Atlantic Population Project • IMAG 1999: LDS data for Britain, Canada, U.S. • Minneapolis 2000: meetings to define scope of a harmonization project • Added Norway and Iceland • Adopted decentralized structure with coding work carried out at seven sites, coordination and programming at Minneapolis • 2003-2005: preliminary datasets for all countries released • 2006-2009: planned expansions (funding pending)
Differences • Data Format Problems • Harmonization • Project administration and work process • Ownership and dissemination restrictions
Merging the databases • Current compatibility and incompatibilities • Two formats • Integration of web access tools
Thank you. • http://ipums.org/usa • http://ipums.org/international • http://nappdata.org