Access the world's largest public-use population database with data from 44+ countries, enabling extensive analyses across time and space for scholarly research.
IPUMS-International Steven Ruggles Minnesota Population Center
What is IPUMS-International? • The IPUMS-International project is creating an integrated global database of over 150 censuses from at least 44 countries. • It will be the world’s largest public-use population database, with multiple samples from each country enabling analyses across time and space. • The microdata and accompanying documentation will be freely available for scholarly and educational research through a web-based data dissemination system.
The Problem • A vast body of raw census microdata covering much of the world over the past four decades survives in machine-readable form. • In most countries, these census microdata are either unavailable to researchers or difficult to obtain. • These data are at constant risk of destruction because of technological obsolescence, physical aging of computer tapes, and loss of institutional memory and documentation.
Why it matters • In the few countries where census microdata are readily available to researchers, they have become an indispensable part of social science infrastructure. • In the journal Demography, the leading U.S. journal of population, census microdata are used three times as often as any other source for studies of the U.S. or Canada. • No alternate source offers comparable sample sizes, chronological depth, or widespread availability across countries.
Advantages of Census Microdata Samples • Large: many more cases than any alternative dataset; enables study of relatively small populations; allows analysis of effects of local conditions on behavior • Long-term: data usually available for multiple decades • Flexible: tabulations can be customized to the research problem; multivariate analysis is feasible; harmonization is possible, allowing analyses that cross borders and time periods
Cross-National Harmonization and Open Access: National Academy of Sciences recommendations • “National and international funding agencies should establish mechanisms that facilitate the harmonization of data collected in different countries.” • “Cross national studies conducted within a framework of comparable measurement can be a substantially more useful tool for policy analysis than studies of single countries.” • “The scientific community, broadly construed, should have widespread and unconstrained access to the data.” Source: Preparing for an Aging World: The Case for Cross-National Research (National Academy, 2001)
The Model: IPUMS-USA • Project to harmonize U.S. Census microdata for the period 1850-2000 • 1992-1995: NSF-funded IPUMS project harmonized samples using composite codes, documented comparability; 250,000 transformations, 3,000 pages of printed documentation • 1995-1999: Another NSF project funded an online data access system with integrated hypertext documentation
Success of IPUMS-USA User-friendly access, harmonized codes, and integrated comprehensive hypertext documentation led to a flood of historical census-based research: • 12,000 users, 75,000 custom data extracts • Currently distributing an average of 638 MB/hr, 24/7 • 1,300 publications and working papers • IPUMS-based research is concentrated in the top U.S. journals: the most common venues are Demography, American Economic Review, Journal of Political Economy, American Sociological Review, Social Forces, and Quarterly Journal of Economics
IPUMS-International • After 1960, most censuses around the world were tabulated by computer • McCaa decided that the IPUMS model should be applied to other countries • Began with a project for Colombia, then in 1999 an NSF Infrastructure grant to add six more countries • 2005-2009: new HSD grant to increase the database to 44 countries • NICHD is also assisting with funding
IPUMS-International Users • Prospective users must sign a confidentiality agreement and provide an abstract explaining their need for the data • Through 9/1/05 we had 980 applicants to use the database, of which 582 were approved (59 percent) • Users represent 40 countries and 250 institutions, including many international organizations (e.g., ILO, WHO, World Bank, Inter-American Development Bank)
Early results National Academy of Sciences panel (2005) used data from Colombia, Kenya, Mexico, and Vietnam to analyze changing outcomes such as schooling, work, fertility, and marriage as a function of age, gender, and household characteristics.
Early results Cynthia Feliciano (2005) compared the education of immigrants to the United States with those who remained behind to understand patterns of selectivity
Other topics include: • Changing living arrangements of the aged • Concentration of mortality within families • Impact of rainfall on health and economic welfare • Female labor-force participation and educational attainment • Regional inequality differentials • Brain drain from developing countries • Effects of emigration on labor markets • Relationship between divorce and family composition • Relationship between disease factors and education • Relationship between educational attainment and cohort size. • Effect of NAFTA on educational attainment and school enrollment by region within Mexico
Most users request multiple countries Number of countries requested by IPUMS-International users (percent distribution)
IPUMS-International Tasks • Inventory and preservation of data and documentation • Processing • Documentation (especially comparability) • Dissemination—obtain licenses that allow us to disseminate data for educational and scholarly use, and set up secure web-based dissemination system
IPUMS-International Preservation Initiatives • UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3,000 microdata tapes recovered, along with metadata (documentation)
Status of Data Acquisition dark green = disseminating medium green = data received light green = negotiating
Current IPUMS-International Partners • Current funding for 44 countries by 2009 • Next data release: late spring 2006
Processing • Standardize format • Correct format errors • Draw samples • Add confidentiality protections • Harmonize codes • Edit and allocate missing or inconsistent data • Add standard constructed variables
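Two of the processing steps above, harmonizing codes and drawing samples, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual procedure: the sample names, the marital-status codes, the missing-value code, and the choice of a systematic household sample are all hypothetical.

```python
import random

# Hypothetical translation table: each census sample maps its own raw codes
# for marital status onto one shared set of harmonized codes.
# (Sample names and code values are invented for illustration.)
TRANSLATION = {
    "country_a_1990": {1: 1, 2: 2, 3: 2, 4: 3, 5: 4},
    "country_b_2000": {"S": 1, "M": 2, "D": 3, "W": 4},
}
MISSING = 9  # assumed harmonized code for unknown or unmapped values


def harmonize(sample, raw_code):
    """Map a census-specific raw code to the shared harmonized code."""
    return TRANSLATION[sample].get(raw_code, MISSING)


def draw_systematic_sample(households, interval, seed=0):
    """Keep every `interval`-th household after a random starting point."""
    start = random.Random(seed).randrange(interval)
    return households[start::interval]


if __name__ == "__main__":
    print(harmonize("country_a_1990", 3))   # raw code 3 maps to harmonized 2
    print(harmonize("country_b_2000", "X")) # unmapped code falls back to MISSING
    print(len(draw_systematic_sample(list(range(100)), interval=10)))
```

Keeping the translation table as data rather than logic means each new census only adds a mapping, and the comparability documentation can be written against the same table.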
Constructed Variables: IPUMS Family Interrelationship Pointers (example household table with pointer columns for Spouse’s, Mother’s, and Father’s location)
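A rough sketch of how family interrelationship pointers can be used: each person record carries the within-household person number of his or her spouse, mother, and father, with 0 meaning that relative is not present in the household. The field names (`pernum`, `spouse_loc`, `mother_loc`, `father_loc`) and the example household are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pernum: int          # person number within the household
    age: int
    spouse_loc: int = 0  # person number of spouse, 0 if none present
    mother_loc: int = 0  # person number of mother, 0 if none present
    father_loc: int = 0  # person number of father, 0 if none present


def mother_of(person, household):
    """Follow the mother pointer; return None when no mother is present."""
    by_pernum = {p.pernum: p for p in household}
    return by_pernum.get(person.mother_loc)


# Invented three-person household: a married couple and their child.
household = [
    Person(pernum=1, age=40, spouse_loc=2),
    Person(pernum=2, age=38, spouse_loc=1),
    Person(pernum=3, age=10, mother_loc=2, father_loc=1),
]

if __name__ == "__main__":
    child = household[2]
    print(mother_of(child, household).age)   # age of the child's mother
    print(mother_of(household[0], household))  # None: no mother in household
```

With pointers like these, a researcher can attach a parent's or spouse's characteristics to any person record without re-deriving family relationships from the raw relationship-to-head codes.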
Documentation • Translate codebooks, enumeration forms, and enumeration instructions into English • Standardize format and add XML tags • Write documentation identifying comparability problems across countries, and within countries, across time periods • Assemble and scan ancillary documentation (e.g., census maps, post-enumeration survey results, and additional information on post-enumeration processing).
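As an illustration of what adding XML tags to standardized documentation might look like, the sketch below wraps one variable description in markup using Python's standard library. The element and attribute names (`variable`, `label`, `categories`, `category`, `code`) are assumptions made for this example and do not reflect the project's actual schema.

```python
import xml.etree.ElementTree as ET


def tag_variable(name, label, categories):
    """Build an XML element describing one variable and its coded categories."""
    var = ET.Element("variable", {"name": name})
    ET.SubElement(var, "label").text = label
    cats = ET.SubElement(var, "categories")
    for code, text in categories.items():
        cat = ET.SubElement(cats, "category", {"code": str(code)})
        cat.text = text
    return var


# Hypothetical variable description, invented for illustration.
element = tag_variable("marital_status", "Marital status", {1: "Single", 2: "Married"})
xml_text = ET.tostring(element, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Once every codebook entry is tagged in a uniform way, the same source can feed both the web documentation and automated comparability checks.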
Dissemination • Uniform perpetual agreements with national statistical agencies allow us to disseminate anonymized microdata to researchers who agree to a web-based confidentiality agreement • MPC staff assess research proposals for feasibility • Disputes with agencies, if they arise, will be settled by the International Court of Arbitration in Paris • Data dissemination occurs exclusively through the IPUMS-International web-based data access system