IPUMS-International: A Restricted Access Web-Site Providing Anonymized, Integrated Census Microdata for Social Science and Policy Research
* * *
Robert McCaa, Steven Ruggles, Matt Sobek (University of Minnesota) and Albert Esteve (Centre d’Estudis Demografics)
www.ipums.org/international
54th ISI, Berlin 2003
Overview: access, privacy and confidentiality for integrated census microdata of 40+ countries
• Goals and accomplishments
• Confidentiality and privacy protections: legal, administrative, technical
• Data: cleaning, constructing, and integration
• Access: custom-tailored extracts (not whole datasets); users and uses
• Summary: new directions, 6 strengths and an aspiration
1. Goals and accomplishments
Four Goals:
• 1. Inventory the world’s census microdata
• 2. Preserve endangered microdata
  a. contract preservation with repositories
  b. deposit copies in at least two archives: National Statistical Organization and ... WHO…
* * *
• 3. Integrate datasets of authorized countries using UNSD and other standards
• 4. Disseminate extracts of database to approved researchers without charge (copy to each NSI)
Accomplishments:
Minnesota Population Center, University of Minnesota
Principal investigators (historians): Steven Ruggles, Robert McCaa
1998 First agreement signed
1999 Funding authorized
2002 First data release, 7 countries: China, Colombia, France, Kenya, Mexico, USA, Vietnam
2003 Regional projects: Latin America, Europe …
Preserve data & documentation
UN Demographic Center for Latin America (CELADE, Santiago, Chile): 3000+ microdata tapes preserved
Integration projects: 40 Partners + 1 (Table 1: August 16, 2003)
Data Access: first release May 2002
7 countries, 23 samples, ~60 million person records
USA: 1960, 1970, 1980, 1990, 2000
China: 1982
Colombia: 1964, 1973, 1985, 1993
France: 1962, 1968, 1975, 1982, 1990
Kenya: 1989, 1999
Mexico: 1960, 1970, 1990, 2000
Vietnam: 1989, 1999
2. Confidentiality and privacy protections
Confidentiality and privacy protections: changing perceptions
• Growing recognition that anonymized census microdata samples do not violate national legislation on statistical confidentiality and privacy
• International Monetary Fund’s General Data Dissemination System: 52 countries with uniform standards:
  • All enforce strict standards of statistical confidentiality
  • All prohibit disclosure of information which may identify individuals or entities
  • In 2000, 37 of the 52 countries disseminated anonymized census microdata samples
Confidentiality protections, IPUMSI: legal, administrative, and technical
• Dissemination agreement between University of Minnesota and each National Statistical Institute
  • Uniform 10 point protocol: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, and arbitration
• Conditional use license between the University of Minnesota and each researcher
  • Permission to use restricted access microdata, 3 criteria: research need, research competence, and agreement to conditions of use
• Technical data protection measures
  • Specific to each country
…/
Confidentiality protections, IPUMSI: technical
• Technical data protection measures
  • Adopt sample size according to national norms
  • Suppress detailed geography
  • Top and bottom code continuous variables
  • Suppress dates (birth, migration, marriage, etc.)
  • “Swap” (recode) place of enumeration for a small fraction of households
  • Randomly order households within administrative units
  • No semi-automatic procedures (e.g., μ-Argus)
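Three of the technical measures listed above (top and bottom coding, date suppression, and random ordering of households within administrative units) can be illustrated with a minimal Python sketch. The function names, field names, and thresholds are assumptions made for illustration only; the slides say the actual measures are specific to each country and do not describe an implementation.

```python
import random

def top_bottom_code(value, low, high):
    """Clamp a continuous value into [low, high] (thresholds are hypothetical)."""
    return max(low, min(high, value))

def suppress_fields(record, fields):
    """Return a copy of the record without the listed fields (e.g. exact dates)."""
    return {k: v for k, v in record.items() if k not in fields}

def shuffle_within_units(households, unit_key, seed=None):
    """Randomly order households inside each administrative unit.

    Units are emitted in first-seen order; only the order of households
    within a unit is randomized.
    """
    rng = random.Random(seed)
    groups = {}
    for hh in households:
        groups.setdefault(hh[unit_key], []).append(hh)
    result = []
    for unit_households in groups.values():
        rng.shuffle(unit_households)
        result.extend(unit_households)
    return result
```

For example, `top_bottom_code(103, 0, 95)` caps an age of 103 at the hypothetical top code of 95, and `suppress_fields(person, {"birth_date"})` drops an exact date while keeping the other variables.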
Only serious researchers need apply (Table 2)
3. Data enhancements & integration
Data Enhancements:
• Data quality and enhancements: added value
  • Clean data to eliminate duplicate records
  • Conduct internal consistency checks
  • Impute missing, inconsistent values
• Construct variables to facilitate analysis
  • Pointer variables for Mothers, Fathers, Spouses
  • Family and household variables
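As a rough illustration of two of the enhancements above, the Python sketch below drops duplicate records and adds a spouse pointer within a household. The field names (`pernum`, `relate`, `spouse_ptr`) and the linking rule (exactly one head and one spouse) are simplifying assumptions, not the project's actual procedure, which the slides do not describe.

```python
def drop_duplicates(records, key_fields):
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def add_spouse_pointers(household):
    """Add a 'spouse_ptr' field to each person in a household.

    The pointer holds the person number of the spouse, or 0 if none was
    linked. Only the unambiguous case (one head, one spouse) is handled.
    """
    for person in household:
        person["spouse_ptr"] = 0
    heads = [p for p in household if p["relate"] == "head"]
    spouses = [p for p in household if p["relate"] == "spouse"]
    if len(heads) == 1 and len(spouses) == 1:
        heads[0]["spouse_ptr"] = spouses[0]["pernum"]
        spouses[0]["spouse_ptr"] = heads[0]["pernum"]
    return household
```

A pointer of this kind lets an analyst attach a spouse's characteristics (for example, education) to each person's own record without re-deriving family relationships.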
Integration (not standardization):
• Adopt uniform coding schemes, nomenclatures and classifications
  • United Nations Statistics Division (Principles & Recs)
  • UNESCO (ISCED)
  • International Labor Office (ISCO-88)
• Composite coding scheme; 2 simple, but seemingly contradictory rules (Table 3, next slide):
  • Retain original detail
  • Harmonize each digit
…/
Composite coding scheme: Employment Status
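One way to read the two rules of the composite coding scheme (retain original detail, harmonize each digit) is sketched below in Python. The codes, labels, and sample names are entirely hypothetical and are not taken from Table 3; the sketch only assumes that the first digit is comparable across all samples while later digits carry detail that some samples provide and others do not.

```python
# Hypothetical composite codes for employment status.
# Digit 1: general category, comparable across every sample.
# Digits 2-3: extra detail kept where a census provides it (0 = no detail).
CROSSWALK = {
    ("country_a", "employed"): "100",
    ("country_b", "wage worker"): "110",
    ("country_b", "self-employed"): "120",
    ("country_a", "unemployed"): "200",
    ("country_b", "unemployed, seeking first job"): "210",
}

GENERAL_LABELS = {"1": "employed", "2": "unemployed"}

def composite_code(sample, original_label):
    """Look up the composite code for one sample's original category."""
    return CROSSWALK[(sample, original_label)]

def general_category(code):
    """Collapse a composite code to its first, fully harmonized digit."""
    return GENERAL_LABELS[code[0]]
```

Under this reading, a cross-national comparison uses `general_category`, while a single-country study can keep the full code and lose none of the original detail.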
Integration Work Plan:
• Assemble microdata and documentation (MPC, NSI)
• Develop samples to minimize confidentiality risk and maximize robustness (MPC or NSI partner)
• Design national integration plan (NSI, consultants): census-by-census, concept-by-concept, code-by-code
• Write integrated documentation (MPC, partners)
• Program integration (MPC)
Census documentation compiled for Colombian microdata
Standard: UN/Eurostat Principles & Recs...
Photos from Colombia integration project, February-March, 2000: 4 experts from DANE (census office) + 7 academics (3 universities)
4. Access
Data Access: web-based extraction system
• Password protected: to make and retrieve extracts
• Researcher selects:
  • countries,
  • censuses,
  • cases/sub-populations,
  • variables, and
  • sample densities
• Extract engine queues request, generates extract
• Researcher retrieves extract via web
• NO: CDs, original codes, or complete datasets
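The extract workflow above (researcher selects samples, cases, variables, and density; the engine queues the request and generates the extract) can be sketched in Python as follows. The class, function, and field names are hypothetical, and systematic thinning (keeping one record in every `keep_every`) stands in for whatever sampling method the real extract engine uses.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExtractRequest:
    user: str
    samples: set                  # (country, census year) pairs
    variables: list               # variables to include in the extract
    case_filter: Callable[[dict], bool] = lambda rec: True  # sub-population
    keep_every: int = 1           # sample density: keep 1 record in every N

queue = deque()

def submit(request):
    """Queue a request; the extract is generated later by process_next."""
    queue.append(request)

def run_extract(request, database):
    """Select samples and cases, thin to the requested density, keep variables."""
    selected = [
        rec for rec in database
        if (rec["country"], rec["year"]) in request.samples
        and request.case_filter(rec)
    ]
    thinned = selected[::request.keep_every]
    return [{v: rec[v] for v in request.variables} for rec in thinned]

def process_next(database):
    """Take the oldest queued request and return (user, extract rows)."""
    request = queue.popleft()
    return request.user, run_extract(request, database)
```

Because the extract contains only the requested variables for the requested cases, the researcher never receives a complete dataset, which matches the last bullet of the slide.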
5. Regional initiatives & summary
IPUMS-Latin America, 2003-2007: 16 countries, ~500 million people
• Scope: Latin American census microdata, 1960-present
• Work Plan
  • 2001-2: Sign licensing agreements with official agencies
  • 2002-3: Obtain funding from U.S. NIH
  • 2004: Develop/translate microdata & metadata
  • : Country expert teams design national integrations
  • 2005: MPC/expert teams design regional integration
  • 2006: MPC integrates microdata and metadata
  • 2007: MPC disseminates to bona fide researchers who show need and agree to conditions of use.
ICM-Europe: 14+ national teams
Summary, 6 strengths and an aspiration
• Uniform legal authorization
• Access restricted to scientists with need
• Experienced integration teams
• Proven web-based distribution system
• High user satisfaction
• Sustainability: MPC, ICPSR, WHO
Aspiration: 90 countries, 90% of the world’s population by 2010…
Additional information at: http://www.ipums.org/international
* * *
Contact: Robert McCaa, rmccaa@umn.edu