Explore the transformative impact of the Microdata Revolution in preserving, confidentializing, and disseminating census microdata from over 100 countries since the 1960s. Discover the integration and availability of data for global research.
The Microdata Revolution and IPUMS-International: Building a secure resource for comparative social science research in time and space
Robert McCaa
For additional details, please see: https://international.ipums.org/international
For a copy of this presentation, Trewin report, & more: www.hist.umn.edu/~rmccaa/ipums-global
4th Conference for Social and Economic Data (RatSWD), Wiesbaden, 19-20 June, 2008
A note of thanks • Organizers of RatSWD-2008 • Mr. Walter Radermacher and Mr. Johann Hahlen, former President FSO • Dr. Gert Wagner • Mr. Markus Zwick • Ms. Andrea Harausz • Mr. Martin Podehl (Statistics Canada, ret.), who translated German census documentation for IPUMS!!! • Dr. James Vaupel, Max Planck Institute (Rostock) • Empirical tradition of German scholarship • From Alexander von Humboldt to Leopold von Ranke and beyond
Our common fate on a crowded planet: new forms of global cooperation are required. We must engage interdisciplinary research combining theory and practice. --Jeffrey D. Sachs, Common Wealth (Penguin 2008)
Imagine!!! a microdata revolution!
Free!!! A Microdata Revolution
• Preserve all microdata and documentation (15 slides)
• Confidentialize (21)
• Integrate (6)
• Disseminate to researchers world-wide (3)
• Conclusion: strengths, challenges, 7 golden rules (3)
A Census Microdata Revolution • Preserve all census microdata and documentation • 1960s – present • ~100 countries (80 have endorsed MoU) • ~400 censuses (214 are entrusted to IPUMS) • Confidentialize: legal, administrative, technical • Integrate: both microdata and metadata • Disseminate to researchers world-wide— “extracts” of database: countries, censuses, sub-populations, sample size, variables
IPUMS-International Today (map, Mollweide projection)
• dark green = already integrated: 35 countries, 111 censuses, 263 million person records
• green = to be integrated: 39 countries, 103 censuses, 150 million
IPUMS dissemination calendar (see handout): samples for 35 countries available now, 74 soon
• Europe
  • Available (10): Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal, Romania, Spain, UK
  • Soon (4): Germany, Czech Republic, Slovenia, Switzerland
• Americas (funding renewed July 1)
  • Available (11): Argentina, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, USA, Venezuela
  • Soon (11): Bolivia, Cuba, Dominican Republic, El Salvador, Guatemala, Honduras, Nicaragua, Paraguay, Peru, Puerto Rico, Uruguay
• Africa
  • Available (6): Egypt, Ghana, Kenya, Rwanda, South Africa, Uganda
  • Soon (11): Botswana, Ethiopia, Guinea (Conakry), Madagascar, Malawi, Mali, Mauritius, Sierra Leone, Sudan, Tanzania, Zambia
• Asia
  • Available (8): Cambodia, China, Iraq, Israel, Malaysia, Palestine, Philippines, Vietnam
  • Soon (13): Armenia, Bangladesh, Fiji, India, Indonesia, Jordan, Kyrgyz Republic, Mongolia, Nepal, Pakistan, Thailand, Turkmenistan
IPUMS timeline
• 1995: IPUMS-USA first release of integrated microdata. IPUMS-USA continues: 1850-2000 + ACS samples
• 1999: IPUMS-International funded
• 2002: 1st International release: 7 countries, including Colombia and Mexico
• 2006: 20 countries, 63 censuses
• 2008: 35 countries, 111 censuses
  • ~263 million person records
  • Two thousand users
• 2013: ~70 countries, ~200 censuses
  • 214 sets of microdata are already entrusted to MPC
  • Coming: Germany (8), Switzerland (4), Bangladesh (2), Cuba (1)...
The IPUMS team (Feb. 2008) Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center (Not present: computer gurus, some researchers, and others who were too busy for a photo!)
1. Preserve (Archive)
IPUMS Global workshop, ISI (Lisbon, Aug 2007)
Data recovery: microdata on this tape were recovered!!
• Example: Bangladesh Bureau of Statistics, 1981 census, 276 tapes (recovery in Aug. ’08)
• >3,000 tapes recovered: 1971 Germany, 1980 Mexico, Mali 76, Sudan 73, and many more
Census Microdata: 1950s. Few countries archived microdata (a country in green indicates microdata exist for the decade). See: www.hist.umn.edu/~rmccaa/IUMSI/country6.htm (Map: Mollweide projection)
Census Microdata: 1960s. The Americas: in the vanguard for preservation of microdata. (Map: Mollweide projection)
Census Microdata: 1970s. The preservation of microdata was almost universal in the Americas and was becoming widespread in Europe, Africa and Asia. Germany: thanks to RDC, microdata and metadata for the 1970 FRG and 1971 GDR censuses are recovered. (Map: Mollweide projection)
Census Microdata: 1980s. The preservation of microdata became generalized. Germany: 1981 GDR and 1987 FRG microdata and metadata are recovered. (Map: Mollweide projection)
Census Microdata: 1990s. Many countries preserved microdata (or are disposed to recover them). See tomorrow’s presentation by Wendy Thomas for more on archiving. (Map: Mollweide projection)
Census Microdata: 2000s. Many countries have microdata (or are disposed to make them available for research). (Map: Mollweide projection)
Inventory of census microdata archived by region and decade (% of censuses conducted) • Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data. Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm
Microdata Documentation for Germany by census year and type, entrusted to IPUMS. Integrated samples to be launched for the 5th RatSWD? See tomorrow’s presentation by Andrea Harausz for more on the German contribution to IPUMS
2. Confidentialize: The “trusted-user/trusted-institution” approach to disseminating integrated, anonymized extracts of census samples
Imagine!!! What’s the problem? Confidentializing an integrated microdata base with: • 200+ census samples of households (70+ countries) • Containing ~½ billion person records with thousands of variables • Available free of cost to tens of thousands of licensed researchers regardless of country of birth, citizenship, residence or place of work • Without a single allegation of violation of privacy or statistical confidentiality… Ever!!
Solution: a restricted-access, web-based system • Password protected: to make extracts and retrieve microdata • Licensed researcher selects: • Countries, • Censuses, • Cases/sub-populations, • Variables, and • Sample densities • Extract engine queues request, generates extract • Researcher retrieves extract via web with SSL 128-bit encryption and analyzes using own wares (soft/hard/wet) • NO CDs. NO source files. NO complete datasets.
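The request-and-queue flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual IPUMS extract engine: the class `ExtractRequest`, the set `LICENSED_USERS`, the functions `submit`, `build_extract` and `process_next`, and the record keys (`COUNTRY`, `YEAR`, `AGE`, `SEX`, `EDUC`) are all hypothetical names chosen for the example, and sample density is approximated by keeping every k-th matching record.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical registry of licensed researchers (assumption for this sketch).
LICENSED_USERS = {"researcher_a"}


@dataclass
class ExtractRequest:
    user: str
    samples: list            # (country, census year) pairs
    variables: list          # variable names to include in the extract
    case_filter: dict = field(default_factory=dict)  # e.g. {"SEX": 2}
    density: float = 1.0     # fraction of matching records to keep, in (0, 1]


queue = deque()              # pending extract requests, first in, first out


def submit(request):
    """Queue a request; only licensed researchers may make extracts."""
    if request.user not in LICENSED_USERS:
        raise PermissionError("extracts are limited to licensed researchers")
    queue.append(request)


def build_extract(request, records):
    """Return only the requested samples, cases, and variables."""
    step = max(1, round(1 / request.density))   # keep every step-th record
    matched = [
        rec for rec in records
        if (rec["COUNTRY"], rec["YEAR"]) in request.samples
        and all(rec.get(k) == v for k, v in request.case_filter.items())
    ]
    return [{v: rec[v] for v in request.variables} for rec in matched[::step]]


def process_next(records):
    """Generate the extract for the oldest queued request."""
    return build_extract(queue.popleft(), records)
```

The point of the design is that the researcher never receives a source file or a complete dataset: the output contains only the selected variables for the selected cases.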
4 points on IPUMS Statistical Confidentiality, Privacy and Security
• Memorandum of Understanding between University of Minnesota and each National Statistical Office
• License agreement between each Researcher and the University of Minnesota
• Technical protections applied to the microdata
• Why these are “good practices” (UN-ECE) and “best practices” (Dennis Trewin on-site inspection)
IPUMSi LICENSE. B. License with researchers: restricted-access web-based system
Legally-binding license agreement
• forces would-be snoopers to violate the law, for which they can be fined and jailed
• protects privacy and confidentiality
• assures proper use
Access limited to:
• Bona-fide researchers (credentialed)
• With a demonstrated scientific need
• Who agree to abide by license restrictions:
  • Confidentiality
  • No redistribution, no commercial use
  • Safely secured
  • Alleging that a person can be or has been identified is illegal
License valid for 1 year, renewable. End of application
IPUMSi C. Technical measures (in addition to legal & administrative protections)
CONFIDENTIALIZES:
» Suppress geographical detail
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts
» Scramble order of records
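A minimal Python sketch of four of these measures follows (aggregating sensitive codes is omitted). It is an illustration only, not the IPUMS procedure: the function name `confidentialize`, the record keys `VILLAGE`, `DISTRICT` and `BIRTHDATE`, and the `swap_rate` parameter are assumptions made for the example.

```python
import random
from datetime import date


def confidentialize(records, census_day, swap_rate=0.05, seed=0):
    """Apply illustrative technical protections to a list of person records.

    Record keys (VILLAGE, DISTRICT, BIRTHDATE) are hypothetical.
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        r = dict(rec)                      # work on a copy, leave the source intact
        r.pop("VILLAGE", None)             # suppress geographical detail below district
        born = r.pop("BIRTHDATE")          # convert date of birth to age in completed years
        had_birthday = (census_day.month, census_day.day) >= (born.month, born.day)
        r["AGE"] = census_day.year - born.year - (0 if had_birthday else 1)
        out.append(r)

    # Swap district codes between randomly chosen pairs of records.
    n_swaps = int(len(out) * swap_rate / 2)
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(out)), 2)
        out[a]["DISTRICT"], out[b]["DISTRICT"] = out[b]["DISTRICT"], out[a]["DISTRICT"]

    rng.shuffle(out)                       # scramble the order of records
    return out
```

Swapping and scrambling leave marginal totals unchanged (the same districts and the same ages appear in the output), which is why they cost analysts little while making any claimed identification unreliable.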
EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International 1. Restrict access to samples 2. Limit geographical detail 3. Re-code unique categories--top and bottom 4. Sign non-disclosure agreement 5. Prohibit redistribution to third parties 6. Prohibit attempts to identify individuals or to make any claim to that effect 7. Require users to provide copies of publications
EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International 8. Construct age from birthdate, if necessary 9. Do not identify date of birth 10. Do not identify precise place of birth 11. Migration: timing/place not identified in detail 12. Identify place of residence by major civil division (pop>20k, 60k, 100k, 1 million—i.e., national convention) 13. Do sensitivity analysis (not yet) 14. Do confidentiality assessment (not yet)
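Two of the standards above, top and bottom re-coding (item 3) and identifying place of residence only above a population threshold (item 12), are easy to illustrate. The Python below is a hedged sketch: the function names, the `PLACE` key, the masked code `0`, and the default threshold of 20,000 (one of the conventions listed on the slide) are assumptions for the example, not IPUMS or EUROSTAT code.

```python
def top_bottom_code(value, low, high):
    """Clamp a value to [low, high] so that rare extreme values share a category."""
    return max(low, min(high, value))


def mask_small_places(records, populations, threshold=20_000, masked_code=0):
    """Keep a place code only if its population exceeds the threshold.

    `populations` maps place code -> population; unknown places are masked.
    """
    out = []
    for rec in records:
        r = dict(rec)
        if populations.get(r["PLACE"], 0) <= threshold:
            r["PLACE"] = masked_code
        out.append(r)
    return out
```

The threshold is a parameter because, as the slide notes, the cut-off follows national convention (20k, 60k, 100k, or 1 million).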
D. IPUMS audited: cited as “good practice” by UN-ECE Report (2007, Annex 23, pp. 98-103) http://www.unece.org/stats/documents/tfcm.htm
Good practices (UN-ECE report, see annex 23): • High level of confidence and transparency between the researchers (users) and the national statistical institutes • The conditions of use are well defined • Sanctions for misuse are clearly spelled out • Good use is assured by both juridical and administrative mechanisms to prevent violations • Sanctions are imposed not only against those who misuse the data but also against their institutions • The data are anonymized by highly efficient technical means
Statistical confidentiality and security: see the on-site review by Dennis Trewin, www.hist.umn.edu/~rmccaa/ipums-global (click “Trewin Report”) • “The best practice for an international repository of microdata” • “The security of IPUMS is first class…the standard of the best national statistical offices” • “in full compliance with the principles and recommendations of the ECE”
IPUMS integration of metadata and microdata • Comprehensive documentation, including • Data dictionaries and codebooks • Complete original source documentation in the official language: questionnaires, manuals, etc. • All translated to English (from the German--thanks again to Martin Podehl!!) and converted into metadatabase for each census • Integration ≠ standardization • Composite codes (11, 12, 21, 22…) ≠ serial codes (1, 2, 3, …) (see next slide)
IPUMS microdata integration method: composite codes (multiple digits). The method retains significant distinctions while integrating comparable concepts. Goal of integration coding scheme: assist each researcher in making informed decisions on comparability, not attempt to make the one best decision for all researchers.
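A small Python sketch shows how a composite code differs from a serial code. The harmonization table below is invented for illustration (countries "A" and "B", their source codes, and the educational-attainment labels are hypothetical, as is the convention that a trailing 0 means "no further detail available"); it is not an actual IPUMS coding scheme.

```python
# Hypothetical mapping: (country, source code) -> composite code.
# First digit: broad category comparable across all samples.
# Second digit: detail that only some samples can support (0 = not available).
HARMONIZE = {
    ("A", 1): 11,   # primary, incomplete
    ("A", 2): 12,   # primary, complete
    ("A", 3): 21,   # secondary, incomplete
    ("B", 1): 10,   # primary, completion not distinguished in the source
    ("B", 2): 20,   # secondary, completion not distinguished in the source
}


def general(code):
    """Broad category: comparable across every sample."""
    return code // 10


def detail(code):
    """Sample-specific detail retained from the source census."""
    return code % 10
```

A researcher who needs cross-country comparability analyzes `general(code)`; one working within country "A" can keep the full two-digit code. The choice is left to the researcher, which is the stated goal of the coding scheme.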
In addition… • Microdata: new high precision samples not only for contemporary censuses but also for historical ones (before the 90s) • Systematic metadata for all variables • Universes • Definitions • Comparability • Dynamic System—facilitates comparing the wording of questionnaires and instructions for any combination of countries and censuses
IPUMS integrated metadata: Instantly, compare text &/or image of enumeration forms and instructions for any combination of countries and censuses (example: educational attainment)