
The Microdata Revolution in Comparative Social Science Research

An overview of the Microdata Revolution: preserving, confidentializing, integrating, and disseminating census microdata from about 100 countries, from the 1960s to the present, for comparative research worldwide.

Presentation Transcript


  1. The Microdata Revolution and IPUMS-International: Building a secure resource for comparative social science research in time and space • Robert McCaa • For additional details, please see: https://international.ipums.org/international • For a copy of this presentation, the Trewin report, and more: www.hist.umn.edu/~rmccaa/ipums-global • 4th Conference for Social and Economic Data (RatSWD), Wiesbaden, 19-20 June 2008

  2. A note of thanks • Organizers of RatSWD-2008 • Mr. Walter Radermacher and Mr. Johann Hahlen, former President, FSO • Dr. Gert Wagner • Mr. Markus Zwick • Ms. Andrea Harausz • Mr. Martin Podehl (Statistics Canada, ret.), who translated German census documentation for IPUMS!!! • Dr. James Vaupel, Max Planck Institute (Rostock) • Empirical tradition of German scholarship • From Alexander von Humboldt to Leopold von Ranke and beyond

  3. "Our common fate on a crowded planet: new forms of global cooperation are required. We must engage interdisciplinary research combining theory and practice." (Jeffrey D. Sachs, Common Wealth, Penguin 2008) • Imagine!!! A microdata revolution!

  4. Free!!! A Microdata Revolution • Preserve all microdata and documentation: 15 slides • Confidentialize: 21 slides • Integrate: 6 slides • Disseminate to researchers world-wide: 3 slides • Conclusion (strengths, challenges, 7 golden rules): 3 slides

  5. A Census Microdata Revolution • Preserve all census microdata and documentation • 1960s – present • ~100 countries (80 have endorsed MoU) • ~400 censuses (214 are entrusted to IPUMS) • Confidentialize: legal, administrative, technical • Integrate: both microdata and metadata • Disseminate to researchers world-wide: “extracts” of the database by countries, censuses, sub-populations, sample size, and variables

  6. IPUMS-International today (map, Mollweide projection) • Dark green = already integrated: 35 countries, 111 censuses, 263 million person records • Green = to be integrated: 39 countries, 103 censuses, 150 million person records

  7. IPUMS dissemination calendar (see handout): samples for 35 countries available now, 74 in total soon • Europe • Available (10): Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal, Romania, Spain, UK • Soon (4): Germany, Czech Republic, Slovenia, Switzerland • Americas (funding renewed July 1) • Available (11): Argentina, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, USA, Venezuela • Soon (11): Bolivia, Cuba, Dominican Republic, El Salvador, Guatemala, Honduras, Nicaragua, Paraguay, Peru, Puerto Rico, Uruguay • Africa • Available (6): Egypt, Ghana, Kenya, Rwanda, South Africa, Uganda • Soon (11): Botswana, Ethiopia, Guinea (Conakry), Madagascar, Malawi, Mali, Mauritius, Sierra Leone, Sudan, Tanzania, Zambia • Asia • Available (8): Cambodia, China, Iraq, Israel, Malaysia, Palestine, Philippines, Vietnam • Soon (13): Armenia, Bangladesh, Fiji, India, Indonesia, Jordan, Kyrgyz Republic, Mongolia, Nepal, Pakistan, Thailand, Turkmenistan

  8. IPUMS timeline • 1995: IPUMS-USA first release of integrated microdata; IPUMS-USA continues: 1850-2000 + ACS samples • 1999: IPUMS-International funded • 2002: first international release, 7 countries, including Colombia and Mexico • 2006: 20 countries, 63 censuses • 2008: 35 countries, 111 censuses • ~263 million person records • Two thousand users • 2013: ~70 countries, ~200 censuses • 214 sets of microdata are already entrusted to MPC • Coming: Germany (8), Switzerland (4), Bangladesh (2), Cuba (1)...

  9. The IPUMS team (Feb. 2008) • Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center • (Not present: computer gurus, some researchers, and others who were too busy for a photo!)

  10. 1. Preserve (Archive) • IPUMS Global workshop, ISI (Lisbon, Aug 2007)

  11. Preservation: 1973 census tapes of Sudan were at risk!

  12. Microdata on this tape were recovered!! • Data recovery example: Bangladesh Bureau of Statistics, 1981 census, 276 tapes (recovery in Aug. ’08) • >3,000 tapes recovered: Germany 1971, Mexico 1980, Mali 1976, Sudan 1973, and many more

  13. Census Microdata: 1950s • Few countries archived microdata (a country in green indicates that microdata exist for the decade) • See: www.hist.umn.edu/~rmccaa/IUMSI/country6.htm (map, Mollweide projection)

  14. Census Microdata: 1960s • The Americas: in the vanguard for preservation of microdata (map, Mollweide projection)

  15. Census Microdata: 1970s • The preservation of microdata was almost universal in the Americas and was becoming widespread in Europe, Africa and Asia • Germany: thanks to the RDC, microdata and metadata for the 1970 FRG and 1971 GDR censuses have been recovered (map, Mollweide projection)

  16. Census Microdata: 1980s • The preservation of microdata became generalized • Germany: 1981 GDR and 1987 FRG microdata and metadata have been recovered (map, Mollweide projection)

  17. Census Microdata: 1990s • Many countries preserved microdata (or are disposed to recover them) • See tomorrow’s presentation by Wendy Thomas for more on archiving (map, Mollweide projection)

  18. Census Microdata: 2000s • Many countries have microdata (or are disposed to make them available for research) (map, Mollweide projection)

  19. Inventory of census microdata archived by region and decade (% of censuses conducted) • Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data. Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm

  20. Microdata documentation for Germany by census year and type, entrusted to IPUMS • Integrated samples to be launched for the 5th RatSWD? • See tomorrow’s presentation by Andrea Harausz for more on the German contribution to IPUMS

  21. 2. Confidentialize: The “trusted-user/trusted-institution” approach to disseminating integrated, anonymized extracts of census samples

  22. Imagine!!! What’s the problem? Confidentializing an integrated microdata base with: • 200+ census samples of households (70+ countries) • Containing ~½ billion person records with thousands of variables • Available free of cost to tens of thousands of licensed researchers regardless of country of birth, citizenship, residence or place of work • Without a single allegation of violation of privacy or statistical confidentiality… Ever!!

  23. Solution: a restricted-access, web-based system • Password protected: to make extracts and retrieve microdata • Licensed researcher selects: • Countries, • Censuses, • Cases/sub-populations, • Variables, and • Sample densities • Extract engine queues request, generates extract • Researcher retrieves extract via web with SSL 128-bit encryption and analyzes using own wares (soft/hard/wet) • NO CDs. NO source files. NO complete datasets.
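The extract workflow on this slide (the researcher selects countries, censuses, sub-populations, variables and sample density; the engine queues the request and returns only the extract) can be sketched as follows. This is a minimal illustration, not the IPUMS extract engine: the record layout, the `ExtractRequest` fields, the `make_extract`/`submit`/`run_queue` helpers, and the use of independent (Bernoulli) sampling to realize a requested density are all assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict  # one person record: variable name -> value (hypothetical layout)


@dataclass
class ExtractRequest:
    """A researcher's selections; all field names are hypothetical."""
    countries: set
    censuses: set        # census years
    variables: list      # only these variables leave the server
    density: float       # fraction of matching records to keep, 0 < density <= 1
    subpopulation: Optional[Callable[[Record], bool]] = None


def make_extract(records, request, seed=0):
    """Filter, subsample and project the records for one request."""
    if not 0 < request.density <= 1:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    extract = []
    for rec in records:
        if rec["country"] not in request.countries or rec["year"] not in request.censuses:
            continue
        if request.subpopulation is not None and not request.subpopulation(rec):
            continue
        if rng.random() >= request.density:  # Bernoulli sampling at the requested density
            continue
        extract.append({v: rec[v] for v in request.variables})
    return extract


queue = deque()  # pending requests, processed first in, first out


def submit(request):
    queue.append(request)


def run_queue(records):
    """Generate one extract per queued request, in submission order."""
    while queue:
        yield make_extract(records, queue.popleft())


records = [
    {"country": "MX", "year": 2000, "age": 34, "sex": 2, "district": 7},
    {"country": "MX", "year": 2000, "age": 12, "sex": 1, "district": 3},
    {"country": "CL", "year": 1992, "age": 50, "sex": 1, "district": 1},
]
submit(ExtractRequest({"MX"}, {2000}, ["age", "sex"], 1.0, lambda r: r["age"] >= 15))
print(list(run_queue(records)))  # [[{'age': 34, 'sex': 2}]]
```

Because the function returns only the selected variables for the selected cases, the researcher never receives a complete source file, in keeping with the "NO complete datasets" rule on the slide.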

  24. 4 points on IPUMS statistical confidentiality, privacy and security • A. Memorandum of Understanding between the University of Minnesota and each National Statistical Office • B. License agreement between each researcher and the University of Minnesota • C. Technical protections applied to the microdata • D. Why these are “good practices” (UN-ECE) and “best practices” (Dennis Trewin on-site inspection)

  25. A. NSI with U. of Minnesota

  26. A. NSI with U. of Minnesota (2005+)

  27. IPUMSi LICENSE • B. License with researchers • Restricted-access web-based system • Legally binding license agreement: • forces would-be snoopers to violate a law under which they can be fined and jailed • protects privacy and confidentiality • assures proper use • Access limited to: • Bona-fide researchers (credentials) • With a demonstrated scientific need • Who agree to abide by license restrictions: • Confidentiality • No redistribution • Safely secured • Alleging that a person has been identified is prohibited

  28. IPUMSi LICENSE • B. License with researchers • Restricted-access web-based system • Legally binding license agreement: • forces would-be snoopers to violate the law • protects privacy and confidentiality • assures proper use • Access limited to: • Bona-fide researchers (credentialed) • With a demonstrated scientific need • Who agree to abide by license restrictions: • Confidentiality • No redistribution, no commercial use • Safely secured • Alleging that a person can be or has been identified is illegal

  29. “Apply for Access”

  30. License valid for 1 year, renewable. End of application

  31. IPUMSi C. Technical measures (in addition to legal & administrative protections) • CONFIDENTIALIZES: » Suppress geographical detail » Blur/aggregate sensitive codes » Convert dates to ages (blur key vars.) » Swap cases between districts » Scramble order of records
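Two of these technical measures, swapping cases between districts and scrambling the order of records, can be illustrated with a short sketch. It assumes each record stands for one household; the swap `rate` and the random pairing are hypothetical choices, since the slide does not say how IPUMS selects or matches swapped cases.

```python
import random
from collections import Counter


def swap_districts(households, rate, rng, key="district"):
    """Swap the district code between randomly paired households.

    `rate` (a hypothetical parameter) is the fraction of households selected
    for swapping. Pairs that already share a district are left unchanged.
    """
    out = [dict(h) for h in households]  # work on copies; the source file is never modified
    picked = rng.sample(range(len(out)), int(len(out) * rate))
    for i, j in zip(picked[0::2], picked[1::2]):  # consecutive picks form pairs
        if out[i][key] != out[j][key]:
            out[i][key], out[j][key] = out[j][key], out[i][key]
    return out


def scramble_order(households, rng):
    """Return the households in random order so file position carries no information."""
    shuffled = list(households)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(2008)
households = [{"district": d, "size": s} for d, s in [(1, 4), (2, 3), (3, 5), (4, 2)]]
released = scramble_order(swap_districts(households, rate=0.5, rng=rng), rng)

# District totals are unchanged; only which household is reported in which district differs.
assert Counter(h["district"] for h in released) == Counter(h["district"] for h in households)
```

Swapping leaves the number of households in each district unchanged while removing the certainty that a given household lives where the file says. This sketch pairs households at random; a production scheme would more likely pair households that agree on key characteristics such as household size.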

  32. EUROSTAT statistical confidentiality standards (Thorogood, 1999), all endorsed by IPUMS-International: • 1. Restrict access to samples • 2. Limit geographical detail • 3. Re-code unique categories (top and bottom) • 4. Sign non-disclosure agreement • 5. Prohibit redistribution to third parties • 6. Prohibit attempts to identify individuals or to make any claim to that effect • 7. Require users to provide copies of publications

  33. EUROSTAT statistical confidentiality standards (Thorogood, 1999), all endorsed by IPUMS-International: • 8. Construct age from birthdate, if necessary • 9. Do not identify date of birth • 10. Do not identify precise place of birth • 11. Migration: timing/place not identified in detail • 12. Identify place of residence by major civil division (pop. > 20k, 60k, 100k, or 1 million, following national convention) • 13. Do sensitivity analysis (not yet) • 14. Do confidentiality assessment (not yet)
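Items 3, 8, 9 and 12 translate directly into record-level transformations. The sketch below is a hypothetical illustration, not EUROSTAT or IPUMS code: the field names, the top code of 90 for age, and the two-level geography are assumptions, while the 20,000 threshold is the first example given in item 12.

```python
from datetime import date


def age_at(birthdate, census_day):
    """Completed years of age on census day (item 8)."""
    had_birthday = (census_day.month, census_day.day) >= (birthdate.month, birthdate.day)
    return census_day.year - birthdate.year - (0 if had_birthday else 1)


def top_bottom_code(value, low, high):
    """Fold rare extreme values into open-ended end categories (item 3)."""
    return max(low, min(value, high))


def coarsen_geography(unit, population, parent, threshold):
    """Climb the civil-division hierarchy until the unit exceeds the threshold (item 12)."""
    while population[unit] <= threshold and unit in parent:
        unit = parent[unit]
    return unit


def confidentialize(rec, census_day, population, parent, threshold=20_000, age_top=90):
    """Apply items 3, 8, 9 and 12 to one person record (field names are hypothetical)."""
    out = dict(rec)
    birthdate = out.pop("birthdate")  # item 9: the date of birth itself is not released
    # age_top is reported as "age_top or older"
    out["age"] = top_bottom_code(age_at(birthdate, census_day), 0, age_top)
    out["residence"] = coarsen_geography(out["residence"], population, parent, threshold)
    return out


population = {"district_a": 5_000, "province_x": 250_000}  # hypothetical units
parent = {"district_a": "province_x"}
person = {"birthdate": date(1905, 3, 1), "residence": "district_a", "sex": 2}
print(confidentialize(person, date(2000, 2, 14), population, parent))
# {'residence': 'province_x', 'sex': 2, 'age': 90}
```

Walking up the hierarchy, rather than dropping the place of residence outright, keeps as much geographic detail as the chosen threshold allows; a country following a 100,000 convention would pass `threshold=100_000`.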

  34. D. IPUMS audited • Cited as “good practice” by the UN-ECE report (2007, Annex 23, pp. 98-103): http://www.unece.org/stats/documents/tfcm.htm

  35. Good practices (UN-ECE report, see Annex 23): • A high level of confidence and transparency between the researchers (users) and the national statistical institutes • The conditions of use are well defined • Sanctions for misuse are clearly spelled out • Good use is assured by both juridical and administrative mechanisms to prevent violations • Sanctions are imposed not only against those who misuse the data but also against their institutions • The data are anonymized by highly efficient technical means

  36. Statistical confidentiality and security: see the on-site review by Dennis Trewin, www.hist.umn.edu/~rmccaa/ipums-global (click “Trewin Report”) • “The best practice for an international repository of microdata” • “The security of IPUMS is first class…the standard of the best national statistical offices” • “in full compliance with the principles and recommendations of the ECE”

  37. 3. Integration: Microdata and Metadata

  38. IPUMS integration of metadata and microdata • Comprehensive documentation, including: • Data dictionaries and codebooks • Complete original source documentation in the official language: questionnaires, manuals, etc. • All translated to English (for the German, thanks again to Martin Podehl!!) and converted into a metadatabase for each census • Integration ≠ standardization • Composite codes (11, 12, 21, 22…) ≠ serial codes (1, 2, 3, …) (see next slide)

  39. IPUMS microdata integration method: composite codes (multiple digits) • Retains not only significant distinctions but also integrates comparable concepts

  40. IPUMS microdata integration method: composite codes (multiple digits) • Retains not only significant distinctions but also integrates comparable concepts • Goal of the integration coding scheme: to assist each researcher in making informed decisions on comparability, not to attempt to make the one best decision for all researchers.
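The difference between composite and serial codes can be shown with a small marital-status variable. The codes and country labels below are invented for illustration and are not actual IPUMS codes. The first digit carries the concept that is comparable across all samples; the second digit preserves detail that only some censuses record.

```python
# Hypothetical composite codes for marital status (not actual IPUMS codes).
# First digit = category comparable across all censuses; second digit = detail.
LABELS = {
    10: "single/never married",
    20: "married or in union, type not specified",
    21: "married, formally",
    22: "consensual union",
    30: "separated or divorced",
    40: "widowed",
}

# Each census keeps its own source codes; integration maps them to composite codes.
SOURCE_TO_COMPOSITE = {
    ("Country A", 1990): {1: 10, 2: 21, 3: 22, 4: 30, 5: 40},  # distinguishes unions
    ("Country B", 1990): {1: 10, 2: 20, 3: 30, 4: 40},         # does not
}


def integrate(sample, source_code):
    """Translate one census's source code into the shared composite code."""
    return SOURCE_TO_COMPOSITE[sample][source_code]


def general(composite):
    """Keep only the first digit: the version comparable across all samples."""
    return composite // 10


a = integrate(("Country A", 1990), 3)  # consensual union in Country A
b = integrate(("Country B", 1990), 2)  # married/in union in Country B
print(a, b, general(a) == general(b))  # 22 20 True
```

With serial codes (1, 2, 3, …) the integrator would have to choose once for everyone: either fold consensual unions into "married" and lose the distinction for Country A, or keep it and leave Country B without a matching category. The composite code leaves that decision to each researcher, which is the goal stated on the slide.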

  41. In addition… • Microdata: new high-precision samples not only for contemporary censuses but also for historical ones (before the 1990s) • Systematic metadata for all variables: • Universes • Definitions • Comparability • Dynamic system: facilitates comparing the wording of questionnaires and instructions for any combination of countries and censuses

  42. IPUMS integrated metadata: instantly compare the text and/or images of enumeration forms and instructions for any combination of countries and censuses (example: educational attainment)

  43. 4. Dissemination
