
IPUMS-International: Integrated Microdata for Trusted Researchers

IPUMS-International is the largest provider of integrated census microdata to trusted, non-commercial researchers, disseminating samples from 130 censuses of 44 countries. This case study describes its statistical confidentiality and privacy methods.


Presentation Transcript


  1. Statistical confidentiality and privacy. 2. Case study: IPUMS-International. www.ipums.org/international * * * Robert McCaa, Minnesota Population Center, rmccaa@umn.edu. “Inadequate use of microdata has high costs” --Len Cook (2003, Registrar General, ONS)

  2. MPC: largest provider of integrated microdata to trusted, non-commercial researchers • International (census) • History (19th c.) • USA (census) • GIS • Employment • Health • Time-Use

  3. IPUMS-Global (first 10 years). Map (Mollweide projection): • dark green = integrated and disseminating (44 countries, 130 censuses, 279 million person records) • green = to be integrated (35 countries, 90 censuses, 150 million) • See “Inventory” handout; in the inventory, * = IPUMS confidentiality protocols used

  4. Outline: IPUMS statistical confidentiality methods • IPUMS: A restricted access, web-based microdata dissemination system • IPUMS: The trusted user/institution approach • A. Legal Disclosure Controls • B. Administrative Disclosure Controls • C. Technical Disclosure Controls • Example: Saint Lucia, 1991 • IPUMS Assessments (2007): • UN-ECE Case Study • Trewin on-site evaluation

  5. 1. IPUMS-International: Goals • Inventory census microdata and documentation, world-wide • Recover and preserve at-risk microdata • Integrate census microdata and documentation • Disseminate--without cost--extracts of samples to bona fide researchers worldwide, regardless of country of birth, citizenship or residence • Sustained funding, 1999-2015 (6 grants of 5 years’ duration): • National Science Foundation (USA): 3 successive grants • National Institutes of Health (USA): Latin America, Europe, Eurasia

  6. IPUMS-International: a restricted-access, web-based microdata extraction system • Researcher licensed to access microdata: 1/3 rejected • NO: Public access, source files, or complete datasets • Licensed researcher selects: • Countries, • Censuses, • Cases/sub-populations, • Variables, and sample densities • Extract engine queues request, generates extract • Password protected: to make and retrieve extracts • Researcher retrieves extract via web with SSL 128-bit encryption and analyzes using own wares (soft/hard/wet)
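
The request flow on the slide above (a licensed researcher selects samples, cases and variables; an engine queues the request and generates the extract) can be sketched roughly as follows. This is a minimal illustration only, not the IPUMS implementation: the class names, field names and the in-memory record list are all hypothetical, and sample densities, passwords and encryption are not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict  # one person record, keyed by (hypothetical) variable name


@dataclass
class ExtractRequest:
    user: str
    samples: list      # (country, census year) tuples chosen by the researcher
    variables: list    # variable names to include in the extract
    case_filter: Optional[Callable[[Record], bool]] = None  # sub-population


class ExtractEngine:
    """Toy queue-based extract engine (hypothetical design)."""

    def __init__(self, records, licensed_users):
        self.records = records
        self.licensed_users = set(licensed_users)
        self.queue = deque()

    def submit(self, request):
        # Only licensed researchers may queue a request.
        if request.user not in self.licensed_users:
            raise PermissionError(f"{request.user} holds no license")
        self.queue.append(request)

    def run_next(self):
        # Serve the oldest request: select cases, then keep chosen variables.
        request = self.queue.popleft()
        wanted = set(request.samples)
        keep = request.case_filter or (lambda rec: True)
        return [
            {name: rec[name] for name in request.variables}
            for rec in self.records
            if (rec["country"], rec["year"]) in wanted and keep(rec)
        ]
```

The point of the design is that the researcher never touches the source file: only the selected variables for the selected cases leave the engine.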

  7. 6 steps using www.ipums.org/international: 1. Log on with password • 2a. Study documentation • 2b. Design extract • 3. Receive email; log on with password • 4. Download extract (SSL encrypted) • 5. Unzip data • 6. Analyze • (also SAS, STATA) • See “10 tips” handout

  8. IPUMS-International: world’s largest disseminator of integrated microdata to trusted, non-commercial researchers • 1999: Founded by Steven Ruggles and Bob McCaa: restrict access to trusted users, and apply corresponding confidentiality techniques • 2002: 1st release of integrated samples for 7 countries; >200 users in first year • Big success! 80+ countries signed; 70+ entrusted microdata to IPUMS, datasets for more than 250 censuses, >180 entire datasets • 2006…

  9. IPUMS-International: world’s largest disseminator of integrated microdata to trusted, non-commercial researchers • 1999: Founded • 2006, 3rd release: • data for 20 countries, samples for 63 censuses, • 185 million person records, • >1,000 users • 2010, 7th release: • data for ~50 countries, samples for ~160 censuses • ~300 million person records • >4,000 users • Note: data extracts are provided only to licensed users.

  10. 2. IPUMS-International: The “trusted-user/institution” approach to disseminating integrated, anonymized microdata extracts. Disclosure Controls: • A. Legal: Memorandum with NSI • B. Administrative: License with researchers • C. Technical: Sample, data modifications

  11. 3 kinds of confidentiality protections: • Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute • Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence • Administrative: conditional use license between the University of Minnesota and each researcher • Permission to use restricted access microdata, 3 criteria: research need, research competence, and agree to abide by conditions of use license • Technical data protection measures • Specific to each country …/

  12. A. NSI with U of Minnesota

  13. A. NSI with U. of Minnesota

  14. 3 kinds of confidentiality protections: • Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute • Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence • Administrative: conditional use license between the University of Minnesota and each researcher • Permission to use restricted access microdata, 3 criteria: research need, research competence, and agree to abide by conditions of use license • Technical data protection measures • Specific to each country …/

  15. IPUMSi LICENSE. B. License with researchers: restricted-access web-based system. Legally binding license agreement • forces a would-be intruder to violate the law, for which they can be fined and/or jailed • researcher’s institution sanctioned • protects privacy and confidentiality • assures proper use. Access limited to: • bona fide researchers (credentials) • with a demonstrated scientific need • who agree to abide by license restrictions • Confidentiality • No redistribution • Safely secured • Alleging that a person has been identified is prohibited

  16. IPUMSi LICENSE. B. License with researchers: restricted-access web-based system. Legally binding license agreement • forces would-be snoopers to violate the law • protects privacy and confidentiality • assures proper use. Access limited to: • bona fide researchers (credentialed) • with demonstrated scientific need • who agree to abide by license restrictions • Confidentiality • No redistribution, no commercial use • Data safely secured • Alleging that a person can be or has been identified is a violation

  17. “Apply for Access”

  18. Must click acceptance of each restriction to gain access.

  19. License is for 1 year, renewable. End of application

  20. C. 9 Technical Disclosure Controls (Thorogood, 1999) • Restrict access to samples • Limit geographical detail • Recode sparse categories • Truncate top and bottom codes • Construct age from birthdate, if necessary • Suppress: date of birth, precise place of birth • Migration: timing/place not identified in detail • Identify place of residence by major civil division (pop>20k, 60k, 100k, 250k, 1 million, i.e., national convention) • Suppress any sensitive variable requested by NSI
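
Two of the controls listed above, constructing age from birthdate and then suppressing date of birth and precise place of birth, might look like this in code. It is a hedged sketch: the field names (`birthdate`, `birthplace_detail`, `age`) and the reference census day passed in by the caller are assumptions made for illustration, not IPUMS variable names.

```python
from datetime import date

# Hypothetical names for the fields the slide says are suppressed.
SUPPRESSED_FIELDS = ("birthdate", "birthplace_detail")


def age_at_census(birthdate: date, census_day: date) -> int:
    """Completed years of age on the census reference day."""
    had_birthday = (census_day.month, census_day.day) >= (birthdate.month, birthdate.day)
    return census_day.year - birthdate.year - (0 if had_birthday else 1)


def anonymize(record: dict, census_day: date) -> dict:
    """Derive age if it is missing, then drop the identifying fields."""
    out = {k: v for k, v in record.items() if k not in SUPPRESSED_FIELDS}
    if "age" not in out and record.get("birthdate") is not None:
        out["age"] = age_at_census(record["birthdate"], census_day)
    return out
```

Age is derived before suppression because the birthdate is the only source for it in censuses that did not ask age directly; once age exists, the exact date adds disclosure risk without adding much analytic value.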

  21. C. Technical Disclosure Controls. Example: Saint Lucia, 1991 Census • Restrict access to samples: 10% (13,405 persons) • Limit geographical detail (n<2,000): suppress region, district, town, settlement, enumeration district, school identification; retain urban-rural • Recode sparse categories (n<25) as “other” • Type of dwelling: suppress townhouse, barracks • Land occupation: suppress sharecrop • Type of ownership: suppress squatted, leased • Type of roof: suppress 5 categories • Wall material: suppress 5 categories • Water supply: suppress pubwell • Type of lighting: suppress gas • Ethnic origin: suppress Chinese, Portuguese, Syrian-Lebanese • Religion: suppress 6 categories • School, work mode of transport: bicycle • Type of school: technical institute, university • Number of hours worked last week: 5-hour groups, 70+ • Pay period: suppress quarterly, annually • Occupation, industry, training code: reduce from 4 digits to 1
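
The sparse-category rule (n<25 becomes “other”) and the reduction of occupation, industry and training codes from 4 digits to 1 can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually used: the slide does not say whether counts are taken on the sample or the full census, so the function simply counts whatever list it is given, and codes are assumed to be stored as strings.

```python
from collections import Counter


def recode_sparse(values, min_count=25, other="other"):
    """Replace every category seen fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def truncate_code(code: str, keep_digits: int = 1) -> str:
    """Keep only the leading digit(s) of a hierarchical code, e.g. 4 -> 1 digit."""
    return code[:keep_digits]
```

A caller would still need to check the size of the merged “other” group, since several tiny categories combined can remain below the threshold.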

  22. C. Technical Disclosure Controls. Example: Saint Lucia, 1991 • Top-bottom code • Number of rooms: 10+ • Number of bedrooms: 7+ • Number of radios: 4+ • Number of TVs: 3+ • Number of videos: 2+ • Number of emigrants in dwelling: 2+ • Age: 81+ • Age at first child: <=14 • Age at first union: <=14, 41+ • Age at last child: <=14, 45+ • Number of school subjects: <=3, >=7 • Income categories: 8+
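
Top and bottom coding as listed on this slide amounts to clamping each value to a published range, so that a stored 81 means “81+” and a stored 14 means “<=14”. The sketch below uses a handful of the slide's thresholds; the variable names are hypothetical, and representing the open-ended groups by their boundary value is an assumption about storage, not something the slide specifies.

```python
from typing import Optional

# (bottom, top) limits taken from the slide; keys are illustrative names.
LIMITS = {
    "rooms": (None, 10),
    "bedrooms": (None, 7),
    "age": (None, 81),
    "age_first_union": (14, 41),
    "age_last_child": (14, 45),
    "school_subjects": (3, 7),
}


def top_bottom_code(value: int, bottom: Optional[int] = None, top: Optional[int] = None) -> int:
    """Clamp value into [bottom, top]; a limit of None leaves that side open."""
    if bottom is not None and value < bottom:
        return bottom
    if top is not None and value > top:
        return top
    return value


def apply_limits(record: dict) -> dict:
    """Return a copy of the record with every listed variable top/bottom coded."""
    out = dict(record)
    for name, (bottom, top) in LIMITS.items():
        if out.get(name) is not None:
            out[name] = top_bottom_code(out[name], bottom, top)
    return out
```

For example, `apply_limits({"age": 95, "rooms": 4, "age_first_union": 12})` leaves `rooms` unchanged and moves the two extreme values to their boundary codes.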

  23. C. Technical Disclosure Controls. Example: Saint Lucia, 1991 • Suppress: • date of birth, precise place of birth, type of work wanted • Migration: timing/place not identified in detail • Country last lived: suppress 37 categories • Year of immigration: <1948 • Identify place of residence by major civil division (pop>20k, 60k, 100k, 250k, 1 million, i.e., national convention) • all suppressed • Suppress any sensitive variable requested by NSI: • none (as yet)

  24. 3. Assessments: A. Why was IPUMS cited as “good practice” by the UN-ECE (2007, Annex 23, pp. 98-103)? http://www.unece.org/stats/documents/tfcm.htm

  25. UN-ECE Good practices (see annex 23): • High level of confidence and transparency between the researchers (users) and the national statistical institutes • The data are anonymized by highly efficient technical means • The conditions of use are well defined • Good use is assured by both juridical and administrative mechanisms to prevent violations • Sanctions for misuse are clearly spelled out • Sanctions are imposed not only against those who misuse the data but also against their institutions

  26. B. The Trewin Report: “The security of the computing environment used by IPUMS-International is first class and appears to be of the standard of the best statistical offices.” --Dennis Trewin, former Australian Statistician, past President, International Statistical Institute, chair, UN-ECE Committee on Managing Statistical Confidentiality and Microdata Access (CES 2007) • See “Trewin Report” handout

  27. Statistical confidentiality and security: see the on-site review by Dennis Trewin, www.hist.umn.edu/~rmccaa/ipums-global (click “Trewin Report”). An Outsider’s view from inside IPUMS-International: • “The best practice for an international repository of microdata” • “The security of IPUMS is first class…the standard of the best national statistical offices” • “in full compliance with the principles and recommendations of the ECE”

  28. IPUMS-International strengths • Uniform legal authorization with national statistical authorities • Access restricted to academics with need who agree to abide by stringent confidentiality protections. Sanctions against individual and institution—denial of access to all microdata for the entire institution • Strong technical methods of microdata anonymization • Experienced integration teams • Proven web-based access management system • High producer and user satisfaction • Sustainable: MPC, NSF, NIH

  29. Join us at the 58th ISI: Dublin, Aug 21-26, 2011, http://www.isi2001.ie • IPUMS Workshop, Aug 19-20 • Microdata sessions • IPUMS funding for delegates from developing countries • IPUMS booth • Participate in ISI sessions • Network with stat offices, international agencies, etc.

  30. Thank you! More: www.hist.umn.edu/~rmccaa/ipums-global • See: Durban workshop (2009): Microdata recovery, Jamaica report • Lisbon workshop (2007): Saint Lucia report * * * * * * Contact: rmccaa@umn.edu • This ppt is also available at: ipums-global (see “Port of Spain workshop”)
