Thanks to CBS-Sudan & NSO-Malawi for 2008 microdata.
IPUMS & AICMD Add Value to African Census Microdata
Robert McCaa and Patricia Kelly-Hall
ASSD VII, January 2012, Cape Town, South Africa
ipums.org/international
ecastats.uneca.org/aicmd
rmccaa@umn.edu
For additional details, please see: www.hist.umn.edu/~rmccaa/ipums-africa
“Dissemination [means] opening up the value inherent in our data” --Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
IPUMS+AICMD open up the value inherent in microdata for censuses throughout Africa.
IPUMS-International: Free, Worldwide Microdata Access Now for Censuses of 62
The purpose of this talk: “value added by 3rd parties”
• Encourage National Statistical Offices to entrust census microdata samples to the IPUMS-International project
  • 1960-2000 samples
  • 2010 round samples
• Describe some of the value that IPUMS-International adds to integrated microdata and metadata
  • Free access to the microdata for bona fide researchers
  • Extensive analysis of data quality before the samples are released
  • Integrated metadata (compare questions in 1, 2, … many censuses)
  • Integrated, pooled microdata (multiple censuses, countries)
• Encourage usage of integrated samples by African researchers
  • Usage is relatively low, but increasing quickly as more samples become available
Central Statistics Office-Ireland
Deirdre Cullen, Senior Statistician, testimonial (not in the paper): Advantages of IPUMS for Ireland
• Bonus for CSO: as a result of this project, our historic data sets are now in a much more usable format
• IPUMS allows:
  • Mix of Census years available in 1 file
  • Comparability with other countries
  • Ease of access for users
• Positive publicity for Census in Ireland
Outline
• Introduction
  • When NSOs disseminate microdata, the task is costly, risky and often unsatisfactory
  • IPUMS+AICMD partnership offers a solution for African countries
  • Invitation to participate: entrust microdata for 2010 and earlier censuses without undue delay
• IPUMS+AICMD adds value to population microdata:
  • Statistical confidentiality and security: disclosure controls, restricted access
  • Integration: census microdata and metadata
  • Dissemination: custom-tailored extracts by country(ies), census(es), populations, variables, sample density, metadata
  • Ethics: statistical transparency, academic freedom, responsible use, sharing of results
• Reflections
Why Statistical Offices Entrust Responsibility for Disseminating Census Microdata to IPUMS-International
• NSO dissemination is costly, risky and often unsatisfactory
  • Costly: scarce human resources are needed to prepare the sample, assure statistical confidentiality, and manage access for relatively few users (however important they may be!)
  • Risky: little experience in anonymizing and managing access to microdata, yet great responsibility
    • US Census Bureau anonymization protocol egregiously corrupted ages for the elderly in ACS microdata; it took 5 years to discover the error!
  • Unsatisfactory: excessive anonymization, slow to provide access. Troublesome for NSO statisticians who do not wish to risk their jobs for some academic. Most deny access to all but the most persistent, influential would-be users.
• Complaints (of a large European NSO):
  • “I haven't used the [microdata]; the bureaucracy was just too slow to get much use out of it.”
  • “[Access] is unbelievably bureaucratic and difficult – this discourages people from using it. It took me 6 months to get the data.”
IPUMS-International assumes responsibilities and risks for integrating & disseminating microdata and metadata
• Uniform Memorandum of Understanding with each NSO:
  • Founding partners (2001): Kenya, South Africa, Ghana, Egypt, France, Spain, China, Vietnam, Colombia, Mexico, USA …
  • Specific conditions of access: ownership of data (NSO), use, access, restrictions, confidentiality, security, publication, violations, sharing, jurisdiction, and precedence.
• Almost 100 countries now entrust census microdata to IPUMS-I.
• 6 most populous countries NOT entrusting census microdata to IPUMS: India, *Nigeria, Russian Federation, Japan, Algeria, *Korea (RO; may join at the UNSC in New York)
  • * = negotiating
• No data: Congo (DR), Myanmar, Afghanistan, Uzbekistan, Somalia
90+ National Statistics Offices have endorsed the IPUMS-International Memorandum of Understanding
IPUMS-International results posted at http://bibliography.ipums.org
IPUMS Milestones
• 1995: IPUMS-USA first release of integrated microdata
• 1999: IPUMS-International funded by NSF & NIH
• 2002: 1st International launch: 7 countries, 25 samples
• 2007 launch (56th ISI): 32 countries, 89 samples
• 2009 launch (57th ISI): 44 countries, 130 samples
  • ~279 million person records
  • ~3,000 registered users
• 2011 launch (58th ISI): 62 countries, 185 samples
  • 397 million person records
  • 5,000 registered users
• 2013 (ISI Hong Kong!): ~70 countries, ~225 samples
  • ~500 million person records
  • ~7,000 registered users
Cartogram of IPUMS+AICMD partners weighted by population
Dark green = integrated and disseminating, 2002-2011
Open Invitation to Cooperate, Entrust and Access
The IPUMS-International team (includes National Science Foundation Board)
Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center
(Not present: some computer gurus, researchers, research assistants, civil service employees, and others who were not at the NSF Board meeting)
I. Statistical Confidentiality and Security
• Statistical Confidentiality and Microdata Security
• Statistical disclosure control protections
• Restricted access
See pp. 3-5: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
1. Statistical confidentiality and security. Trusted researcher receives customized extracts
[Diagram: NSI 1 … NSI 62+ → MPC → IPUMS-International → trusted researchers]
• Each NSI entrusts census metadata and anonymized microdata to MPC
• MPC integrates metadata and confidentializes microdata samples
• IPUMS-International manages access and entrusts researchers with custom-tailored <ddi>, SAS, STATA, and SPSS metadata and microdata extracts for any combination of countries, censuses, sub-populations, and variables
Dennis Trewin on-site evaluation
Former Australian Statistician; chair, Conference of European Statisticians Task Force on Microdata and Confidentiality
• “...the best practice for an international repository of microdata”
• “The security of IPUMS is first class…the standard of the best national statistical offices”
• “...a valuable and trustworthy microdata service. It meets the fundamental principles of good practice with respect to confidentiality and microdata.”
• “in full compliance with the principles and recommendations of the CES [Conference of European Statisticians]”
2. Statistical Disclosure Controls
• Microdata are anonymized by suppressing any names, addresses, or precise geographic identifiers.
• A sample is drawn so that researchers have access to only a minor fraction of the complete dataset.
• Disclosure protections are imposed on the sample, variable-by-variable and code-by-code.
• A small fraction of households is swapped across geographic boundaries.
• See the case of Switzerland with 5% household samples for four censuses.
• Suppression thresholds are set by each NSO.
• Great satisfaction from NSOs and researchers
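The controls listed on this slide can be illustrated with a toy sketch. It is not the IPUMS procedure: the record layout, the 10% sampling and swap rates, the threshold of 3, and every function name below are assumptions made for illustration only.

```python
import random
from collections import Counter

# Hypothetical direct identifiers in a toy person record.
DIRECT_IDENTIFIERS = {"name", "address"}

def anonymize(records):
    """Drop direct identifiers from every person record."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
            for r in records]

def sample_households(records, fraction, rng):
    """Keep all persons belonging to a random fraction of households."""
    hh_ids = sorted({r["hh_id"] for r in records})
    n_keep = max(1, round(len(hh_ids) * fraction))
    keep = set(rng.sample(hh_ids, n_keep))
    return [r for r in records if r["hh_id"] in keep]

def swap_households(records, swap_rate, rng):
    """Exchange region codes between randomly chosen pairs of households."""
    region_of = {r["hh_id"]: r["region"] for r in records}
    hh_ids = sorted(region_of)
    n_pairs = int(len(hh_ids) * swap_rate / 2)
    chosen = rng.sample(hh_ids, 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        region_of[a], region_of[b] = region_of[b], region_of[a]
    return [{**r, "region": region_of[r["hh_id"]]} for r in records]

def suppress_rare_codes(records, variable, threshold, residual="other"):
    """Recode categories with fewer than `threshold` cases to a residual code."""
    counts = Counter(r[variable] for r in records)
    return [{**r, variable: (r[variable] if counts[r[variable]] >= threshold
                             else residual)}
            for r in records]

def toy_census(n_households=200):
    """Build an invented census: two persons per household, four regions."""
    people = []
    for hh in range(n_households):
        for person in range(2):
            people.append({"hh_id": hh,
                           "name": f"person-{hh}-{person}",
                           "address": f"{hh} Example Road",
                           "region": hh % 4,
                           "occupation": "farmer"})
    people[0]["occupation"] = "astronaut"  # a unique, identifying code
    return people

rng = random.Random(0)
census = toy_census()
sample = sample_households(anonymize(census), 0.10, rng)
released = suppress_rare_codes(swap_households(sample, 0.10, rng),
                               "occupation", threshold=3)
print(len({r["hh_id"] for r in released}), "households released")
```

In this sketch the threshold is a plain function argument, which mirrors the point that each NSO sets its own suppression thresholds.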
3. Restricted access: Thwarting intruders by legal and administrative procedures
• Usage is restricted to bona fide researchers who agree to stringent conditions of use to protect statistical confidentiality
  • 1,100-word application form (shorter than the 5,300-word Facebook policy)
  • Agree to 8 specific conditions of use
  • Supply extensive personal and institutional details
  • Identify your employer’s Office for Protection of Human Subjects, IRB, etc.
  • Describe research detailing need for access
• Rogue intruders face legal and institutional sanctions
  • University attorney’s office is obligated to initiate sanctions against both the individual and the institution, similar to NIH probationary status
Despite the “P” (Public) in IPUMS, access to the microdata is restricted.
Restricted Access: User Registration and Login
Links to Partner Statistical Agency Websites
Thwarting intruders by legal and administrative procedures
Application form for IPUMS-I requesting information on institutional affiliation
Conditions of use: must agree to each one, no exceptions
• Data must not be redistributed without authorization. All data extracted from the IPUMS-International database are intended solely for the use of the licensee. Under IPUMS-International agreements with collaborating agencies, redistribution of the data to third parties is prohibited. Each member of a research team using the data must apply for access and be licensed individually.
• The microdata are intended only for scholarly research and educational purposes. These microdata are provided for the exclusive purposes of teaching and scholarly research, and may not be used for any other purposes without explicit written approval from the relevant official statistical authority.
• Commercial use and redistribution of the microdata are strictly prohibited. Users are prohibited from using microdata acquired from the Integrated Public Use Microdata Series International or other authorized distributors in the pursuit of any commercial or income-generating venture either privately, or otherwise.
• Use of the microdata must follow strict rules of confidentiality. Users will maintain the confidentiality of persons and households. Any attempt to ascertain the identity of persons or households from the microdata is prohibited. Alleging that a person or household has been identified in these data is also prohibited. Statistical results that might reveal the identity of persons or entities may not be reported or published in any form.
• The microdata must always be safely secured. Users will implement security measures to prevent unauthorized access to microdata acquired from Integrated Public Use Microdata Series International, its partners or authorized distributors. Upon the completion of this research, data may be retained only if they can be safely secured. If security cannot be guaranteed, the microdata must be destroyed.
• Scholarly publications are permitted, and must be cited appropriately. The publishing of research results based on IPUMS-International microdata is permitted in communications such as scholarly papers, journals and the like. The authors of these communications are required to cite Integrated Public Use Microdata Series-International and the relevant official statistical authority as the source of the microdata, and to indicate that the results and views expressed are those of the author. Users are requested to provide the IPUMS-International staff with a full citation for any publications resulting from their work with these data.
• Any violation of this license agreement will result in disciplinary action, including possible loss of employment. Violation of this agreement will lead to revocation of this license, recall of all microdata acquired, a motion of censure to the relevant professional organization(s) and civil prosecution under national or international statutes, at the discretion of the Regents of the University of Minnesota and the official statistical agencies. Sanctions likewise may be taken against the institution with which the violator is affiliated.
• User agrees to notify ipums@pop.umn.edu regarding errors in the data.
II. Integration
• Comprehensive Source Metadata
• Integrated, DDI Compatible Metadata
• Integrated Microdata
• IPUMS-I Value-Added Variables
• Integrated Boundary Files
See pp. 6-8: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
4. Comprehensive Source Documents (forms, instruction manuals) for integrated censuses
Links to Official Statistical Agency Partners
Bibliography: view cites, link to publications
5. DDI Compatible Metadata (we share!)
http://microdata.worldbank.org: Mapped in DDI; compatible with IHSN Microdata toolkit; copies entered into the NADA catalog and archive
6. Integrated Metadata
Website navigation shown on the slide:
• User Registration, conditions of use license
• Browse and Select Data
• Download Data Extract (and <ddi> codebook)
• Source documents (forms, instruction manuals)
• Link to Official Statistical Agency home pages
• Bibliography: view cites, link to publications
Integrated metadata: open access, dynamically constructed. Example: Marital Status
Page is constructed dynamically; displays currently selected samples
Integrated IPUMS-I Metadata: Codes and Frequencies
Detailed, Case-Count View
2 rules: 1. Retain details; 2. Harmonize everything
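One way to satisfy both rules at once is a composite code, where the leading digit carries a category comparable across all samples and the trailing digits keep whatever extra detail a given census collected. The sketch below illustrates that idea only; the sample names, source codes, and harmonized codes are invented and are not actual IPUMS-I codes.

```python
# Hypothetical crosswalk: (sample, source code) -> harmonized 3-digit code.
# Leading digit = general category shared by all samples;
# remaining digits = detail retained from the source census.
CROSSWALK = {
    ("census_A", 1): 100,  # single
    ("census_A", 2): 200,  # married (no further detail collected)
    ("census_A", 3): 300,  # widowed
    ("census_B", 1): 100,  # single
    ("census_B", 2): 210,  # married, civil
    ("census_B", 3): 220,  # married, customary
    ("census_B", 4): 300,  # widowed
}

def harmonize(sample, source_code):
    """Translate a census-specific code into the shared detailed code."""
    return CROSSWALK[(sample, source_code)]

def general(harmonized_code):
    """Collapse a detailed code to its general category (leading digit)."""
    return harmonized_code // 100

# Census B distinguishes two kinds of marriage; census A does not.
# The detailed codes differ, but the general category is comparable.
print(harmonize("census_B", 2), harmonize("census_B", 3))
print(general(harmonize("census_A", 2)) == general(harmonize("census_B", 3)))
```

A researcher comparing the two censuses would use the general category, while one working only with census B could keep the detailed codes.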
Integrated IPUMS-I Metadata: Enumeration text
View text in English for any combination of countries and censuses.
2 documents: first, the form; then, the enumeration instructions (scroll down for more)
Appendix D. 42 (of 60) Integrated Household Variables: Availability for 13 African Countries (25 Censuses)
Appendix E. 88 (of 108) Integrated Person Variables: Availability for 13 African Countries (25 Censuses)
8. GIS Boundary Files (and other data files)
III. Dissemination
• Trans-border Access
• Custom-Tailored Extracts
• Usage
• 2010 Round Census Microdata
See pp. 9-10: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
9. Transborder access. IPUMS-I Extracts by researcher’s place of identity
Dissemination of microdata and metadata extracts
• The massive scale of IPUMS requires users to be selective:
  • Select country (or countries)
  • Select samples (census years)
  • Select variables (e.g., age, sex, educational attainment, etc.)
  • Select sub-populations (e.g., nurses)
  • Select sample density
• Once an extract request is submitted, the IPUMS extract engine:
  • Constructs the microdata extract
  • Constructs the metadata
  • Emails the researcher to retrieve the extract (password protected; transmission is encrypted with 128-bit SSL)
• The researcher downloads the extract, unzips and analyzes it
• Extract system validated as usage has soared
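The five selections above can be pictured as one request object applied to a pooled file. The sketch below is an assumption-laden illustration, not the IPUMS extract engine: `ExtractRequest`, `build_extract`, the field names, and the toy records are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExtractRequest:
    samples: set                    # chosen (country, census year) pairs
    variables: list                 # columns to include in the extract
    case_filter: Callable[[dict], bool] = lambda person: True  # sub-population
    density: float = 1.0            # fraction of households to keep
    seed: int = 0                   # makes the density draw repeatable

def build_extract(pool, request):
    """Apply sample, density, case and variable selections to pooled records."""
    rng = random.Random(request.seed)
    in_samples = [p for p in pool
                  if (p["country"], p["year"]) in request.samples]
    # Density is applied to whole households so they stay intact.
    households = sorted({(p["country"], p["year"], p["hh_id"])
                         for p in in_samples})
    kept = {h for h in households if rng.random() < request.density}
    return [{v: p[v] for v in request.variables}
            for p in in_samples
            if (p["country"], p["year"], p["hh_id"]) in kept
            and request.case_filter(p)]

# Invented records standing in for a pooled, integrated file.
POOL = [
    {"country": "Malawi", "year": 2008, "hh_id": 1, "age": 34, "sex": "F", "occupation": "nurse"},
    {"country": "Malawi", "year": 2008, "hh_id": 1, "age": 36, "sex": "M", "occupation": "farmer"},
    {"country": "Sudan", "year": 2008, "hh_id": 1, "age": 29, "sex": "F", "occupation": "nurse"},
    {"country": "Kenya", "year": 1999, "hh_id": 7, "age": 51, "sex": "M", "occupation": "teacher"},
]

request = ExtractRequest(
    samples={("Malawi", 2008), ("Sudan", 2008)},
    variables=["country", "age", "sex"],
    case_filter=lambda person: person["occupation"] == "nurse",
)
nurses = build_extract(POOL, request)
print(nurses)
```

Lowering `density` in the request shrinks the extract without changing any of the other selections, which is how a user could fit a target file size.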
10. Custom tailored extracts. www.ipums.org/international:
a. Login with password
b-1. Study documentation
b-2. Create extract
c. Receive email; log on with password
d-1. Download extract (SSL encrypted)
d-2. Unzip data
e. Analyze using own software
Use the extract system to “Select Cases”. Example: Disability
Second: Click the box to include the variable
Third: Click the “select cases” box
Fourth: Scroll down, select “disabled”, then “Continue to next step”
Click here to select every person in households containing an individual with employment disability
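The household-level option on this slide differs from an ordinary case filter: it keeps every member of a household in which at least one person matches. A minimal sketch of that logic follows, assuming hypothetical field names (`hh_id`, `disabled`) and invented records; it does not reproduce the extract system itself.

```python
def select_households_with(persons, predicate):
    """Return every person living in a household where someone matches."""
    matching_households = {p["hh_id"] for p in persons if predicate(p)}
    return [p for p in persons if p["hh_id"] in matching_households]

# Invented person records: household 1 contains one disabled member.
PERSONS = [
    {"hh_id": 1, "person": 1, "disabled": True},
    {"hh_id": 1, "person": 2, "disabled": False},
    {"hh_id": 2, "person": 1, "disabled": False},
]

selected = select_households_with(PERSONS, lambda p: p["disabled"])
print([(p["hh_id"], p["person"]) for p in selected])
```

Keeping the whole household lets a researcher study, for example, the other members who live with a disabled person, which a person-level filter alone would discard.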
2010 round censuses. Minimum Standards for Samples Entrusted to IPUMS for dissemination
• Household samples
• High precision: 5% minimum, 10% preferred
• Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low-frequency attributes)
• Detailed codes
  • Age: single year to 85
  • Occupation, industry: 3-digit ISCO, ISIC
  • Country of birth: detail individual countries consistent with statistical confidentiality
• Thanks to INSEE France for sample of recensement rénové, 2004-2008: 20 million person records launched in IPUMS-I
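A few of these standards translate directly into simple checks and recodes. The sketch below assumes that "single year to 85" means ages above 85 are grouped into an open-ended top category, and that a more detailed classification code is shortened by keeping its leading digits; both readings, and all function names, are assumptions for illustration.

```python
MIN_FRACTION = 0.05        # 5% minimum sampling fraction from the slide
PREFERRED_FRACTION = 0.10  # 10% preferred

def sampling_fraction_status(sample_households, total_households):
    """Classify a household sample against the 5% / 10% standards."""
    fraction = sample_households / total_households
    if fraction >= PREFERRED_FRACTION:
        return "preferred"
    if fraction >= MIN_FRACTION:
        return "minimum"
    return "below minimum"

def topcode_age(age, top=85):
    """Keep single years of age up to `top`; group older ages at `top`."""
    return min(age, top)

def to_three_digits(code):
    """Shorten a classification code (e.g. 4-digit ISCO) to 3 digits."""
    return str(code)[:3]

print(sampling_fraction_status(5_000, 100_000))
print(topcode_age(97), topcode_age(40))
print(to_three_digits("2221"))
```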
IV. Ethics
• Statistical Transparency
• Academic Freedom
• Reduce Research Fraud and Exaggeration of Results
• Share Research Results
See p. 11: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
“IPUMS-I is an excellent resource for teaching…” -- Dr. David Lam, president, Population Association of America
• Free, easy access to data for many countries and censuses
• Large sample sizes:
  • Make it possible to include many different variables in a regression… multi-level model…
  • Produce separate estimates for population sub-groups
• Easy to extract samples with a target sample size (e.g., 50 MB)
• Easy to revise an extract for a larger size or to include more countries, censuses, variables or sub-populations
• Students show a great deal of creativity in using IPUMS-I
• Skills acquired have an immediate pay-off when applying for jobs (e.g., World Bank), graduate school, etc.
Africa Mirror Site: http://ecastats.uneca.org/aicmd/ IPUMS shares
“Dissemination [means] opening up the value inherent in our data” --Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
IPUMS opens up the value inherent in census microdata:
• for the 2010 round
• for the 2000, 1990 and earlier rounds (where microdata exist)
• and for many countries
IPUMS-International: Free, Worldwide Microdata Access Now for Censuses of 62 Countries--80 by 2015
Robert McCaa, Steven Ruggles, Matt Sobek and Wendy L. Thomas
Session STS065, The Future of Microdata Access, 58th International Statistical Institute, Dublin, Ireland, 26 August 2011
Thank you
To discuss cooperation, please speak with Dr. Patricia Kelly-Hall or email: rmccaa@umn.edu
To use integrated census microdata, see: ipums.org/international or ecastats.uneca.org/aicmd