Disseminating census microdata: the IPUMS and IECM experiences, 2002-2010 (and plans for beyond). Robert McCaa and Albert Esteve, Minnesota Population Center and Centre d’Estudis Demogràfics. rmccaa@umn.edu; aesteve@ced.uab.es. www.ipums.org/international (Global); www.iecm-project.org (Europe portal). “Only used statistics are useful statistics.” -- Joint UNECE/Eurostat Meeting on Population and Housing Censuses inf.1
3 goals of presentation: IPUMS/IECM census microdata projects • Discuss dissemination statistics from 59,170 extracts downloaded by IPUMS registered users • Invite 21 European partners to entrust 2010 round samples as expeditiously as possible • Invite non-partners to entrust samples of historical censuses (2000 and earlier rounds) as well as for the 2010 round
Outline: Integrating census samples and metadata for timely dissemination via the IPUMS-International and IECM initiatives, 2010-2014 • IPUMS-International: massive, global dissemination (7 slides) • IPUMS-International: usage statistics (9 slides) • Conclusion (2 slides)
1. IPUMS-International: Massive, Global Integration and Dissemination. “…best practice for a data repository of international statistical data” -- Dennis Trewin, chair, UNECE task force on Statistical Confidentiality & Microdata Access • See also: • 2006: "IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted access census microdata extracts to academic users," Monographs of official statistics: Work session on statistical data confidentiality. • 2009: "Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014." ECE/CES/GE.41/2009/23
IPUMS-International: • Begun in 1999, IPUMS-International is the world’s largest integrated demographic database: • 159 integrated, anonymized census samples (55 countries) • 325 million person records; 3,600 approved researchers • Database is likely to double over the next five years, by the addition of: • 2010 round samples of 17 current Eur-Asian partners: Armenia, Austria, Belarus, Canada, France, Greece, Hungary, Italy, Kyrgyzstan, Netherlands, Portugal, Romania, Slovenia, Spain, Switzerland, UK, USA, etc. • Samples for 8 Eur-Asian countries currently in development: Belgium, Czech Republic, Ireland, Germany, Poland, Turkey, Turkmenistan, Ukraine • Future partners? Albania? Bulgaria? Croatia? Estonia? Finland? …
59,170 extracts (586,643 variables) disseminated; jumped 10% in June, with the 2010 launch • IPUMS-International NEVER disseminates source microdata! • 4 IPUMS constructed variables ranked in the top 30: • Spouse’s location in household • Mother’s location in household • Father’s location in household • Spouse rule for inferring location in household • These variables are constructed from household samples • 3 countries with person samples are invited to construct household samples: • Canada • Netherlands • UK
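To illustrate why such "location" variables need household samples, here is a minimal sketch of a pointer variable. It uses one simplified linking rule (a household with exactly one head and one spouse); the actual IPUMS rules are not given in these slides, and the field names `pernum` and `relate` are assumptions made for this example.

```python
def spouse_locations(household):
    """Map each person's number to the person number of their spouse (0 if none).

    `household` is a list of dicts with the hypothetical keys
    'pernum' (position within the household) and 'relate'
    (relationship to the head: 'head', 'spouse', 'child', 'other').
    Simplified rule: link head and spouse only when each occurs exactly once.
    """
    heads = [p["pernum"] for p in household if p["relate"] == "head"]
    spouses = [p["pernum"] for p in household if p["relate"] == "spouse"]
    location = {p["pernum"]: 0 for p in household}
    if len(heads) == 1 and len(spouses) == 1:
        location[heads[0]] = spouses[0]
        location[spouses[0]] = heads[0]
    return location


# Toy household (made-up records, not census data).
household = [
    {"pernum": 1, "relate": "head"},
    {"pernum": 2, "relate": "spouse"},
    {"pernum": 3, "relate": "child"},
]
links = spouse_locations(household)
```

The pointer can only be filled in when every member of the household is present in the sample, which is why a sample of unrelated persons cannot support these variables.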
IPUMS-International [world map, Mollweide projection] • Dark green = integrated and disseminating (55 countries, 159 censuses, 325 million person records) • Green = to be integrated (35 countries, 90 censuses, 150 million) • 2011: Cambodia 2008, Egypt 2006, France 2006, Germany, Indonesia, Ireland, etc. • 2012: why not yours?
2011 launch at the 58th Session ISI: Dublin, Aug 21-26, 2011, http://www.isi2011.ie • European samples to be launched: • France, 2006 • Germany (1970-87; DFR ‘71, ‘81) • Ireland (1971-2006) • Beyond Europe, samples for: • Cambodia 2008 • Egypt 2006 • Jamaica, 1981-2001 • Iran 2006 • Etc. • Successive annual launches planned for 2012, 2013, 2014.
Dissemination of microdata extracts via IPUMS-International • IPUMS-International NEVER disseminates source microdata! • Usage is restricted to bona fide researchers who agree to stringent conditions of use to protect statistical confidentiality • IPUMS disseminates extracts, custom-tailored to researchers' needs • Unlike most statistical agencies, which disseminate an identical, entire sample to every user
Dissemination of microdata and metadata extracts • The massive scale of IPUMS requires users to be selective: • Select country (or countries) • Select samples (census years) • Select variables (e.g., age, sex, educational attainment, etc.) • Select sub-populations (e.g., nurses) • Select sample density • Once an extract request is submitted, the IPUMS extract engine: • Constructs the microdata extract • Constructs the metadata • Emails the researcher to retrieve the extract (password-protected; transmission is encrypted with 128-bit SSL) • The researcher downloads the extract, unzips it and analyzes it • Extract system validated as usage has soared
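The selection steps on this slide can be sketched roughly as follows. This is an illustrative toy, not the IPUMS extract engine: `ExtractRequest`, `build_extract`, the record layout and the sample records are all assumptions made for the example.

```python
from dataclasses import dataclass, field
import random


@dataclass
class ExtractRequest:
    samples: list                      # (country, census year) pairs
    variables: list                    # variable names to include
    subpopulation: dict = field(default_factory=dict)  # variable -> required value
    density: float = 1.0               # share of households to keep (0..1)


def build_extract(records, request, seed=0):
    """Return (rows, metadata) for one request over in-memory person records."""
    rng = random.Random(seed)
    wanted = set(request.samples)
    household_kept = {}                # sampling decision, made once per household
    rows = []
    for rec in records:
        if (rec["country"], rec["year"]) not in wanted:
            continue                   # not a selected sample
        if any(rec.get(k) != v for k, v in request.subpopulation.items()):
            continue                   # outside the selected sub-population
        key = (rec["country"], rec["year"], rec["household"])
        if key not in household_kept:
            household_kept[key] = rng.random() < request.density
        if household_kept[key]:
            rows.append({name: rec[name] for name in request.variables})
    metadata = {
        "samples": sorted(wanted),
        "variables": list(request.variables),
        "n_records": len(rows),
    }
    return rows, metadata


# Toy person records (made up for illustration).
records = [
    {"country": "Mexico", "year": 2000, "household": 1, "age": 34, "sex": "F", "occupation": "nurse"},
    {"country": "Mexico", "year": 2000, "household": 1, "age": 36, "sex": "M", "occupation": "teacher"},
    {"country": "France", "year": 2006, "household": 7, "age": 51, "sex": "F", "occupation": "nurse"},
]
request = ExtractRequest(
    samples=[("Mexico", 2000)],
    variables=["age", "sex"],
    subpopulation={"occupation": "nurse"},
)
rows, metadata = build_extract(records, request)
```

Because each request names its own samples, variables and sub-population, two researchers working from the same database receive different files, each with matching metadata.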
2. IPUMS-International: Usage statistics. See card hand-out for list of current samples and usage statistics
Usage Statistics (June 4, 2010) • 59,170 extracts (jumped 10% in June) • Average: 1,000 extracts per country • Smallest number of extracts: Kyrgyz Republic, 116 (census of 1999; first year of availability) • Largest number of extracts: Mexico, 7,637 (6 censuses, 8 years of availability); Mexico 2000: 2,464 extracts • Usage statistics by country: see Table 2
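As a rough cross-check of the totals quoted in these slides, the ratios can be worked out directly. This is a sketch only: dividing by 55 countries is an assumption about how the per-country average was obtained, and an extract covering several countries may be counted differently in Table 2.

```python
# Figures quoted in the slides.
extracts = 59_170
variables = 586_643
countries = 55
mexico_extracts = 7_637
mexico_2000_extracts = 2_464

# Average number of variables requested per extract.
variables_per_extract = variables / extracts

# Naive per-country average (assumes each extract counts toward one country).
extracts_per_country = extracts / countries

# Share of Mexico's extracts that include the 2000 census sample.
mexico_2000_share = mexico_2000_extracts / mexico_extracts

print(f"{variables_per_extract:.1f} variables per extract")
print(f"{extracts_per_country:.0f} extracts per country (naive)")
print(f"{mexico_2000_share:.0%} of Mexico extracts use the 2000 sample")
```

Under these assumptions a typical extract carries about ten variables, and the naive per-country figure lands in the same range as the rounded average on the slide.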
And: search scholar.google.com for "IPUMS" plus the name of a country, subject, etc.
Minimum Standards for Samples Entrusted to IPUMS for dissemination • Household samples only • High precision: 5% minimum, 10% preferred • Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low-frequency attributes) • Detailed codes: • Age: single year to 85 • Occupation, industry: 3-digit ISCO, ISIC • Country of birth: detail individual countries consistent with statistical confidentiality • Thanks to INSEE France for the sample of the recensement rénové, 2004-2008: 20 million person records to be launched next year.
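The numeric standards above could be checked mechanically along these lines. This is a hedged sketch: `SampleDescription` and `check_minimum_standards` are hypothetical names, and only the thresholds stated on the slide (household unit, 5% minimum density, single-year age to 85, 3-digit occupation and industry codes) are encoded.

```python
from dataclasses import dataclass


@dataclass
class SampleDescription:
    unit: str               # "household" or "person"
    density: float          # sampling fraction, e.g. 0.05 for 5%
    age_top_code: int       # highest age reported in single years
    occupation_digits: int  # digits of ISCO detail
    industry_digits: int    # digits of ISIC detail


def check_minimum_standards(sample):
    """Return a list of shortfalls; an empty list means the stated minimums are met."""
    problems = []
    if sample.unit != "household":
        problems.append("not a household sample")
    if sample.density < 0.05:
        problems.append("density below 5% minimum")
    if sample.age_top_code < 85:
        problems.append("age not in single years to 85")
    if sample.occupation_digits < 3:
        problems.append("occupation coded at fewer than 3 digits")
    if sample.industry_digits < 3:
        problems.append("industry coded at fewer than 3 digits")
    return problems


# Hypothetical sample descriptions for illustration.
good = SampleDescription("household", 0.10, 85, 3, 3)
weak = SampleDescription("person", 0.01, 80, 2, 3)
good_report = check_minimum_standards(good)
weak_report = check_minimum_standards(weak)
```

Confidentiality-driven omissions (low-level geography, rare attributes) are judgment calls and are deliberately left out of this check.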
Conclusion: Invitation to continued cooperation • In 1999, our dream: integrate samples of 21 countries in 10 years • Thanks to generous cooperation of 55 National Statistical Offices • Undreamed-of technological innovations • By 2009, integrated samples for 44 countries • Number of users and usage far exceeded expectations • For the 2010 decade, our dream: • Double the number of users • Double the number of integrated samples • Re-draw samples that do not meet minimum standards, where feasible • Participating statistical agencies: please entrust 2010 samples in due course • Other statistical agencies: entrust series of samples for each census for which microdata exist
…and to the 58th Session ISI: Dublin, Aug 21-26, 2011, http://www.isi2011.ie • IPUMS Workshop, Aug 19-20 • New IPUMS initiatives • Reports by IPUMS users • Reports by National Statistical Office partners • IPUMS sponsorship for delegates from participating countries: • economy air, • registration fees, • 8 nights' accommodation and modest per diem • Simultaneous interpretation: Russian/French/English
Thank you for your cooperation!! rmccaa@umn.edu; aepalos@ced.uab.es. www.ipums.org/international; www.iecm-project.org