Documenting and disseminating census and survey data sets. Ilpo Survo, United Nations ESCAP, Bangkok, survo.unescap@un.org for UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007. Content. A. Systematic documenting of census data sets
Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok, survo.unescap@un.org for UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007
Content A. Systematic documenting of census data sets B. Why disseminate microdata? C. Microdata Management Toolkit
A good census dataset... • Is documented clearly • Contains no surprises • Allows users to: • Start working effectively and quickly • Find the data they are interested in • Understand what the data are measuring and how the data have been created • Assess the quality of the data
Evolving documentation technology • In-house documentation standards => International metadata standards • National practices => International good practices • Ad hoc tools => Structuring tools, databases • Text-based codebooks => XML-based codebooks
Maintain metadata in a centralised database • Manage definitions, methodology information, variable information, data collection information in one place • Ensures consistency across data holdings • Approach useful for planning, data collection, processing, analysis and dissemination
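The idea of keeping definitions in one place can be illustrated with a minimal sketch. It assumes a small SQLite database; the table names, variable names and the definition text are hypothetical placeholders, not taken from any real statistical system. Two datasets point to one shared definition, so the wording cannot drift between them.

```python
import sqlite3

# In-memory database for illustration only; a real system would use a shared server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE definition (
    concept TEXT PRIMARY KEY,
    text    TEXT NOT NULL
);
CREATE TABLE variable (
    dataset TEXT NOT NULL,
    name    TEXT NOT NULL,
    concept TEXT NOT NULL REFERENCES definition(concept),
    PRIMARY KEY (dataset, name)
);
""")

# Placeholder definition, stored once (hypothetical wording).
conn.execute(
    "INSERT INTO definition VALUES (?, ?)",
    ("household", "Placeholder definition of a household"),
)

# Variables from two hypothetical datasets refer to the same concept.
conn.executemany(
    "INSERT INTO variable VALUES (?, ?, ?)",
    [("census_2005", "hh_size", "household"),
     ("survey_2005", "HH11", "household")],
)

def definition_for(dataset, name):
    """Return the shared definition text for a variable, or None if unknown."""
    row = conn.execute(
        "SELECT d.text FROM variable v "
        "JOIN definition d ON d.concept = v.concept "
        "WHERE v.dataset = ? AND v.name = ?",
        (dataset, name),
    ).fetchone()
    return row[0] if row else None

print(definition_for("census_2005", "hh_size"))
print(definition_for("survey_2005", "HH11"))
```

Updating the single row in the definition table changes what both datasets report, which is the consistency benefit named on the slide.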
Good practices in data documentation • Explanatory material • Minimum material required to ensure the long-term viability and functionality of a dataset • Contextual information • Material about the context in which the data was collected, and how it was put to use • Enables the secondary user to fully understand the background and processes behind the data collection exercise. • Cataloguing material • Bibliographic record of the dataset, for proper acknowledgement and citation • Basic instrument used for resource discovery • http://www.esds.ac.uk/news/goodPractice.pdf
Untapped potential of microdata for national development • Even the best planned tabulations cannot exhaustively bring out all valuable information from census data • Diversity, disparities and related causalities are best analysed from microdata, e.g. • Tracking the effects of policy interventions on target groups • Determining dimensions of within-country disparities • The quality of research would improve => Return on data collection would increase => National policies could be targeted better => More efficient use of public resources
Factors that might hinder microdata dissemination - Discussion • Concerns about data confidentiality • Ambiguous or missing national legislation • Narrow mandate of statistical agency • Concerns about data quality • Low demand from data users
International initiatives • Marrakech Action Plan on Statistics, http://www.surveynetwork.org/home/docs/Marrakech_Action_Plan_for_Statistics.pdf • International Household Survey Network, http://www.surveynetwork.org/ • IHSN Microdata Management Toolkit • ESCAP-World Bank-PARIS21 project on improving access to survey microdata in Asia and the Pacific
ESCAP project on improving access to survey microdata in Asia and the Pacific, 2007-2008 • Household surveys and population and housing censuses, not establishment surveys • Assessment of status of microdata dissemination • Regional inventory and data archive of household surveys • Regional advocacy and training workshops • On-site training and technical advice on documentation and anonymization
Microdata Management Toolkit – Summary A set of software tools for the documentation, archiving, dissemination and preservation of microdata 1. Metadata Editor • Documents survey data in accordance with international standards 2. CD-ROM Builder • Generates user-friendly outputs, such as CDs and websites, for dissemination and archiving 3. The Explorer • For viewing metadata • For re-exporting data to various formats
Download and use • The Toolkit can be downloaded from http://www.surveynetwork.org/home/?lvl1=tools&lvl2=documentation&lvl3=toolkit • All Toolkit components except the Metadata Editor are available for free • Nesstar Editor: one free license for NSOs of the World Bank IDA countries (e.g. Afghanistan, Georgia, Kyrgyz Republic, Moldova, Tajikistan)
Metadata Editor • Documents survey data in accordance with international standards • Data Documentation Initiative (DDI) • Dublin Core Metadata Initiative (DCMI) • Data & metadata in one single file • Data can be imported from various formats, incl. statistical packages • Produces survey documentation in PDF format
Extensible Markup Language (XML) • A language for describing data using tags • Tags are conceptually similar to fields in databases • XML files are regular text files • Can be edited with text editors • XML files, like databases, can be: • Searched and queried • Edited • Tutorial: http://w3schools.com/xml
XML example
<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<collDate date="2005-01" event="start"/>
<collDate date="2005-03" event="end"/>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified, two stages</sampProc>
<respRate>98 percent</respRate>
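Because the tags act like database fields, a few lines of code can read them. The sketch below uses Python's standard xml.etree.ElementTree module on part of the fragment above. The <stdyDscr> wrapper is an assumption added only to make the fragment well-formed; a real DDI file nests these elements more deeply, and the function name read_study is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's elements have no common parent, so a wrapper element is assumed here.
FRAGMENT = """
<stdyDscr>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <altTitl>MICS</altTitl>
  <fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
  <collDate date="2005-01" event="start"/>
  <collDate date="2005-03" event="end"/>
  <nation>Popstan</nation>
</stdyDscr>
"""

def read_study(xml_text):
    """Extract a few key metadata fields from the fragment into a dict."""
    root = ET.fromstring(xml_text)
    # Map each collection-date event ("start", "end") to its date attribute.
    dates = {d.get("event"): d.get("date") for d in root.findall("collDate")}
    return {
        "title": root.findtext("titl", default="").strip(),
        "funder": root.find("fundAg").get("abbr"),
        "start": dates.get("start"),
        "end": dates.get("end"),
        "country": root.findtext("nation", default="").strip(),
    }

study = read_study(FRAGMENT)
print(study)
```

Element text is read with findtext and attributes with get, which mirrors the split in the example between content (the title) and attributes (abbr, date, event).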
XML advantages • Creation of a comprehensive checklist of useful metadata elements • Potential to assess the content of a file by determining whether particular tags are, or are not, within that file • Creation of a dataset catalogue which can be queried for key metadata elements • Potential to transform the file into more user-friendly formats, such as HTML or PDF • XML files can be exchanged across networks or over the Internet using web services (e.g. SOAP)
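Two of these advantages, checking a file against a checklist of tags and transforming it into HTML, can be sketched as follows. This is an illustration under stated assumptions, not the Toolkit's own method: the checklist in REQUIRED_TAGS, the <stdyDscr> wrapper and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical checklist of metadata elements a documented survey should carry.
REQUIRED_TAGS = ["titl", "AuthEnty", "nation", "sampProc", "respRate"]

def missing_tags(xml_text, required=REQUIRED_TAGS):
    """Return the checklist tags that do not occur anywhere in the file."""
    root = ET.fromstring(xml_text)
    present = {el.tag for el in root.iter()}
    return [tag for tag in required if tag not in present]

def to_html(xml_text):
    """Render the top-level elements that contain text as a simple HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root:
        text = (el.text or "").strip()
        if text:
            rows.append(f"<tr><th>{escape(el.tag)}</th><td>{escape(text)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

# Deliberately incomplete sample; the wrapper element is an assumption.
SAMPLE = (
    "<stdyDscr>"
    "<titl>Multiple Indicator Cluster Survey 2005</titl>"
    "<nation>Popstan</nation>"
    "</stdyDscr>"
)

print(missing_tags(SAMPLE))
print(to_html(SAMPLE))
```

Running missing_tags over every file in an archive would give a quick completeness report, and the same loop could feed a queryable catalogue of key elements.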
CD-ROM Builder • Integrates with Metadata Editor • Generates user-friendly outputs (CD-ROM, website) in HTML format for dissemination and archiving • Allows customization • Branding: look and feel of CD or website • Content: single or multiple surveys
CD-ROM Builder process 1 Create a new CD-ROM project 2 Add a survey to the project and select its type and branding • Select an existing survey by opening the DDI-XML or Nesstar file • The survey branding determines the overall look and feel of the CD • The survey type determines the default metadata content 3 Click the Save button to generate the HTML interface 4 After a few minutes, your CD project is ready for publishing!
Demonstration of Metadata Editor A live demonstration with the Popstan dataset, on-screen in English and Russian