Learn how to systematically document census data sets, disseminate microdata effectively, and manage metadata using international standards and tools. Understand the importance of data documentation and explore the Microdata Management Toolkit for archiving and sharing survey data.
Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok, survo.unescap@un.org for UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007
Content A. Systematic documenting of census data sets B. Why disseminate microdata? C. Microdata Management Toolkit
A good census dataset... • Is documented clearly • Contains no surprises • Allows users to • Start working effectively quickly • Find the data they are interested in • Understand what the data are measuring and how the data have been created • Assess the quality of the data
Evolving documentation technology • Own documentation standards => International metadata standards • National practices => International good practices. • Ad hoc tools => Structuring tools, databases • Text-based codebooks => XML-based codebooks
Maintain metadata in a centralised database • Manage definitions, methodology information, variable information, data collection information in one place • Ensures consistency across data holdings • Approach useful for planning, data collection, processing, analysis and dissemination
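As a minimal sketch of the "one place" idea, the Python example below keeps concept definitions in a single table that variables from several datasets point to, so a definition is edited once and every data holding sees the same text. The table layout, dataset titles, variable names and definition wording are all hypothetical and chosen only for illustration; they are not part of the Toolkit or of any metadata standard.

```python
import sqlite3


def build_metadata_db():
    """Create an in-memory metadata store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            name       TEXT UNIQUE NOT NULL,
            definition TEXT NOT NULL
        );
        CREATE TABLE dataset (
            dataset_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL
        );
        CREATE TABLE variable (
            variable_id INTEGER PRIMARY KEY,
            dataset_id  INTEGER NOT NULL REFERENCES dataset(dataset_id),
            concept_id  INTEGER NOT NULL REFERENCES concept(concept_id),
            var_name    TEXT NOT NULL
        );
        """
    )
    # One shared definition (illustrative wording, not an official one).
    conn.execute(
        "INSERT INTO concept (concept_id, name, definition) VALUES (?, ?, ?)",
        (1, "household", "Illustrative wording: people sharing living quarters and meals"),
    )
    conn.executemany(
        "INSERT INTO dataset (dataset_id, title) VALUES (?, ?)",
        [(1, "Hypothetical Census 2000"), (2, "Hypothetical Household Survey 2005")],
    )
    # Variables in both datasets refer to the same concept row.
    conn.executemany(
        "INSERT INTO variable (dataset_id, concept_id, var_name) VALUES (?, ?, ?)",
        [(1, 1, "HHID"), (2, 1, "hh_id")],
    )
    return conn


def definitions_for(conn, concept_name):
    """Return (dataset title, variable name, definition) for one concept."""
    return conn.execute(
        """
        SELECT d.title, v.var_name, c.definition
        FROM variable v
        JOIN dataset d ON d.dataset_id = v.dataset_id
        JOIN concept c ON c.concept_id = v.concept_id
        WHERE c.name = ?
        ORDER BY d.dataset_id
        """,
        (concept_name,),
    ).fetchall()


def redefine(conn, concept_name, new_definition):
    """Change a definition once; every linked variable picks it up."""
    conn.execute(
        "UPDATE concept SET definition = ? WHERE name = ?",
        (new_definition, concept_name),
    )


conn = build_metadata_db()
redefine(conn, "household", "Illustrative wording: revised definition")
for title, var_name, definition in definitions_for(conn, "household"):
    print(title, var_name, definition)
```

Because the definition lives in one row, the two datasets cannot drift apart, which is the consistency benefit the slide refers to.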
Good practices in data documentation • Explanatory material • Minimum material required to ensure the long-term viability and functionality of a dataset • Contextual information • Material about the context in which the data was collected, and how it was put to use • Enables the secondary user to fully understand the background and processes behind the data collection exercise. • Cataloguing material • Bibliographic record of the dataset, for proper acknowledgement and citation • Basic instrument used for resource discovery • http://www.esds.ac.uk/news/goodPractice.pdf
Untapped potential of microdata for national development • Even the best planned tabulations cannot exhaustively bring out all valuable information from census data • Diversity, disparities and related causalities are best analysed from microdata, e.g. • Tracking the effects of policy interventions on target groups • Determining dimensions of within-country disparities • The quality of research would improve => Return on data collection would increase => National policies could be targeted better => More efficient use of public resources
Factors that might hinder microdata dissemination - Discussion • Concerns about data confidentiality • Ambiguous or missing national legislation • Narrow mandate of statistical agency • Concerns about data quality • Low demand from data users
International initiatives • Marrakech Action Plan on Statistics, http://www.surveynetwork.org/home/docs/Marrakech_Action_Plan_for_Statistics.pdf • International Household Survey Network, http://www.surveynetwork.org/ • IHSN Microdata Management Toolkit • ESCAP-World Bank-PARIS21 project on improving access to survey microdata in Asia and the Pacific
ESCAP project on improving access to survey microdata in Asia and the Pacific, 2007-2008 • Household surveys and population and housing censuses, not establishment surveys • Assessment of status of microdata dissemination • Regional inventory and data archive of household surveys • Regional advocacy and training workshops • On-site training and technical advice on documentation and anonymization
Microdata Management Toolkit – Summary A set of software tools for the documentation, archiving, dissemination and preservation of microdata 1. Metadata Editor • Document survey data in accordance with international standards 2. CD-ROM Builder • Generates user-friendly outputs, such as CD-ROMs and websites, for dissemination and archiving 3. The Explorer • For viewing metadata • For re-exporting data to various formats
Download and use • The Toolkit can be downloaded from http://www.surveynetwork.org/home/?lvl1=tools&lvl2=documentation&lvl3=toolkit • Except for the Metadata Editor, all Toolkit components are available for free • Nesstar Editor: One free license for NSOs of the World Bank IDA countries (e.g. Afghanistan, Georgia, Kyrgyz Republic, Moldova, Tajikistan)
Metadata Editor • Documents survey data in accordance with international standards • Data Documentation Initiative (DDI) • Dublin Core Metadata Initiative (DCMI) • Data & metadata in one single file • Data can be imported from various formats, incl. statistical packages • Produces survey documentation in PDF format
Extensible Mark-up Language (XML) • Language to describe data using tags • Tags conceptually the same as fields in databases • XML files are regular text files • Can be edited with text editors • XML files, like databases, can be: • Searched and queried • Edited • Tutorial: http://w3schools.com/xml
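The comparison between tags and database fields can be sketched in a few lines of Python using the standard library XML parser. The `<survey>` wrapper element and the helper names `to_record` and `set_field` are assumptions made for this illustration; the inner tag names follow the style of the codebook example in this presentation.

```python
import xml.etree.ElementTree as ET

# An XML document is ordinary text; <survey> is a hypothetical wrapper element.
XML_TEXT = """<survey>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <nation>Popstan</nation>
  <geogCover>National</geogCover>
</survey>"""


def to_record(xml_text):
    """Read each tag as a field name and its text as the field value."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def set_field(xml_text, tag, value):
    """Edit one field (creating it if absent) and return the new XML text."""
    root = ET.fromstring(xml_text)
    element = root.find(tag)
    if element is None:
        element = ET.SubElement(root, tag)
    element.text = value
    return ET.tostring(root, encoding="unicode")


record = to_record(XML_TEXT)
edited = set_field(XML_TEXT, "geogCover", "Urban areas only")

print(record)                          # fields read like a database row
print("Popstan" in XML_TEXT)           # plain-text search also works
print(to_record(edited)["geogCover"])  # value after the edit
```

The same file could equally be opened and changed in any text editor; the parser only adds the convenience of addressing values by tag name.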
XML example
<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<collDate date="2005-01" event="start"/>
<collDate date="2005-03" event="end"/>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified two stages</sampProc>
<respRate>98 percent</respRate>
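To show how a program might read a fragment like this one, here is a short Python sketch using the standard library. The slide shows only a flat list of elements, so the sketch wraps them in a hypothetical `<fragment>` root to make the text parseable; a real DDI file nests these elements inside a larger structure that is not shown here.

```python
import xml.etree.ElementTree as ET

# Elements as shown on the slide (whitespace inside tags trimmed).
FRAGMENT = """
<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<collDate date="2005-01" event="start"/>
<collDate date="2005-03" event="end"/>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified two stages</sampProc>
<respRate>98 percent</respRate>
"""

# <fragment> is an assumed wrapper: XML needs exactly one root element.
root = ET.fromstring(f"<fragment>{FRAGMENT}</fragment>")

title = root.findtext("titl")                # element text
funder = root.find("fundAg")                 # element with an attribute
collection = {                               # attributes of empty elements
    element.get("event"): element.get("date")
    for element in root.findall("collDate")
}

print(title)
print(funder.get("abbr"), "=", funder.text)
print("Data collection:", collection["start"], "to", collection["end"])
```

Text content (`titl`), attributes on a text element (`fundAg abbr`) and attribute-only elements (`collDate`) are read in slightly different ways, which is why the sketch shows one of each.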
XML advantages • Creation of a comprehensive checklist of useful metadata elements • Potential to assess the content of a file by determining whether particular tags are, or are not, within that file • Creation of a dataset catalogue which can be queried for key metadata elements • Potential to transform the file into more user-friendly formats, such as HTML, PDF • XML files can be exchanged across networks or over the Internet using web services or SOAP
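Three of the advantages above (a checklist of elements, a queryable catalogue, and transformation to HTML) can be sketched with the Python standard library. The checklist contents, the `<survey>` wrapper, the second survey entry and all function names are hypothetical and serve only to illustrate the idea; this is not how the Toolkit itself implements these features.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical checklist of metadata elements every survey file should carry.
CHECKLIST = ["titl", "AuthEnty", "nation", "sampProc", "respRate"]

# A tiny catalogue; the second entry is invented and deliberately incomplete.
SURVEYS = {
    "mics2005": (
        "<survey>"
        "<titl>Multiple Indicator Cluster Survey 2005</titl>"
        "<AuthEnty>National Statistics Office (NSO)</AuthEnty>"
        "<nation>Popstan</nation>"
        "<sampProc>5,000 households, stratified two stages</sampProc>"
        "<respRate>98 percent</respRate>"
        "</survey>"
    ),
    "incomplete": (
        "<survey>"
        "<titl>Hypothetical Labour Force Survey</titl>"
        "<nation>Popstan</nation>"
        "</survey>"
    ),
}


def missing_tags(xml_text, checklist=CHECKLIST):
    """List checklist tags that are absent or empty in the file."""
    root = ET.fromstring(xml_text)
    return [tag for tag in checklist if not (root.findtext(tag) or "").strip()]


def catalogue_query(surveys, tag, value):
    """Return the keys of all surveys whose <tag> text equals value."""
    return [
        key
        for key, xml_text in surveys.items()
        if (ET.fromstring(xml_text).findtext(tag) or "").strip() == value
    ]


def to_html(xml_text):
    """Render the metadata elements as a simple two-column HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        "<tr><th>{}</th><td>{}</td></tr>".format(
            escape(child.tag), escape((child.text or "").strip())
        )
        for child in root
    )
    return f"<table>{rows}</table>"


for key, xml_text in SURVEYS.items():
    print(key, "missing:", missing_tags(xml_text))
print(catalogue_query(SURVEYS, "nation", "Popstan"))
print(to_html(SURVEYS["incomplete"]))
```

In practice a stylesheet language such as XSLT is a common choice for the transformation step; plain string building is used here only to keep the sketch dependency-free.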
CD-ROM Builder • Integrates with Metadata Editor • Generates user-friendly outputs (CD-ROM, website) for dissemination and archiving (HTML format) • Allows customization • Branding: look and feel of CD or website • Content: single or multiple surveys
CD-ROM Builder process 1 Create new CD-ROM Project 2 Add a survey to the project and select its type and branding • Select an existing survey by opening the DDI-XML or Nesstar file • The survey branding determines the overall look and feel of the CD • The survey type determines the default metadata content 3 Click the Save button to generate the HTML interface 4 After a few minutes, your CD Project is ready for publishing!
Demonstration of Metadata Editor A live demonstration with Popstan dataset, on-screen in English and Russian