IHSN Microdata Management Toolkit and related standards and good practices

Olivier Dupriez, World Bank, Development Data Group; Manager, International Household Survey Network (IHSN). Addis Ababa, September 23, 2011.


Presentation Transcript


  1. IHSN Microdata Management Toolkit and related standards and good practices. Olivier Dupriez, World Bank, Development Data Group; Manager, International Household Survey Network (IHSN). Addis Ababa, September 23, 2011

  2. Microdata Management Toolkit • Two main components • Metadata Editor: specialized software for documenting any kind of microdata (surveys, censuses, administrative records) • NAtional Data Archive (NADA): an open source application for cataloguing and dissemination • (CD-Builder for dissemination) • Compliant with the DDI/DCMI (XML) standards (Data Documentation Initiative and Dublin Core Metadata Initiative)

  3. What are the DDI and DCMI? • XML metadata standards • Standard checklists of what you need to know about a study and its dataset (DDI), and about the related resources (DCMI) • DDI developed by academic data centers • Now used in most countries in the world, and by various software applications (e.g. DevInfo, CSPro) • Two versions of DDI: • Version 2.n (DDI Codebook), used by the Toolkit • Version 3.n (DDI Lifecycle)

  4. What is the DDI? An example: “The National Statistics Office (NSO) of Popstan conducted the Multiple Indicator Cluster Survey (MICS) with the financial support of UNICEF. 5,000 households, representing the overall population of the country, were randomly selected to participate in the survey, following a two-stage stratified sampling methodology. 4,900 of these households provided information.” In XML this could look like this: <titl>Multiple Indicator Cluster Survey 2005</titl> <altTitl>MICS 2005</altTitl> <AuthEnty>National Statistics Office (NSO)</AuthEnty> <fundAg abbr="UNICEF">United Nations Children Fund</fundAg> <nation>Popstan</nation> <geogCover>National</geogCover> <sampProc>5,000 households, stratified two stages</sampProc> <respRate>98 percent</respRate>
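
As a rough illustration of why tagged metadata is convenient, the sketch below reads the slide's fragment with Python's standard library. The `<study>` wrapper element is an assumption added only so the fragment parses as one document, and the funding agency attribute is written as `abbr="UNICEF"`; a real DDI Codebook file nests these elements more deeply and may declare a namespace.

```python
import xml.etree.ElementTree as ET

# The slide's elements, wrapped in a hypothetical <study> root (not a DDI element)
# so that the fragment is a single well-formed XML document.
DDI_FRAGMENT = """
<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <altTitl>MICS 2005</altTitl>
  <AuthEnty>National Statistics Office (NSO)</AuthEnty>
  <fundAg abbr="UNICEF">United Nations Children Fund</fundAg>
  <nation>Popstan</nation>
  <geogCover>National</geogCover>
  <sampProc>5,000 households, stratified two stages</sampProc>
  <respRate>98 percent</respRate>
</study>
"""


def read_study(xml_text: str) -> dict:
    """Return a flat dict of element name -> text for the fragment above."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    # Attributes carry extra information, here the funder's abbreviation.
    funder = root.find("fundAg")
    if funder is not None:
        fields["fundAg_abbr"] = funder.get("abbr", "")
    return fields


study = read_study(DDI_FRAGMENT)
print(study["titl"], "|", study["fundAg_abbr"], "|", study["respRate"])
```

The response rate in the fragment follows from the narrative: 4,900 responding households out of 5,000 selected is 98 percent.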

  5. Advantage of XML • Can be transformed into many kinds of outputs: • HTML • PDF • Databases • Others • Plain text files → not specific to any operating system or application (“durable” metadata)
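
The "one source, many outputs" idea can be sketched in a few lines of Python: the same small XML record is rendered once as an HTML table and once as plain text. The `<study>` wrapper, the `LABELS` mapping and both function names are assumptions made for this illustration; they are not part of the Toolkit, which performs its own transformations.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample record; <study> is a wrapper added for the example.
SAMPLE = """
<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <nation>Popstan</nation>
  <respRate>98 percent</respRate>
</study>
"""

# Hypothetical human-readable labels for the element names.
LABELS = {"titl": "Title", "nation": "Country", "respRate": "Response rate"}


def fields(xml_text):
    """Yield (label, value) pairs for each child element of the root."""
    for child in ET.fromstring(xml_text):
        yield LABELS.get(child.tag, child.tag), (child.text or "").strip()


def to_html(xml_text):
    """Render the record as a simple HTML table."""
    rows = [
        f"<tr><th>{html.escape(label)}</th><td>{html.escape(value)}</td></tr>"
        for label, value in fields(xml_text)
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


def to_text(xml_text):
    """Render the same record as plain 'label: value' lines."""
    return "\n".join(f"{label}: {value}" for label, value in fields(xml_text))


print(to_html(SAMPLE))
print(to_text(SAMPLE))
```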

  6. Development of the Toolkit • Metadata Editor • By Nesstar Ltd (“Nesstar Publisher”) with IHSN support • Now freeware • Development benefited from many users’ feedback • Available at www.ihsn.org/toolkit • NADA (and CD-Builder) • By the World Bank / IHSN • Available at www.ihsn.org/nada

  7. The Metadata Editor (demo)

  8. Import your data (SPSS, CSPro, Stata, etc.)

  9. Display the list of variables (labels preserved), with summary statistics

  10. Can edit variables and value labels; immediately shows missing labels.

  11. View data.

  12. Add required metadata (survey description)

  13. Add variable-level metadata (question, interviewers’ instructions, definitions, derivation/imputation method, universe, etc.)

  14. Add “external resources” (= documentation and links to all related materials: questionnaires, manuals, reports, programs, etc.)

  15. Run various metadata quality checks.

  16. Produce a “metadata completeness” diagnostic.
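
A completeness diagnostic can be thought of as the share of expected fields that are actually filled in. The sketch below is only an assumption about what such a check might look like; the `REQUIRED_FIELDS` checklist and the scoring rule are hypothetical and are not the Metadata Editor's actual diagnostic.

```python
# Hypothetical checklist of study-level fields (names borrowed from the DDI example).
REQUIRED_FIELDS = ["titl", "AuthEnty", "nation", "geogCover", "sampProc", "respRate"]


def completeness(metadata, required=REQUIRED_FIELDS):
    """Return (percentage of required fields filled, list of missing fields)."""
    missing = [name for name in required if not str(metadata.get(name, "")).strip()]
    score = 100.0 * (len(required) - len(missing)) / len(required)
    return score, missing


# Example record with two fields left undocumented.
record = {
    "titl": "Multiple Indicator Cluster Survey 2005",
    "AuthEnty": "National Statistics Office (NSO)",
    "nation": "Popstan",
    "geogCover": "National",
    "sampProc": "",
}

score, missing = completeness(record)
print(f"{score:.1f}% complete; missing: {', '.join(missing)}")
```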

  17. Generate a PDF version of your metadata.

  18. PDF metadata document

  19. Data and metadata are saved in a single file, in the “Nesstar” format. The Nesstar format is NOT for dissemination! Data can be re-exported to more standard formats: SPSS, ASCII, Stata, etc. ASCII (with data dictionary) is the preferred format for long-term preservation. The DDI provides the data dictionary. The Metadata Editor is a tool for preparing and packaging your data and metadata, not a tool for dissemination!
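
To show why an ASCII file is only usable together with its data dictionary, here is a minimal sketch that reads fixed-width records using column positions. The variable names, positions and sample records are all hypothetical; in practice the positions would come from the documented data dictionary rather than being typed in by hand.

```python
# Hypothetical data dictionary: variable name -> (start column, width).
# Start columns are 1-based, as is common in codebooks.
DICTIONARY = {"hhid": (1, 4), "region": (5, 2), "hhsize": (7, 2)}


def parse_record(line, dictionary):
    """Slice one fixed-width line into named fields using the dictionary."""
    return {
        name: line[start - 1:start - 1 + width].strip()
        for name, (start, width) in dictionary.items()
    }


# Two made-up records: without the dictionary these are just strings of characters.
raw_lines = ["0001 1 5", "000212 3"]
rows = [parse_record(line, DICTIONARY) for line in raw_lines]

for row in rows:
    print(row)
```

Because both the data and the dictionary are plain text, they remain readable without any particular statistical package, which is the property that matters for long-term preservation.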

  20. Export metadata (no data) to XML format (i.e. generate the DDI and the DCMI files).

  21. The DDI file is a text file (XML) which looks like this. It contains all metadata, down to the variable level (it provides a detailed data dictionary). This DDI (+DCMI) file is ready to be “transformed”, e.g. by being published in a NADA catalog.
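
The variable-level part of a DDI file can be read back as a data dictionary. The sketch below assumes a small, namespace-free sample that uses the DDI Codebook elements `dataDscr`, `var`, `labl`, `qstn` and `qstnLit`; the variables themselves are made up, and a file exported by the Metadata Editor will contain many more elements and may require namespace handling.

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free sample of the variable description section.
SAMPLE_DDI = """
<codeBook>
  <dataDscr>
    <var ID="V1" name="hhsize">
      <labl>Household size</labl>
      <qstn><qstnLit>How many people live in this household?</qstnLit></qstn>
    </var>
    <var ID="V2" name="region">
      <labl>Region of residence</labl>
    </var>
  </dataDscr>
</codeBook>
"""


def data_dictionary(xml_text):
    """Return one dict per variable: name, label and question text (if any)."""
    root = ET.fromstring(xml_text)
    entries = []
    for var in root.iter("var"):
        entries.append({
            "name": var.get("name", ""),
            "label": (var.findtext("labl") or "").strip(),
            "question": (var.findtext("qstn/qstnLit") or "").strip(),
        })
    return entries


variables = data_dictionary(SAMPLE_DDI)
for entry in variables:
    print(entry["name"], "-", entry["label"])
```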

  22. NAtional Data Archive – NADA (demo)

  23. An on-line (intranet or internet) searchable catalog

  24. Provides detailed metadata, all automatically taken from the DDI and transformed into HTML

  25. Includes a description of data files and variables

  26. Searchable catalog

  27. Can compare variables within and across surveys

  28. Comparison of two variables

  29. Various options to disseminate microdata: no access / direct access / licensed files / enclave / external repository. Can link to other related sites/applications (e.g., REDATAM and/or CensusInfo).

  30. Microdata dissemination policy should be published with NADA.

  31. NADA administration: web-based. Easy to add surveys, etc.

  32. Upload files that you want to disseminate (data, questionnaires, reports, etc.)

  33. Decide on the access policy (specific to each dataset).

  34. Includes tools for monitoring uses and users.

  35. Provides a tool for building a “citation catalog”.

  36. List of citations will be displayed for each survey.

  37. Example: citations for Ghana GLSS 1991-92 survey.

  38. IHSN Toolkit – Benefits for NSO • Replicability, transparency • Visibility • Credibility • Institutional memory • Knowledge generation (if microdata are disseminated) → increases and demonstrates the value of data → more funding • Satisfy a legal requirement in some countries • Participate in Open Data / Data Liberation movement

  39. IHSN and other tools • Reports, tables (PDF): Web development tool • On-line tabulation (and analysis) tool: REDATAM, SuperStar, Nesstar, Tableau, etc. • Indicators: CensusInfo, DevInfo, etc. • Microdata (n% sample): IHSN Metadata Editor and NADA • Metadata: IHSN Metadata Editor and NADA • Microdata, full, raw and edited versions: IHSN Metadata Editor

  40. Guidelines and practices • Guidelines for documenting a dataset using the IHSN Toolkit http://www.ihsn.org/home/index.php?q=tools/documentation

  41. Guidelines and practices • Formulating an access policy and procedures http://www.ihsn.org/home/index.php?q=focus/dissemination-microdata-files-principles-procedures-and-practices

  42. Guidelines and practices • Long term preservation of data and metadata • Based on OAIS “standard” • Complex; useful as a “technical audit manual” http://www.ihsn.org/home/index.php?q=tools/preservation

  43. Guidelines and practices • Country experience: Statistics Canada’s Data Liberation Initiative (forthcoming) • Other IHSN manuals (being drafted): • Producing public use census sample files • Anonymizing microdata

  44. Some recommendations • Countries • Comply with the DDI standard • Produce sample dataset (n%) for public (free) dissemination of microdata • Publish a formal microdata management and dissemination policy • Assess your preservation policy/procedures • Preserve all versions of your census data • International agencies • Develop a central census catalog (UNSD?) • Develop anonymization guidelines • Support the establishment of data archives

  45. Questions? Need support? • Accelerated Data Program (PARIS21/WB) • Training, technical support to data archiving • Contacts: • Olivier Dupriez at the World Bank (odupriez@worldbank.org) • Francois Fonteneau at PARIS21 (francois.fonteneau@oecd.org)
