
Metadata Management and Tools



  1. Metadata Management and Tools • August 1, 2013 • Data Curation Course

  2. Outline • General information about metadata • Metadata and the data life cycle • DDI – a specification for documenting social, behavioral and economic data • Exercises

  3. Defining Metadata • Metadata are commonly described as “data about data” • Metadata serve as a “bridge” between data producer and data user • Metadata bring data to life, helping users to interpret and understand data

  4. Simple Example Best (Rich, Structured) Better… Bad

  5. Importance of Metadata • John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.* • http://www.youtube.com/watch?v=xlQMVV7VJtA * Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/

  6. Concerns About Creating Metadata DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013

  7. Metadata Types • Types of metadata, by content: * • Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data • Technical: Physical and digital features of a data resource • Structural: Configuration of a resource, connections and relationships among parts, or among related resources *Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe

  8. Metadata and the Data Life Cycle • Metadata-driven life cycle: Metadata are created, but also used and reused at every stage of the data life cycle • Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset

  9. Metadata and the Data Life Cycle Rich metadata = smooth life cycle, high quality data

  10. Structured Metadata • Enhances the value and usability of metadata • A consistent, predictable metadata structure enables • More effective searches • Automated management and processing • Resource sharing • Interoperability • Standardization leads to greater efficiency
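
As a rough illustration of why a consistent, predictable structure supports searching and automated processing, the sketch below counts and filters a few study records by field. The records, field names, and helper functions are all hypothetical, invented for this example; they do not come from any metadata standard.

```python
from collections import Counter

# Hypothetical study records; the field names are illustrative only.
RECORDS = [
    {"title": "Household Survey A", "country": "Kenya", "year": 2010},
    {"title": "Household Survey B", "country": "Kenya", "year": 2012},
    {"title": "Election Study C", "country": "Germany", "year": 2012},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records if field in r)


def filter_by(records, **criteria):
    """Return the records whose fields match every criterion exactly."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]
```

Because every record uses the same field names, the same two functions work for any field; free-text descriptions would need per-record parsing instead.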

  11. Metadata Standards Examples • Dublin Core • Data Documentation Initiative (DDI) • Ecological Metadata Language (EML) • Astronomy Visualization Metadata (AVM) • Darwin Core • FGDC Content Standard for Digital Geospatial Metadata (CSDGM) • ISO 19115/19139 Geographic information

  12. Standards Cartoon courtesy of XKCD.com

  13. What is DDI? • A metadata standard of and for the community • Two major development lines • DDI Codebook • DDI Lifecycle • Metadata for both human and machine consumption • Additional specifications: • Controlled vocabularies • RDF vocabularies for use with Linked Data

  14. DDI Background and History • Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation • First version published in February 2000

  15. Background and History Continued • The DDI Alliance was formed in 2003 to support and develop the DDI standard http://www.ddialliance.org/ • Ever-growing number of DDI users; large multinational projects • CESSDA data portal (20 European data archives) • International Household Survey Network – IHSN (developing countries from Africa, Asia, former Soviet Union, and more recently, Latin America)

  16. DDI Members and Projects Worldwide

  17. DDI Specification • The first versions of DDI (1.0 through 2.1) were document- and codebook-centric • Version 3.0 was published in April 2008 to document the data life cycle

  18. RDF Vocabularies for Semantic Web • DDI-RDF Discovery Vocabulary • For publishing metadata about datasets into the Web of Linked Data • Based on DDI Codebook and DDI Lifecycle • XKOS • RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary • Publication expected in second half of 2013

  19. DDI of the Future • Robust and persistent data model (for the metadata), with extension possibilities, variety of technical expressions • Complete data life cycle coverage • Broadened focus for new research domains • Simpler specification that is easier to understand and use including better documentation

  20. Benefits of DDI Approach • Rich content (currently over 800 items) • Metadata reuse across the life cycle • Machine-actionability • Data management and curation • Support for longitudinal data and comparison

  21. Metadata Reuse

  22. DDI Alignment with Other Metadata Standards • MARC: DDI-C, DDI-L • Dublin Core: DDI-C, DDI-L • SDMX (Statistical Data and Metadata eXchange): DDI-L • ISO 11179 (Metadata Registries): DDI-L • FGDC (Digital Geospatial Metadata): DDI-L • ISO 19115 (Geographic Information Metadata): DDI-L • PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission Standard): under consideration

  23. DDI-L or DDI-C? • DDI-L • Complex data (hierarchical, longitudinal, comparative) • Metadata-driven survey design (building questionnaires) • Multiple languages • Detailed geographic information • Metadata reuse across the data life cycle • Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.

  24. DDI-L or DDI-C? • DDI-C • Documentation of simple, survey-type data • Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level) • Both DDI-C and DDI-L may be used within the same organization • ICPSR uses DDI-C but has translation to DDI-L for study-level records

  25. DDI-C Structure and Contents DDI-C main sections: • Document Description: Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed • Study Description: General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information • File Description: Describes physical characteristics of data file(s) – name, format, structure, dimensions • Data Description: Detailed description of each variable, including variable groups if applicable. Special subsection for documenting census-type aggregate data • Other (Study Related) Materials: References or contains materials used in the production of the study or useful in the analysis of the data • For complete content and Tag Library see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
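
The five main sections can be sketched as an XML skeleton. The example below is a minimal sketch using Python's standard library; it assumes the DDI-C 2.x element names `codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, and `otherMat`, and a title nested under `citation`/`titlStmt`/`titl`. The function name and sample title are hypothetical, and the output is not validated against the DTD or populated with required content, so treat it as an outline rather than a valid DDI instance.

```python
import xml.etree.ElementTree as ET


def build_codebook_skeleton(title):
    """Build a bare DDI-C style tree with the five main sections.

    Element names are assumed from the DDI-C 2.x tag library; nothing
    here is checked against the DTD.
    """
    root = ET.Element("codeBook")
    ET.SubElement(root, "docDscr")            # Document Description
    stdy = ET.SubElement(root, "stdyDscr")    # Study Description
    ET.SubElement(root, "fileDscr")           # File Description
    ET.SubElement(root, "dataDscr")           # Data Description
    ET.SubElement(root, "otherMat")           # Other (Study Related) Materials

    # Study title, nested under the citation's title statement.
    citation = ET.SubElement(stdy, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title
    return root


# Hypothetical study title, for illustration only.
skeleton = build_codebook_skeleton("Example Study")
xml_text = ET.tostring(skeleton, encoding="unicode")
```

Printing `xml_text` shows the nesting at a glance; a real instance would fill each section with the content described above.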

  26. Study-level DDI Elements at ICPSR • Study ID (Number, DOI) • Title, Alternate Title • Author/Primary Investigator • Bibliographic Citation • Funding Information • Abstract • Keywords/Topic Classification • Series Information • Geographic Coverage • Time Period Covered • Time Method • Date(s) of Collection • Mode of Collection • Universe • Sampling • Unit of Analysis • Response Rates • Weighting Information • Data Type • Extent of Processing • Access Conditions/Restrictions • Version History

  27. Study-level DDI at ICPSR • Leveraged in several ways • Data discovery -- Forms basis of Solr/Lucene faceted search • Repurposing -- Record is reused across ICPSR’s topical archive sites • Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives • Study Overview -- Becomes PDF overview bundled with each download • Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103

  28. DDI at ICPSR: Study-level Metadata Editor

  29. DDI at ICPSR: Study-level Metadata Editor

  30. Variable-level DDI elements at ICPSR • Variable name and ID • Variable label • Question text • Descriptive variable text • Category labels and values (responses) • Category statistics (frequencies) • Summary statistics • Variable format • Notes
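
To make the variable-level elements concrete, the following sketch assembles a single variable description with a label, question text, and categories carrying frequencies. It assumes the DDI-C 2.x element names `var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`, and `catStat`; the function name, the variable, and all values are hypothetical and do not describe any real ICPSR study.

```python
import xml.etree.ElementTree as ET


def build_variable(var_id, name, label, question, categories):
    """Build a DDI-C style <var> element.

    categories is a list of (value, label, frequency) tuples.
    Element names are assumed from the DDI-C 2.x tag library.
    """
    var = ET.Element("var", {"ID": var_id, "name": name})
    ET.SubElement(var, "labl").text = label            # variable label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question     # literal question text
    for value, cat_label, freq in categories:
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(value)
        ET.SubElement(catgry, "labl").text = cat_label
        # Category statistic: an unweighted frequency count.
        ET.SubElement(catgry, "catStat", {"type": "freq"}).text = str(freq)
    return var


# Hypothetical variable, for illustration only.
example_var = build_variable(
    "V1", "Q1", "Voted in last election",
    "Did you vote in the last election?",
    [(1, "Yes", 120), (2, "No", 80)],
)
```

Keeping values, labels, and frequencies in separate elements is what lets one record feed variable search, online analysis, and a generated codebook.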

  31. Variable-level DDI at ICPSR • Variable-level DDI leveraged in several ways • Search -- Permits search of variables within a dataset/series • Search across ICPSR -- Serves as foundation for Social Science Variables Database • Integration with online analysis • Codebook with frequencies -- Enables generation of PDF documentation • Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25

  32. Tools for generating DDI metadata • Nesstar Publisher • DDI-C, study, file, and variable level • Colectica • DDI-L configuration, study and variable level • Both DDI-C and DDI-L compatible (import and export) • Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats) • Colectica for Excel

  33. Tools continued • XCONVERT (SDA Berkeley) • DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies • Stat/Transfer (v. 11) • DDI-L, variable level: no frequencies • MQDS tool • Exports Blaise to DDI-L to create study documentation

  34. Tools continued • More DDI tools can be found here: http://www.ddialliance.org/resources/tools

  35. Questions?
