Metadata Management and Tools • August 1, 2013 • Data Curation Course
Outline • General information about metadata • Metadata and the data life cycle • DDI – a specification for documenting social, behavioral and economic data • Exercises
Defining Metadata • Metadata are commonly described as “data about data” • Metadata serve as a “bridge” between data producer and data user • Metadata bring data to life, helping users interpret and understand the data
Simple Example Best (Rich, Structured) Better… Bad
Importance of Metadata • John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.* • http://www.youtube.com/watch?v=xlQMVV7VJtA * Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/
Concerns About Creating Metadata DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013
Metadata Types • Types of metadata, by content: * • Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data • Technical: Physical and digital features of a data resource • Structural: Configuration of a resource, connections and relationships among parts, or among related resources *Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe
Metadata and the Data Life Cycle • Metadata–driven life cycle: Metadata are created, but also used and reused at every stage of the data life cycle • Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset
Metadata and the Data Life Cycle Rich metadata = smooth life cycle, high quality data
Structured Metadata • Enhances the value and usability of metadata • A consistent, predictable metadata structure enables • More effective searches • Automated management and processing • Resource sharing • Interoperability • Standardization leads to greater efficiency
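A minimal sketch of why a consistent, predictable structure enables more effective searches and automated processing. The records and field names (`title`, `geography`, `year`) are hypothetical and are not drawn from any particular standard.

```python
# Hypothetical study records; field names are illustrative only.
unstructured = "Survey of adults in Michigan, collected 2012, about health"

structured = [
    {"title": "Health Survey A", "geography": "Michigan", "year": 2012},
    {"title": "Health Survey B", "geography": "Ohio", "year": 2010},
    {"title": "Labor Survey C", "geography": "Michigan", "year": 2010},
]


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# Because every record uses the same fields, one generic function can
# answer any fielded query; the free-text string above would need
# ad hoc parsing for each new question.
michigan_titles = [r["title"] for r in find(structured, geography="Michigan")]
print(michigan_titles)
```

The same predictability is what lets software validate, merge, or exchange records without human intervention.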
Metadata Standards Examples • Dublin Core • Data Documentation Initiative (DDI) • Ecological Metadata Language (EML) • Astronomy Visualization Metadata (AVM) • Darwin Core • FGDC Content Standard for Digital Geospatial Metadata (CSDGM) • ISO 19115/19139 Geographic information
Standards Cartoon courtesy of XKCD.com
What is DDI? • A metadata standard of and for the community • Two major development lines • DDI Codebook • DDI Lifecycle • Metadata for both human and machine consumption • Additional specifications: • Controlled vocabularies • RDF vocabularies for use with Linked Data
DDI Background and History • Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation • First version published in February 2000
Background and History Continued • The DDI Alliance was formed in 2003 to support and develop the DDI standard http://www.ddialliance.org/ • Ever-growing number of DDI users; large multinational projects • CESSDA data portal (20 European data archives) • International Household Survey Network – IHSN (developing countries from Africa, Asia, former Soviet Union, and more recently, Latin America)
DDI Specification • The first versions of DDI (1.0 through 2.1) were document- and codebook-centric • Version 3.0 was published in April 2008 to document the data life cycle
RDF Vocabularies for Semantic Web • DDI-RDF Discovery Vocabulary • For publishing metadata about datasets into the Web of Linked Data • Based on DDI Codebook and DDI Lifecycle • XKOS • RDF vocabulary for describing statistical classifications; an extension of the popular SKOS vocabulary • Publication expected in second half of 2013
DDI of the Future • Robust and persistent data model (for the metadata), with extension possibilities, variety of technical expressions • Complete data life cycle coverage • Broadened focus for new research domains • Simpler specification that is easier to understand and use including better documentation
Benefits of DDI Approach • Rich content (currently over 800 items) • Metadata reuse across the life cycle • Machine-actionability • Data management and curation • Support for longitudinal data and comparison
DDI Alignment with Other Metadata Standards • MARC: DDI-C, DDI-L • Dublin Core: DDI-C, DDI-L • SDMX (Statistical Data and Metadata Exchange): DDI-L • ISO 11179 (Metadata Registries): DDI-L • FGDC (Digital Geospatial Metadata): DDI-L • ISO 19115 (Geographic Information Metadata): DDI-L • PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission Standard): under consideration
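To illustrate what alignment with Dublin Core can look like in practice, here is a rough sketch of a crosswalk from a few DDI-C study-level elements to Dublin Core element names. The mapping shown is one plausible choice, not an official crosswalk. The sample XML is un-namespaced placeholder content, whereas real instances may declare a namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder DDI-C fragment; all text values are invented for illustration.
DDI_C = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl>Placeholder Study Title</titl></titlStmt>
      <rspStmt><AuthEnty>Placeholder, Investigator</AuthEnty></rspStmt>
    </citation>
    <stdyInfo>
      <subject><keyword>placeholder keyword</keyword></subject>
      <abstract>Placeholder abstract.</abstract>
    </stdyInfo>
  </stdyDscr>
</codeBook>
"""

# Assumed mapping: Dublin Core element name -> path within the DDI-C instance.
CROSSWALK = {
    "title": "stdyDscr/citation/titlStmt/titl",
    "creator": "stdyDscr/citation/rspStmt/AuthEnty",
    "subject": "stdyDscr/stdyInfo/subject/keyword",
    "description": "stdyDscr/stdyInfo/abstract",
}


def to_dublin_core(code_book):
    """Collect the text of each mapped DDI-C element under its Dublin Core name."""
    record = {}
    for dc_name, path in CROSSWALK.items():
        values = [el.text.strip() for el in code_book.findall(path) if el.text]
        if values:
            record[dc_name] = values
    return record


dc_record = to_dublin_core(ET.fromstring(DDI_C))
print(dc_record)
```

Values are kept as lists because elements such as keywords and authors can repeat.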
DDI-L or DDI-C? • DDI-L • Complex data (hierarchical, longitudinal, comparative) • Metadata-driven survey design (building questionnaires) • Multiple languages • Detailed geographic information • Metadata reuse across the data life cycle • Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.
DDI-L or DDI-C? • DDI-C • Documentation of simple, survey-type data • Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level) • Both DDI-C and DDI-L may be used within the same organization • ICPSR uses DDI-C but has translation to DDI-L for study-level records
DDI-C Structure and Contents • DDI-C main sections: • Document Description: Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed. • Study Description: General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information. • File Description: Describes physical characteristics of data file(s) – name, format, structure, dimensions. • Data Description: Detailed description of each variable, including variable groups if applicable. Includes a special subsection for documenting census-type aggregate data. • Other (Study-Related) Materials: References or contains materials used in the production of the study or useful in the analysis of the data. • For the complete content and Tag Library, see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
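The five main sections correspond to five top-level elements under the DDI-C root element `codeBook`. The sketch below builds only that empty skeleton. It is not a valid DDI instance, since required child elements, attributes, and any namespace declaration are omitted.

```python
import xml.etree.ElementTree as ET

# DDI-C top-level element names paired with the section each one holds.
SECTIONS = [
    ("docDscr", "Document Description"),
    ("stdyDscr", "Study Description"),
    ("fileDscr", "File Description"),
    ("dataDscr", "Data Description"),
    ("otherMat", "Other (Study-Related) Materials"),
]

code_book = ET.Element("codeBook")
for tag, label in SECTIONS:
    section = ET.SubElement(code_book, tag)
    # An XML comment marks where the real content of the section would go.
    section.append(ET.Comment(f" {label} "))

skeleton = ET.tostring(code_book, encoding="unicode")
print(skeleton)
```

The Tag Library linked above lists the child elements that belong inside each section.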
Study-level DDI Elements at ICPSR • Study ID (Number, DOI) • Title, Alternate Title • Author/Primary Investigator • Bibliographic Citation • Funding Information • Abstract • Keywords/Topic Classification • Series Information • Geographic Coverage • Time Period Covered • Time Method • Date(s) of Collection • Mode of Collection • Universe • Sampling • Unit of Analysis • Response Rates • Weighting Information • Data Type • Extent of Processing • Access Conditions/Restrictions • Version History
Study-level DDI at ICPSR • Leveraged in several ways • Data discovery -- Forms basis of Solr/Lucene faceted search • Repurposing -- Record is reused across ICPSR’s topical archive sites • Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives • Study Overview -- Becomes PDF overview bundled with each download • Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103
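Faceted search depends on study-level fields holding consistent values. The sketch below is a plain-Python stand-in for the counting a search engine performs when it builds facets. It does not use Solr or Lucene, and the records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical study-level records; values are invented for illustration.
studies = [
    {"title": "Study A", "geographic_coverage": ["United States"], "data_type": "survey data"},
    {"title": "Study B", "geographic_coverage": ["United States", "Canada"], "data_type": "survey data"},
    {"title": "Study C", "geographic_coverage": ["Canada"], "data_type": "administrative records data"},
]


def facet_counts(records, field):
    """Count how many records carry each distinct value of a field."""
    counts = Counter()
    for record in records:
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        counts.update(set(values))  # count each record once per value
    return counts


geography_facet = facet_counts(studies, "geographic_coverage")
data_type_facet = facet_counts(studies, "data_type")
print(geography_facet.most_common())
print(data_type_facet.most_common())
```

Each count becomes a clickable filter in a search interface, which only works when the same concept is always recorded the same way.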
Variable-level DDI elements at ICPSR • Variable name and ID • Variable label • Question text • Descriptive variable text • Category labels and values (responses) • Category statistics (frequencies) • Summary statistics • Variable format • Notes
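A sketch of how these variable-level items can appear in DDI-C markup and be read back by software. Element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`, `catStat`, `varFormat`) follow DDI Codebook 2.x. The variable itself, its labels, and its frequencies are invented placeholders, and the fragment is un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET

# Placeholder variable description; every value here is illustrative.
VAR_XML = """
<var ID="V1" name="EXAMPLE1">
  <labl>Placeholder variable label</labl>
  <qstn><qstnLit>Placeholder question text?</qstnLit></qstn>
  <catgry>
    <catValu>1</catValu>
    <labl>Yes</labl>
    <catStat type="freq">60</catStat>
  </catgry>
  <catgry>
    <catValu>2</catValu>
    <labl>No</labl>
    <catStat type="freq">40</catStat>
  </catgry>
  <varFormat type="numeric"/>
</var>
"""


def frequency_table(var):
    """Return (value, label, frequency) for each category of a variable."""
    rows = []
    for category in var.findall("catgry"):
        value = category.findtext("catValu")
        label = category.findtext("labl")
        freq = int(category.findtext("catStat[@type='freq']"))
        rows.append((value, label, freq))
    return rows


var = ET.fromstring(VAR_XML)
table = frequency_table(var)
print(var.get("name"), var.findtext("labl"))
print(var.findtext("qstn/qstnLit"))
print(table)
```

Because the category values, labels, and frequencies are separately tagged, the same record can feed a variable search, an online analysis tool, or a generated codebook.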
Variable-level DDI at ICPSR • Variable-level DDI leveraged in several ways • Search -- Permits search of variables within a dataset/series • Search across ICPSR -- Serves as foundation for Social Science Variables Database • Integration with online analysis • Codebook with frequencies -- Enables generation of PDF documentation • Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25
Tools for generating DDI metadata • Nesstar Publisher • DDI-C, study, file, and variable level • Colectica • DDI-L configuration, study and variable level • Both DDI-C and DDI-L compatible (import and export) • Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats) • Colectica for Excel
Tools continued • XCONVERT (SDA Berkeley) • DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies • Stat/Transfer (v. 11) • DDI-L, variable level: no frequencies • MQDS tool • Exports Blaise to DDI-L to create study documentation
Tools continued • More DDI tools can be found here: http://www.ddialliance.org/resources/tools