
Mary Vardigan, Pascal Heus, Wendy Thomas

3rd International Digital Curation Conference, Washington, DC, Dec 2007. Paper Presentations: Interoperability, Metadata & Standards. Data Documentation Initiative: Toward a Standard for the Social Sciences.


Presentation Transcript


  1. 3rd International Digital Curation Conference, Washington, DC, Dec 2007. Paper Presentations: Interoperability, Metadata & Standards. Data Documentation Initiative: Toward a Standard for the Social Sciences. Mary Vardigan (ICPSR/University of Michigan, vardigan@umich.edu), Pascal Heus (Open Data Foundation, pheus@opendatafoundation.org), Wendy Thomas (Minnesota Population Center, wlt@pop.umn.edu)

  2. What is Metadata?
     • Common definition: Data about Data
     • Illustration: labeled stuff vs. unlabeled stuff. The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
     DDI Alliance – http://www.ddialliance.org

  3. Managing data and metadata is challenging!
     • Actors shown in the diagram: Academic, Producers, Users, Librarians, Sponsors, Business, Media/Press, Policy Makers, General Public, Government
     • We are in charge of the data. We support our users but also need to protect our respondents!
     • We want easy access to high quality and well documented data!
     • We need to collect the information from the producers, preserve it, and provide access to our users!
     • We have an information management problem

  4. Metadata issues
     • Without producer / archive metadata
       • Researchers can’t discover data or perform efficient analysis
     • Without researcher metadata
       • Research process is not documented and cannot be reproduced (Gary King: replication standard!)
       • Other researchers are not aware of what has been done (duplication / lack of visibility)
       • Producers don’t know about data usage and quality issues
     • Without standards
       • Such information can’t be properly managed and exchanged between actors or with the public
     • Without tools
       • We can’t capture, preserve or share knowledge

  5. XML to the rescue!
     • XML stands for eXtensible Markup Language
     • Technology that is driving today’s web service oriented architecture of the Internet and Intranets
     • Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data
     • XML is platform & language independent and can be used by everyone
     • XML is both machine and human readable
     • XML is non-proprietary and public domain, and many open tools exist
     • Domain specific standards are available!
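To make the "capture, structure, query" point concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The record and its element names (`study`, `variable`, `label`, `category`) are hypothetical and chosen only for illustration; they are not taken from the DDI schema or any other standard named in this presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record. Element and attribute names are
# illustrative assumptions, not the DDI schema.
RECORD = """
<study id="S001">
  <title>Household Survey 2007</title>
  <variable name="AGE">
    <label>Age of respondent in years</label>
  </variable>
  <variable name="SEX">
    <label>Sex of respondent</label>
    <category code="1">Male</category>
    <category code="2">Female</category>
  </variable>
</study>
"""


def variable_labels(xml_text):
    """Return {variable name: label} for every <variable> in the record."""
    root = ET.fromstring(xml_text.strip())
    return {v.get("name"): v.findtext("label") for v in root.iter("variable")}


def categories(xml_text, name):
    """Return {code: category text} for one variable, or {} if it has none."""
    root = ET.fromstring(xml_text.strip())
    var = root.find(f"variable[@name='{name}']")
    if var is None:
        return {}
    return {c.get("code"): c.text for c in var.findall("category")}


if __name__ == "__main__":
    print(variable_labels(RECORD))
    print(categories(RECORD, "SEX"))
```

The same text file is readable by a person and queryable by a program, which is the property the slide emphasizes; a real deployment would validate records against the chosen standard's schema rather than rely on ad hoc element names.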

  6. Suggested XML metadata specifications for socio-economic data
     • Statistical Data and Metadata Exchange (SDMX)
       • Macrodata, time series, indicators, registries
       • http://www.sdmx.org
     • Data Documentation Initiative (DDI)
       • Microdata (surveys, studies)
       • http://www.ddialliance.org
     • ISO 11179
       • Semantic modeling, concepts, registries
       • http://metadata-standards.org/11179/
     • ISO 19115
       • Geography
       • http://www.isotc211.org/
     • Dublin Core
       • Resources (documentation, images, multimedia)
       • http://www.dublincore.org

  7. The Data Documentation Initiative (DDI)
     • International XML-based specification for the documentation of social and behavioral data
     • Started in 1995, now driven by the DDI Alliance (30+ members)
     • Became an XML specification in 2000 (v1.0)
     • Current version is 2.1, with a focus on archiving (survey/codebook)
     • New Version 3.0 (2008)
       • Focuses on the entire survey “Life Cycle”
       • Provides comprehensive metadata on the entire survey process and usage
       • Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
       • Includes machine-actionable elements to facilitate processing, discovery and analysis
     • DDI is being adopted by producers/archives but needs to extend to the researchers (who are using the data!)

  8. DDI 3.0 and the Survey Life Cycle
     • A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals
     • DDI 2.x is about archiving; DDI 3.0 covers the entire “life cycle”
     • 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
     • Also supports multilingual content, grouping, geography, and more
     • 3.0 is extensible

  9. Metadata Components
     • Producer metadata
       • Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.
     • Research metadata
       • Recodes, analysis, tables, scripts, papers, logs, data quality, usage
       • Citations, references
       • Activities, discussions, knowledge base
     • Outputs
       • Papers, presentations, tables, reports

  10. When to capture metadata?
     • Metadata must be captured at the time the event occurs! (not after the fact)
     • Documenting after the fact leads to considerable loss of information
     • This is true for both producers and researchers

  11. Solutions?
     • Simple solutions: use good practices
       • File and variable naming conventions (metadata in names!), sound statistical methods
       • Comment source code
       • Document your work
     • Adopt DDI & other standards-based metadata solutions
       • DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability
     • Take advantage of web-based collaborative tools
       • Wikis, blogs, discussion groups, lists
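As one possible reading of "source code level metadata capture" for variable recodes, the sketch below records a description of each recode at the moment it is applied, rather than after the fact. The function `recode`, the log field names, and the sample variable names are all hypothetical; this is an illustration of the practice, not a DDI tool or a DDI serialization.

```python
import json
from datetime import datetime, timezone


def recode(values, mapping, source, target, log):
    """Apply a recode and append a description of it to `log` as it happens.

    Values without an entry in `mapping` are passed through unchanged.
    """
    recoded = [mapping.get(v, v) for v in values]
    log.append({
        "event": "recode",
        "source_variable": source,   # hypothetical field names
        "target_variable": target,
        "mapping": {str(k): v for k, v in mapping.items()},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return recoded


if __name__ == "__main__":
    log = []
    sex_label = recode([1, 2, 2, 9], {1: "male", 2: "female"},
                       "SEX", "SEX_LABEL", log)
    print(sex_label)
    # The log can be saved next to the analysis outputs and later
    # converted to a standard metadata format.
    print(json.dumps(log, indent=2))
```

Because the log entry is written by the same call that transforms the data, the documentation cannot drift from what was actually done, which is the point of capturing metadata when the event occurs.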

  12. Benefits
     • Comprehensive data documentation
       • Through good metadata practices, comprehensive documentation captured by producers, librarians and users is available to ALL researchers
     • Preservation, integration and sharing of knowledge
       • Research process is captured and preserved in standard formats
       • Research knowledge becomes an integral part of the survey and available to all
       • Reduces duplication of effort and facilitates reuse
       • Producer gets feedback from the data users (usage, quality issues), which leads to better and more relevant data
     • Research outputs and dissemination
       • Facilitates production of research outputs
       • Facilitates dissemination and fosters broader visibility of research results

  13. Conclusions
     • Metadata is a crucial component of social and behavioral science
     • The Data Documentation Initiative (DDI) is a globally accepted specification for capturing microdata documentation and knowledge
     • Latest version 3.0 extends into the entire survey Life Cycle
     • Producers and data archives are rapidly adopting metadata standards
     • This adoption process should extend into the research community
     • Best practices in data and metadata management benefit all users and have the potential to change the way we conduct research
     • http://www.ddialliance.org or ddi@ddialliance.org
