3rd International Digital Curation Conference Washington, DC, Dec 2007 Paper Presentations: Interoperability, Metadata & Standards Data Documentation Initiative: Toward a Standard for the Social Sciences. Mary Vardigan, Pascal Heus, Wendy Thomas
3rd International Digital Curation Conference
Washington, DC, Dec 2007
Paper Presentations: Interoperability, Metadata & Standards
Data Documentation Initiative: Toward a Standard for the Social Sciences
Mary Vardigan, Pascal Heus, Wendy Thomas
ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center
vardigan@umich.edu / pheus@opendatafoundation.org / wlt@pop.umn.edu
What is Metadata?
• Common definition: Data about Data
• Figure labels: Labeled stuff / Unlabeled stuff
• The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
DDI Alliance – http://www.ddialliance.org
Managing data and metadata is challenging!
Diagram labels: Academic, Producers, Users, Librarians, Sponsors, Business, Media/Press, Policy Makers, General Public, Government
• We are in charge of the data. We support our users but also need to protect our respondents!
• We have an information management problem
• We want easy access to high quality and well documented data!
• We need to collect the information from the producers, preserve it, and provide access to our users!
Metadata issues
• Without producer / archive metadata
  • Researchers can’t discover data or perform efficient analysis
• Without researcher metadata
  • The research process is not documented and cannot be reproduced (Gary King replication standard!)
  • Other researchers are not aware of what has been done (duplication / lack of visibility)
  • Producers don’t know about data usage and quality issues
• Without standards
  • Such information can’t be properly managed and exchanged between actors or with the public
• Without tools
  • We can’t capture, preserve or share knowledge
XML to the rescue!
• XML stands for eXtensible Markup Language
• Technology that is driving today’s web service oriented architecture of the Internet and Intranets
• Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data
• XML is platform & language independent and can be used by everyone
• XML is both machine and human readable
• XML is non-proprietary, public domain and many open tools exist
• Domain specific standards are available!
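As a rough illustration of the "capture, structure, exchange, query" points above, the following minimal sketch uses Python's standard xml.etree.ElementTree module. The element and attribute names (study, variables, variable, label) and the sample values are hypothetical, chosen only for this example; they are not taken from the DDI schema or any other standard.

```python
import xml.etree.ElementTree as ET

# NOTE: element and attribute names below are hypothetical,
# made up for illustration; they are NOT the DDI schema.

# Capture and structure: build a small metadata record.
study = ET.Element("study", attrib={"id": "example-survey-2007"})
ET.SubElement(study, "title").text = "Example Household Survey"
variables = ET.SubElement(study, "variables")
for name, label in [
    ("age", "Age of respondent in years"),
    ("sex", "Sex of respondent"),
]:
    var = ET.SubElement(variables, "variable", attrib={"name": name})
    ET.SubElement(var, "label").text = label

# Exchange: serialize to plain text that any platform can read.
xml_text = ET.tostring(study, encoding="unicode")

# Query: parse the text back and look up variable labels.
parsed = ET.fromstring(xml_text)
labels = {v.get("name"): v.findtext("label") for v in parsed.iter("variable")}
age_label = parsed.find("./variables/variable[@name='age']/label").text

print(xml_text)
print(labels)
```

The same text could be handed to any other XML-aware tool, which is the platform and language independence the slide refers to.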
Suggested XML metadata specifications for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
  • Macrodata, time series, indicators, registries
  • http://www.sdmx.org
• Data Documentation Initiative (DDI)
  • Microdata (surveys, studies)
  • http://www.ddialliance.org
• ISO 11179
  • Semantic modeling, concepts, registries
  • http://metadata-standards.org/11179/
• ISO 19115
  • Geography
  • http://www.isotc211.org/
• Dublin Core
  • Resources (documentation, images, multimedia)
  • http://www.dublincore.org
The Data Documentation Initiative (DDI)
• International XML-based specification for the documentation of social and behavioral data
• Started in 1995, now driven by the DDI Alliance (30+ members)
• Became an XML specification in 2000 (v1.0)
• Current version is 2.1, with a focus on archiving (survey/codebook)
• New Version 3.0 (2008)
  • Focuses on the entire survey “Life Cycle”
  • Provides comprehensive metadata on the entire survey process and usage
  • Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
  • Includes machine-actionable elements to facilitate processing, discovery and analysis
• DDI is being adopted by producers/archives but needs to extend to the researchers (who are using the data!)
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 spans the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
• Also supports multilingual documentation, grouping, geography, and others
• 3.0 is extensible
Metadata Components
• Producer metadata
  • Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.
• Research metadata
  • Recodes, analysis, tables, scripts, papers, logs, data quality, usage
  • Citations, references
  • Activities, discussions, knowledge base
• Outputs
  • Papers, presentations, tables, reports
When to capture metadata?
• Metadata must be captured at the time the event occurs! (not after the fact)
• Documenting after the fact leads to considerable loss of information
• This is true for producers and researchers
Solutions?
• Simple solutions: use good practices
  • File and variable naming conventions (metadata in names!), sound statistical methods
  • Comment source code
  • Document your work
• Adopt DDI & other standards-based metadata solutions
  • DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability
• Take advantage of web-based collaborative tools
  • Wiki, blogs, discussion groups, lists
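One way to picture "source code level metadata capture" for variable recodes is a helper that records what was done at the moment the recode runs, rather than leaving it to be documented later. The sketch below is only an assumption about how this might look: the function name recode, the in-memory recode_log list, the record fields, and the sample variable names are all hypothetical and are not part of DDI or of any DDI tool.

```python
from datetime import datetime, timezone

# Hypothetical in-memory log; a real tool might write these
# records to a standards-based metadata file instead.
recode_log = []


def recode(values, mapping, source, target, note):
    """Recode values and record the operation when it happens."""
    # Values without an entry in the mapping are left unchanged.
    recoded = [mapping.get(v, v) for v in values]
    recode_log.append({
        "source": source,        # name of the original variable
        "target": target,        # name of the derived variable
        "mapping": dict(mapping),
        "note": note,            # the researcher's reason for the recode
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return recoded


# Usage: collapse a coded age category into text groups.
age_group = recode(
    [1, 2, 2, 3],
    {1: "young", 2: "middle", 3: "old"},
    source="agecat",
    target="age_group",
    note="Collapse coded age categories into labeled groups",
)
print(age_group)
print(recode_log[0]["source"], "->", recode_log[0]["target"])
```

Because the log entry is written inside the same call that performs the recode, the documentation cannot lag behind the analysis.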
Benefits
• Comprehensive data documentation
  • Through good metadata practices, comprehensive documentation captured by producers, librarians and users is available to ALL researchers
• Preservation, integration and sharing of knowledge
  • The research process is captured and preserved in standard formats
  • Research knowledge becomes an integral part of the survey and is available to all
  • Reduces duplication of effort and facilitates reuse
  • The producer gets feedback from the data users (usage, quality issues), which leads to better and more relevant data
• Research outputs and dissemination
  • Facilitates production of research outputs
  • Facilitates dissemination and fosters broader visibility of research results
Conclusions
• Metadata is a crucial component of social and behavioral science
• The Data Documentation Initiative (DDI) is a globally accepted specification for capturing microdata documentation and knowledge
• Latest version 3.0 extends into the entire survey Life Cycle
• Producers and data archives are rapidly adopting metadata standards
• This adoption process should extend into the research community
• Best practices in data and metadata management benefit all users and have the potential to change the way we conduct research
• http://www.ddialliance.org or ddi@ddialliance.org