Capturing Metadata Early In The Research Data Lifecycle Barry T. Radler, PhD University of Wisconsin-Madison Institute on Aging April 6, 2017 – NADDI Conference, Cornell University
Overview • Background • The Ideal • Capture (variable-level) metadata earlier in lifecycle • DDI in Theory • Survey Metadata Capture in Practice • UW Survey Center DDI-Word instrument template • Conclusions
Background • MIDUS • Longitudinal, multi-disciplinary study of health and well-being • Large, complex body of data • Wide secondary usage through ICPSR • DDI facilitates wide use • MIDUS DDI Portal – http://midus.colectica.org
Metadata driven data capture: EDDI 2016 presentations • Archivist & Mapper: Simplifying and Modernising Questionnaire Entry - Will Poynter • Questionnaire Generator - Guillaume Duffes • Rich Metadata from the Start - Oliver Hopt • The DASISH Questionnaire Design and Documentation Tool – Functionalities and Examples from the Tool - Benjamin Beuster, Hilde Orten • Question Banks, Reusability, and DDI 3.2 - Dan Smith • Steps towards a Single Point of Access for Survey Questions across Europe: The Euro Question Bank Project - Wolfgang Zenk-Möltgen, Azadeh Mahmoud Hashemi • Document Questionnaires and Datasets with DDI: A Hands-On Introduction with Colectica - Jeremy Iverson, Dan Smith
Capturing Metadata Earlier in Lifecycle “Every activity in the data life cycle should be documented as it occurs from conceptualization to publication.” – DDI Long-term Infrastructure Manifesto (forthcoming) [Figure: DDI 3 “Lifecycle”]
Leveraging Metadata Earlier in Lifecycle • Capture study and instrument design metadata once, at the time of occurrence or creation • Capturing information about the research workflow as it occurs is easier and more efficient than reconstructing it after the fact • Metadata not captured at the time of occurrence or creation is lost • Potentially employ metadata to drive survey administration
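As a rough illustration of the last bullet, the sketch below defines a question's metadata once and reuses it to display the question, validate a response, and emit a codebook entry. It is only a sketch under stated assumptions: the `QuestionMetadata` class, the helper functions, and the `SMOKE` item are hypothetical and do not come from MIDUS, CASES, or any DDI tool.

```python
from dataclasses import dataclass


@dataclass
class QuestionMetadata:
    """Hypothetical variable-level metadata, entered once at design time."""
    name: str         # variable name used in the data file
    text: str         # literal question wording
    categories: dict  # response code -> category label


def render_prompt(q):
    """Build the text shown to the respondent from the metadata."""
    options = "\n".join(f"  [{code}] {label}" for code, label in q.categories.items())
    return f"{q.text}\n{options}"


def validate_response(q, raw):
    """Return the response code if it is a defined category, else None."""
    try:
        code = int(raw.strip())
    except ValueError:
        return None
    return code if code in q.categories else None


def codebook_entry(q):
    """Reuse the same metadata to document the resulting variable."""
    return {"variable": q.name, "question": q.text, "values": dict(q.categories)}


# Hypothetical example item (not an actual survey question from the source).
EXAMPLE = QuestionMetadata("SMOKE", "Do you currently smoke?", {1: "Yes", 2: "No"})

if __name__ == "__main__":
    print(render_prompt(EXAMPLE))
    print(validate_response(EXAMPLE, "2"))
    print(codebook_entry(EXAMPLE))
```

Because the prompt, the validation rule, and the codebook entry all read from the same record, a change to the question wording or categories is made in one place only.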
The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. • DDI is a free standard that can document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving. • Documenting data with DDI facilitates understanding, interpretation, and use -- by people, software systems, and computer networks.
Advantages of DDI: • Introduces a common communication protocol to research processes • Increases transparency across systems and software • Interoperates with other standards such as DataCite and Dublin Core • A free and open standard, expressed in XML • Advantages of XML: • Platform-independent; not tied to any particular operating system • Widely used data exchange standard • No licenses or usage requirements • Easily transformed into presentation formats such as HTML, PDF or plain text.
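The last XML advantage can be sketched with Python's standard library. The fragment below is a simplified, namespace-free example that borrows element names in the style of DDI Codebook (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`); it is not schema-valid DDI, and the `HEALTH1` variable and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified fragment; real DDI documents carry namespaces
# and many more required elements.
SAMPLE = """<codeBook>
  <dataDscr>
    <var name="HEALTH1">
      <labl>Self-rated health</labl>
      <qstn><qstnLit>In general, would you say your health is...</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Excellent</labl></catgry>
      <catgry><catValu>2</catValu><labl>Poor</labl></catgry>
    </var>
  </dataDscr>
</codeBook>"""


def read_variables(xml_text):
    """Parse the XML once into plain Python records."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("var"):
        variables.append({
            "name": var.get("name"),
            "label": var.findtext("labl", default=""),
            "question": var.findtext("qstn/qstnLit", default=""),
            "categories": [
                (c.findtext("catValu", default=""), c.findtext("labl", default=""))
                for c in var.findall("catgry")
            ],
        })
    return variables


def to_text(variables):
    """Render the records as a plain-text codebook."""
    lines = []
    for v in variables:
        lines.append(f"{v['name']}: {v['label']}")
        lines.append(f"  Q: {v['question']}")
        for value, label in v["categories"]:
            lines.append(f"  {value} = {label}")
    return "\n".join(lines)


def to_html(variables):
    """Render the same records as an HTML fragment."""
    parts = []
    for v in variables:
        items = "".join(
            f"<li>{escape(value)}: {escape(label)}</li>"
            for value, label in v["categories"]
        )
        parts.append(
            f"<h3>{escape(v['name'])}</h3><p>{escape(v['question'])}</p><ul>{items}</ul>"
        )
    return "\n".join(parts)


if __name__ == "__main__":
    records = read_variables(SAMPLE)
    print(to_text(records))
    print(to_html(records))
```

Both outputs come from one parsed source, which is the sense in which a single metadata document can feed several presentations.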
Metadata driven research reports • “The Sponsorship on Quality recommended that quality reporting should be streamlined and rationalised across the ESS, by using the existing metadata systems and by creating a ‘once for all purposes’ reporting strategy.”
Challenges to adopting/using DDI • Complexity • DDI 3.2: 1,100 tags • Documentation and training • Low level of researcher buy-in • More appealing to large organizations, official statistics • Need for tools • Lower entry barriers • Utilitarian tools for reuse, not one-off • Organizational resistance to changes in workflow
UWSC experience • Goal: • Documentation standard that produces one source document that can be reused throughout the lifecycle • Build the authoring tool on software clients already know (Word) • Current CAI: CASES • Computer-Assisted Survey Execution System • DDI2 compliant • Isolated from other lifecycle stages
UWSC experience • Obstacles: • Describe how an instrument: • Behaves (instrument logic and variable metadata) • Looks (layout, display, graphics) • Especially useful for mixed mode surveys • DDI is limited in documenting display issues for production • Can reference external content (URLs)
Metadata and survey mode “One important finding, which was not part of the original remit of this investigation, is awareness of how much harder it is to include in the study documentation a questionnaire that has been developed for collecting data on an electronic device rather than on paper. HDSS, which moved to electronic data collection using specialist software like CSPro, need to be aware that for documentation purposes they need to develop paper versions of the questionnaire for explanatory purposes, or supply the code and its interpretation (e.g., as screen shots) as part of the documentation package.” Chifundo Kanjala, Jim Todd, David Beckles, Tito Castillo, Gareth Knight, Baltazar Mtenga, Mark Urassa, and Basia Zaba. (2016). Open-access for existing LMIC demographic surveillance data using DDI. IASSIST Quarterly, Summer.
UWSC experience • Obstacles: • Whose metadata is important? • Different types/forms of metadata • Producers • Users
Different actors, different metadata needs • Two stakeholders with competing interests: • The data collector (producer/designer) wants to document the project management processes involved from conceptualization to fielding of final instrument. • The client (user/analyst) wants to document the results produced by the final instrument and any fielding occurrences that can affect the interpretation of those results.
Different actors, different metadata needs From SIMS report: “Only a certain level of detail and only some of the quality concepts are of interest to the general users of European statistics who are mainly interested in the statistical outputs. On the other hand, all detailed quality concepts (up to the lowest level of detail) are of interest to the producers of European statistics who are also interested in the statistical production processes. Some of the concepts are of interest to both groups.”
Capturing metadata early - Conclusions • Capturing metadata early in the research data lifecycle • One DDI document → repurposed for multiple uses • Reduce redundancy and information loss • Technical issues • Across different platforms and systems • Instrument behavior and display across modes of administration • Non-technical issues • Distinct and non-overlapping metadata needs • Within organizations and across different stakeholders • Study-level metadata not as problematic as variable-level? • AAPOR Transparency Initiative
DDI-Word template later in data lifecycle • Study-level Metadata • Objectives, population, sampling, methodology, funding or client identifiers, response rates, disposition codes, quality reports, weighting specs. • Fewer items, changes, display issues • Fewer technical and personnel obstacles • AAPOR Transparency Initiative • Designed to promote methodological disclosure • Develop simple and efficient means for routinely disclosing research methods by identifying common disclosure elements
Special Thanks to UWSC Programmers: Eric White, Brendan Day
Thank you! bradler@wisc.edu This presentation is offered under license CC BY-SA 4.0