Putting DDI 3.0 to Work for You! Sanda Ionescu, Documentation Specialist, ICPSR; Mary Vardigan, DDI Alliance Director. IASSIST Conference – Stanford University, May 27, 2008
Today’s Schedule • 9:00 – 9:15 Brief DDI History and Intro • 9:15 – 9:30 Life Cycle – Early Stages • 9:30 – 10:45 Life Cycle Exercise • 10:45 – 11:00 Break • 11:00 – 11:50 Life Cycle – Archive & Beyond • 11:50 – 12:00 Questions and Answers
First Half of Morning • We will be moving through the data life cycle of a real study and will document it as we go. • We will use a tool to produce “markup” for seven life cycle stages. • Sanda will guide us through the exercise and Mary will go step by step onscreen. • End result is DDI documentation deposited into an archive.
Second Half • Once our sample data and documentation are deposited, we review the changes made by the archive. • Then we discuss DDI 3.0 in the archival context and why it makes sense to use it. • Finally, assuming we have convinced you, we discuss how to move to DDI 3.0!
DDI History • Effort began in 1995 when ICPSR convened a small international group at IASSIST in Quebec City. • Standard began as SGML, then converted to Web-friendly XML. • 2000 – DDI Version 1.0 published as a DTD, mainly document- and codebook-centric.
DDI History • 2003 – DDI Version 2.0 published with extended scope including aggregate data coverage and geography. • Versions 1.0 through 2.1 (latest published) are backwards compatible, and based on the same structure.
DDI History • February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification. http://www.ddialliance.org/
DDI History Version 3.0: • 2004-2006: Planning and Development • November 2006: Internal Review • February 2007: Public Review • July 2007: Candidate Draft Release • April 2008: Proof of Concept and Vote • April 28, 2008: Official Publication of DDI 3.0 http://www.ddialliance.org/ddi3/index.html
DDI 3.0 Features • Full implementation of XML Schemas • Emphasis on metadata reuse, through a modular structure and the use of schemes
DDI 3.0 Features: Modular Structure • Allows increased flexibility in using the specification. • Main modules: Instance, Study Unit, Resource Package, Group, Conceptual Components, Data Collection, Logical Product, Physical Data Product, Physical Instance, Archive, Comparative
DDI 3.0 Features: Use of Schemes • Facilitates reuse of information: Concepts, Universes, Geographic Locations, Geographic Structures, Questions, Interviewer Instructions, Variables, Categories, Codes, NCubes, Physical Structures, Record Layouts, Organizations
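The reuse idea behind schemes can be sketched in a few lines: an item such as a concept is defined once in a scheme, and other items point to it by identifier instead of copying it. This is a minimal Python illustration under assumed, simplified names (`Concept`, `ConceptScheme`, `Question`, the IDs, and the sample concept are all hypothetical); it does not reproduce the DDI 3.0 schema or its reference syntax.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A concept defined once, identified by a (hypothetical) ID."""
    id: str
    label: str


@dataclass
class ConceptScheme:
    """A maintained list of concepts that other items refer to by ID."""
    id: str
    concepts: dict = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.id] = concept

    def resolve(self, concept_id: str) -> Concept:
        # Look up the single shared definition; raises KeyError if undefined.
        return self.concepts[concept_id]


@dataclass(frozen=True)
class Question:
    """A question stores only a reference (the concept ID), not a copy."""
    id: str
    text: str
    concept_ref: str


scheme = ConceptScheme("CS1")
scheme.add(Concept("C1", "Intergenerational mobility"))  # hypothetical concept

# Two questions (possibly in different studies) reuse the same concept.
q1 = Question("Q1", "What was your father's occupation?", "C1")
q2 = Question("Q2", "What is your occupation?", "C1")

for q in (q1, q2):
    print(q.id, "->", scheme.resolve(q.concept_ref).label)
```

Because both questions hold only the identifier, correcting the concept in the scheme changes what every referring item resolves to, with no copies to keep in sync.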
DDI 3.0 Features • Machine-actionable • Grouping and comparison features • Registries now possible • Versioning clarified • Multi-lingual support
DDI 3.0 Features • Compatibility with other metadata standards: MARC and DC (Dublin Core), but also SDMX (Statistical Data and Metadata Exchange), ISO 11179 (Metadata Registries), FGDC (Digital Geospatial Metadata), and ISO 19115 (Geographic Information Metadata); PREMIS and METS forthcoming… • Life cycle orientation
Life Cycle Orientation DDI 3.0 documents all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use, and new research effort.
DDI 3.0 Use Cases • Documenting an on-going, original research project. • Documenting secondary use of data. • Creating concept/question/variable banks. • Generating multiple delivery formats for data dissemination/discovery. • Metadata mining for comparison, etc.
DDI 3.0 to Document an On-going Research Project • DDI 3.0 can be used to document a research project in “real time”, from its inception (study proposal, design) through data collection, processing, and initial data production.
[Figure: a research project documented in DDI 3.0. Research staff: principal investigator, collaborators. Stages: submitted proposal, funding, revisions, data collection, data processing, data archive/repository, publication. DDI 3.0 content added along the way: purpose, concepts, universe, geography, people/orgs; questions, instrument; variables, physical stores.]
DDI 3.0 to Document an On-going Research Project Advantages: • Richer, contextual information made available and preserved. • Increased accuracy, as life cycle stages are documented “at the source”. • No loss of information as study progresses through its life cycle. • Changes in documentation preserved through versioning. • Ultimately gives data analysts more information to understand and assess data quality.
DDI 3.0 to Document an On-going Research Project Use case exercise: • Academic environment. • Faculty member/researcher initiates an original, independent research project. • Small-scale effort. • No use of computer-assisted interviewing software. • Resulting data and documentation to be deposited to a data center/archive. • Archive provides incentives and support for documenting all activities in DDI as they happen.
DDI 3.0 to Document an On-going Research Project Incentives for entering documentation “at the source”: • Information is easy to enter: a data entry tool “hides” the complexities of the XML code. • The underlying DDI structure provides prompts and pre-organizes information. • DDI may also serve as a management/diagnostic tool to assist in data processing and cleaning operations, or in revising the documentation. • Real-time entries and standardized content ensure high-quality documentation that facilitates primary data analysis and report preparation.
DDI 3.0 to Document an On-going Research Project Use case exercise: • Based on a real study in the ICPSR archive (ICPSR study No. 9413, “Survey of Three Generations of Mexican Americans, 1981-1982”). • Study documentation is laid out sequentially according to the life cycle. http://www.icpsr.umich.edu/DDI/ddi3/workshop • The data entry tool provides a user-friendly interface and is intended to produce DDI 3.0 output; it follows the life cycle but may also be used retrospectively.
Life Cycle Stages – Study Proposal • Who? Principal Investigator; co-authors • When? November 1st, 1979 • Content: research question(s), hypotheses, population, geographic area, provisional title
Life Cycle Stages – Study Proposal: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Study Proposal: DDI 3.0 Output • Principal Investigator (who?) → Archive: Individual; Life Cycle Event: Responsibility • When? → Life Cycle Event: Date • Co-authors (who?) → Study Unit: Creator(s) • Provisional Title → Study Unit: Title • Research Question(s), Hypotheses → Study Unit: Purpose • Population → Study Unit: Universe Ref.; Conceptual Component: Universe • Geographic Area → Study Unit: Spatial Coverage; Conceptual Component: Geographic Structure http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_stdyprop.pdf
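The mapping on this slide, from form fields to DDI elements, is the kind of work the data entry tool hides from the user. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming a plain dictionary of form answers (`proposal`, with hypothetical keys and placeholder values) and simplified, namespace-free element names loosely modeled on the Study Unit output above; the `Citation` and `Coverage` wrappers are assumptions, and the result is illustrative, not schema-valid DDI 3.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical form answers collected by a data entry tool (placeholders only).
proposal = {
    "creators": ["Co-author A", "Co-author B"],
    "title": "Provisional title",
    "purpose": "Research questions and hypotheses",
    "universe_ref": "U1",  # ID of a Universe defined in a Conceptual Component
    "spatial_coverage": "Geographic area",
}


def study_unit_from_proposal(p: dict) -> ET.Element:
    """Build a simplified, namespace-free Study Unit element (not schema-valid DDI)."""
    su = ET.Element("StudyUnit")
    citation = ET.SubElement(su, "Citation")
    ET.SubElement(citation, "Title").text = p["title"]
    for name in p["creators"]:
        ET.SubElement(citation, "Creator").text = name
    ET.SubElement(su, "Purpose").text = p["purpose"]
    # Point to the universe by ID rather than repeating its description.
    ET.SubElement(su, "UniverseReference").text = p["universe_ref"]
    coverage = ET.SubElement(su, "Coverage")
    ET.SubElement(coverage, "SpatialCoverage").text = p["spatial_coverage"]
    return su


su = study_unit_from_proposal(proposal)
print(ET.tostring(su, encoding="unicode"))
```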
Life Cycle Stages – Study Funding • Who? Funding Agency • When? June 1st, 1980 • Proposal • Grant 5-R01-AG-01573
Life Cycle Stages – Study Funding: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Study Funding: DDI 3.0 Output • Funding Agency (who?) → Archive: Organization; Study Unit: Funding Agency • Grant → Study Unit: Grant Number • Proposal, when? → Life Cycle Event: Responsibility, Date http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_stdyfunding.pdf
Life Cycle Stages – Defining Concepts • Who? • When? July 1st, 1980 • Question/Concept Bank + Research Questions = Study Concepts
Life Cycle Stages – Defining Concepts: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Defining Concepts: DDI 3.0 Output • Life Cycle Event: Responsibility, Date… • Question/Concept Bank + Research Questions = Study Concepts → DDI Concept Scheme (with references to the bank) http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_concepts.pdf
Life Cycle Stages – Questionnaire Design • Who? • When? July 25, 1980 • Question/Concept Bank + Study Concepts = Questions, Responses
Life Cycle Stages – Questionnaire Design: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Questionnaire Design: DDI 3.0 Output • Life Cycle Event: Responsibility, Date… • Question/Concept Bank + Study Concepts = Questions, Responses • Questions → DDI Question Scheme (with references to the bank) • Responses → Logical Product: Category Scheme(s), Code Schemes http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iasssist_questions.pdf
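The split between a question, its category scheme (the response labels), and its code scheme (the values recorded for each category) can be sketched with plain Python mappings. All IDs, labels, codes, and the question text below are hypothetical, and the structure only approximates how DDI 3.0 links a question's responses to these Logical Product schemes.

```python
# Hypothetical category scheme: category ID -> label.
category_scheme = {
    "CAT1": "Yes",
    "CAT2": "No",
}

# Hypothetical code scheme: recorded code value -> category ID.
code_scheme = {
    "1": "CAT1",
    "2": "CAT2",
}

# Hypothetical question scheme: each question's responses point to a code scheme.
question_scheme = {
    "Q1": {
        "text": "Were you born in the United States?",
        "response_codes": code_scheme,
    },
}


def label_for(question_id: str, code: str) -> str:
    """Resolve a recorded code to its category label through the two schemes."""
    codes = question_scheme[question_id]["response_codes"]
    return category_scheme[codes[code]]


print(label_for("Q1", "1"))
print(label_for("Q1", "2"))
```

Keeping labels in the category scheme means a second question with the same yes/no response set can point to the same codes instead of repeating them.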
Life Cycle Stages – Questionnaire Translation • Who? • When? September 1st, 1980 • Original-language questions and responses → translated questions and responses
Life Cycle Stages – Questionnaire Translation: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Questionnaire Translation: DDI 3.0 Output • Life Cycle Event: Responsibility, Date… • Original-language and translated questions → DDI Question Scheme (bilingual version) • Original-language and translated responses → Logical Product: Category Scheme(s) (bilingual version) http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_transl_qstns.pdf
Life Cycle Stages – Data Collection • Sample: Who? When? October 15, 1980 – April 1st, 1981 • Data collection: Who? When? 1981-1982 • Report
Life Cycle Stages – Data Collection: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Data Collection: DDI 3.0 Output • Life Cycle Events: Responsibility, Dates… • Data Collection: Responsibility, Date, Sampling, Mode of Collection, Note http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_datacoll.pdf
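Each stage in this exercise records a Life Cycle Event with a responsibility and a date. A small Python sketch, assuming a hypothetical `LifeCycleEvent` record, using the dates given for the workshop exercise and placeholder responsibilities, shows how such events can be ordered into a project timeline regardless of the order in which they were entered.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LifeCycleEvent:
    """Hypothetical, simplified stand-in for a DDI Life Cycle Event."""
    description: str
    responsibility: str
    start: date


# Entered out of order on purpose; responsibilities are placeholders.
events = [
    LifeCycleEvent("Questionnaire translation", "Translator (placeholder)", date(1980, 9, 1)),
    LifeCycleEvent("Study proposal", "Principal Investigator (placeholder)", date(1979, 11, 1)),
    LifeCycleEvent("Sampling", "Field organization (placeholder)", date(1980, 10, 15)),
    LifeCycleEvent("Study funding", "Funding agency (placeholder)", date(1980, 6, 1)),
    LifeCycleEvent("Defining concepts", "Research staff (placeholder)", date(1980, 7, 1)),
    LifeCycleEvent("Questionnaire design", "Research staff (placeholder)", date(1980, 7, 25)),
]

# Order events chronologically by their start date.
timeline = sorted(events, key=lambda e: e.start)

for e in timeline:
    print(e.start.isoformat(), e.description, "-", e.responsibility)
```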
Life Cycle Stages – Data Production • Who? • When? 1983 • Q&A • Data
Life Cycle Stages – Data Production: Input http://www.icpsr.umich.edu/DDI/ddi3/workshop/DDI3_TransformerTOOL/DDIv2dot2.html#
Life Cycle Stages – Data Production: DDI 3.0 Output • Life Cycle Event: Responsibility, Date… • Data Collection: (Processing Operations) • Logical Product: Variable Scheme; Additional Code/Category Schemes [Missing Data] • Physical Data Product: Record Structure*; Variables’ Locations • Physical Instance: (Processing Checks); Number of Cases; Number of Records • Sources: Q&A, Data http://www.icpsr.umich.edu/DDI/ddi3/workshop/files/iassist_dataprod.pdf
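The Physical Data Product records where each variable sits in a record, and the Physical Instance records counts such as the number of cases and records. A minimal sketch, assuming a hypothetical fixed-width layout with one record per case (the variable names, 1-based start columns, widths, and data lines are invented for illustration), reads values by location and derives both counts.

```python
# Hypothetical variable locations: name -> (1-based start column, width).
layout = {
    "CASEID": (1, 4),
    "GEN": (5, 1),  # generation
    "AGE": (6, 2),
}

# Hypothetical fixed-width data, one record per case.
raw = """\
0001135
0002247
0003368
"""


def read_records(text: str, locations: dict) -> list:
    """Slice each non-empty line into variables using the documented locations."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = {}
        for name, (start, width) in locations.items():
            row[name] = line[start - 1 : start - 1 + width]
        records.append(row)
    return records


records = read_records(raw, layout)
number_of_records = len(records)
number_of_cases = len({r["CASEID"] for r in records})
print(number_of_cases, number_of_records)
```

With several records per case the two counts would differ, which is one reason to keep them as separate items.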
Life Cycle Stages – Data Cleaning and Processing: DDI as a Diagnostic/Management Tool • The presence of standardized documentation facilitates data processing. • DDI documentation can be used as a project “dashboard” to identify problems and keep track of operations. • Queries can address: data errors (missing values; out-of-range values resulting from incorrect computation or recode logic; inconsistent or undocumented codes); missing documentation (question text, description); editing errors (missing labels, misspelled variable names).
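Queries of this kind can be run directly against the documentation. A minimal Python sketch, assuming the variable documentation has already been extracted into hypothetical dictionaries (`variables` and `data`, with invented names, labels, valid ranges, and documented codes) rather than read from real DDI 3.0 XML, flags missing labels, missing question text, out-of-range values, and undocumented codes.

```python
# Hypothetical extracted documentation for two variables.
variables = {
    "AGE": {"label": "Age of respondent", "question": "How old are you?",
            "valid_range": (18, 98), "codes": {99: "Missing"}},
    "GEN": {"label": "", "question": "",
            "valid_range": None, "codes": {1: "First", 2: "Second", 3: "Third"}},
}

# Hypothetical data values per variable.
data = {
    "AGE": [35, 47, 120, 99],
    "GEN": [1, 2, 4],
}


def diagnose(variables: dict, data: dict) -> list:
    """Return human-readable problems found in documentation and data."""
    problems = []
    for name, doc in variables.items():
        if not doc["label"]:
            problems.append(f"{name}: missing label")
        if not doc["question"]:
            problems.append(f"{name}: missing question text")
        for value in data.get(name, []):
            if value in doc["codes"]:
                continue  # documented code, including missing-data codes
            rng = doc["valid_range"]
            if rng is not None:
                low, high = rng
                if not low <= value <= high:
                    problems.append(f"{name}: out-of-range value {value}")
            else:
                # Categorical variable: any value not in the code list is undocumented.
                problems.append(f"{name}: undocumented code {value}")
    return problems


for problem in diagnose(variables, data):
    print(problem)
```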
Life Cycle Stages – Deposit to Archive • At the time of deposit, both the research process and the data are already documented in DDI. • Advantages: • The presence of standardized information facilitates archival processing, allowing procedures to be streamlined and automated. • Richer, more accurate information is made available for preservation, archival processing, and dissemination, which enhances data discovery and secondary analysis.
Life Cycle Stages – Deposit to Archive • Richer, more accurate information. Examples: • Original/working title preserved (it may appear in early reports published before any title change). • Author’s affiliation and position at the time of the research. • Responsible agencies and dates made available for all life cycle events. • Parallel/associated research efforts and publications accurately documented.
Life Cycle Stages – Deposit to Archive • Richer, more accurate information. Examples: • The presence of concepts adds important value for data discovery, appraisal, and further analysis. • The documented source of concepts and questions (original or re-used) is relevant for secondary and, particularly, comparative analysis. • For bilingual or multilingual studies, multiple language versions of descriptive elements are available side by side, facilitating comparison, analysis, and retrieval filtered by language. http://www.icpsr.umich.edu/cocoon/DDI3/workshop/9413_CR3_2_DataProd.xml?display=vars&highlight-token=no
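When descriptive elements carry a language tag, retrieving one language is a simple filter. This sketch assumes the standard `xml:lang` attribute marks the language; the element names and the bilingual question text are hypothetical and namespace-free, chosen only to show the filtering step with Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical bilingual question text (simplified, not schema-valid DDI 3.0).
doc = ET.fromstring("""
<QuestionScheme>
  <QuestionItem id="Q1">
    <QuestionText xml:lang="en">Were you born in the United States?</QuestionText>
    <QuestionText xml:lang="es">¿Nació usted en los Estados Unidos?</QuestionText>
  </QuestionItem>
</QuestionScheme>
""".strip())


def texts_in(root: ET.Element, lang: str) -> list:
    """Collect every QuestionText whose xml:lang matches the requested language."""
    return [el.text for el in root.iter("QuestionText") if el.get(XML_LANG) == lang]


print(texts_in(doc, "es"))
```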
Life Cycle Stages – Deposit to Archive • Use of DDI throughout the study life cycle prevents loss of information. • Preserving successive versions allows information to be retrieved as it stood at earlier stages. • To meet specific goals and needs, the archive may create its own version(s) of the documentation, but will also preserve the originally deposited version. • The DDI format enables easy, automated navigation among all existing versions.
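Navigation among preserved versions can be sketched as selection from a list of version records. The sketch assumes each version carries a dotted version number and an owning agency; the `DocVersion` record, the numbers, and the notes are hypothetical. Versions are compared numerically, part by part, so that 1.10.0 sorts after 1.2.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    """Hypothetical record for one preserved version of the documentation."""
    version: str  # dotted version number, e.g. "1.2.0"
    agency: str   # who created this version
    note: str


def version_key(v: DocVersion) -> tuple:
    """Compare versions numerically, part by part, not as plain strings."""
    return tuple(int(part) for part in v.version.split("."))


versions = [
    DocVersion("1.10.0", "archive", "Archive-processed edition"),
    DocVersion("1.0.0", "depositor", "Study proposal stage"),
    DocVersion("1.2.0", "depositor", "Originally deposited version"),
]

ordered = sorted(versions, key=version_key)
earliest, latest = ordered[0], ordered[-1]
# The depositor's last version is the one the archive must keep unchanged.
deposited = max((v for v in versions if v.agency == "depositor"), key=version_key)
print(earliest.note, "|", deposited.note, "|", latest.note)
```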
Life Cycle Stages – Archival Processing: Data and Documentation • The archive becomes the maintaining agency and creates its own instance: • The archive is described as an organization, as the owner/maintainer of the collection, and is specified as the (new) publisher and/or distributor, with the appropriate date(s). • The original archive (the depositor to the present archive) is referenced in the Archive module. • A reference may also be included to the originally deposited DDI, which is preserved and also made accessible.
Life Cycle Stages – Archival Processing: Data and Documentation • The archive edits or adds information and populates new DDI fields to support archival operations: • Edits the title to conform to the archive’s standards (ICPSR adds the study date). • Updates the author’s affiliation according to current position, and adds/updates contact information (telephone, e-mail, current address, etc.). • Adds subject headings and keywords to assist data discovery (study-level searches).
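The title edit described on this slide (ICPSR adds the study date) can be sketched as a small normalization rule. The function name and the rule that a title already ending in a year or year range is left alone are assumptions for illustration, not ICPSR's actual procedure; the title and dates are those of the workshop study.

```python
import re
from typing import Optional

# Assumed rule: a title is already dated if it ends with ", YYYY" or ", YYYY-YYYY".
DATED = re.compile(r",\s*\d{4}(-\d{4})?$")


def add_study_dates(title: str, start: int, end: Optional[int] = None) -> str:
    """Append ', YYYY' or ', YYYY-YYYY' unless the title already ends with a date."""
    title = title.strip()
    if DATED.search(title):
        return title
    dates = str(start) if end is None or end == start else f"{start}-{end}"
    return f"{title}, {dates}"


print(add_study_dates("Survey of Three Generations of Mexican Americans", 1981, 1982))
```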