Explore the benefits of DDI 3.0 as the XML-based standard for social science data documentation. Join us for an insightful workshop session covering the development history, technical mechanisms, and new features. Learn how DDI 3.0 enhances data discovery, supports data analysis, and covers the complete data life cycle.
Introduction to DDI 3.0. Sanda Ionescu, ICPSR. CESSDA Expert Seminar, September 2007
DDI Version 3.0 • Radically different. • More complex… (…but certainly doable!) • Brings important benefits.
Workshop Schedule
• 14:30 – 15:10 Overview (40 min)
• 15:10 – 15:35 Structure and Technical Mechanisms (25 min)
• 15:35 – 15:45 Break (10 min)
• 15:45 – 16:10 Study Unit – Modules Content (25 min)
• 16:10 – 16:30 Variable Markup Example (20 min)
• 16:30 – 16:40 Break (10 min)
• 16:40 – 17:10 Grouping – Modules Content and Examples (30 min)
• 17:10 – 17:30 Getting Started (20 min)
DDI 3.0 Overview
DDI Background: Development History • 1995 – A grant-funded project initiated and organized by ICPSR proposes to create a new standard for documenting social science data, to replace OSIRIS tagged codebooks. • First drafts used SGML and were then converted to Web-friendly XML. • 2000 – DDI Version 1.0 published as a mainly document- and codebook-centric standard.
DDI Background: Development History • 2003 – DDI Version 2.0 published with extended scope: • Aggregate data coverage (based on matrix structure) • Additional geographic representation to assist geographic search systems and GIS users • Versions 1.0 through 2.1 (latest published) are backwards compatible and based on the same structure.
DDI Background: Development History • February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification. http://www.ddialliance.org/
DDI Background: Development History Version 3.0: • 2004–2006: Planning and Development • November 2006: Internal Review • February 2007: Public Review • July 2007: Candidate Draft Release http://www.ddialliance.org/ddi3/index.html
Benefits of using DDI as an XML-based standard • Interoperability: • Enables seamless exchange and reuse by other systems. • Repurposing: • Provides a core document from which different types of outputs can be generated. • Value-added documentation: • Tagging carries “intelligence” in the document by describing content. • Enhanced Data Discovery: • Increases precision and granularity of searches. • Support for Data Analysis: • Variable descriptions are accepted as input by online analysis systems. • Multiple presentation formats: • ASCII text; PDF; HTML; RTF. • Preservation-friendly: • Non-proprietary format.
Why DDI 3.0? DDI 3.0 presents new features in response to: • Perceived needs of: -Data users -Data producers -Data archivists/librarians • Developments in documenting and archiving data • Advances in XML technology
DDI 3.0 and the Data Life Cycle Model DDI Versions 1/2 were codebook-centric: • Closely followed the structure of traditional print codebooks. • Captured data documentation at a single, “frozen” point in time – archiving.
DDI 3.0 and the Data Life Cycle Model Version 3.0 is Life Cycle oriented: • Designed to cover all stages in the life cycle of a data collection: pre-production → production → post-production → secondary use
Life Cycle Coverage in DDI 3.0 • Planning for the Study: Proposal / Design Study Purpose / Outline Concepts Study Population Author(s) Funding Sources Version 3.1 Survey / Sample Design Pre-testing
Life Cycle Coverage in DDI 3.0 Proposal becomes reality… Data Collection methodology: sampling, time, etc. Instrument characteristics Questionnaire Data cleaning, weighting, coding, etc.
Life Cycle Coverage in DDI 3.0 Publishing the data… Physical representation: Data format, Record structure, Statistics. Intellectual content: Variables, Categories, Codes.
Life Cycle Coverage in DDI 3.0 Archiving / (Re)Distributing the data collection… Processing checks Holdings, availability and access conditions
Life Cycle Coverage in DDI 3.0 DDI becomes “visible” to the outside world… DDI Instance: Pulls together all life cycle stages Acquires its own identity as an object Becomes a tool for data discovery and analysis
Life Cycle Coverage in DDI 3.0 Secondary use of data – new conceptual framework… New DDI Instance: New Purpose New Logical Product New Physical Description of Data
DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: • Allows capture and preservation of metadata generated by different agents at different points in time. • Facilitates tracking changes and updates in both data and documentation.
DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: • Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability. • Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.
New / Extended Functionalities in DDI 3.0: Questionnaire Versions 1/2: • No instrument coverage. • Question text only as part of variable description. • No documentation for question flow / conditions. Version 3.0: • Full description of instrument as a separate entity. • Documents specific use of questions: flow, conditions, loops. • Compatible with Computer Assisted Interviewing software.
New / Extended Functionalities in DDI 3.0: Complex Data Versions 1/2: • Inadequate representation of complex / hierarchical data Version 3.0: • Detailed documentation for complex / hierarchical data • Logical structure of records: Record Types and Relationships; Relevant variables (key-link, case identification, record type locator) • Physical layout of records: Single “hierarchical” file for all records, multiple rectangular files, relational database, etc.
New / Extended Functionalities in DDI 3.0: Aggregate Data Versions 1/2: • Initially designed for microdata only • Aggregate data section added in V 2.0 to support limited representation (Census-type data, delimited files) Version 3.0: • Adds support for tabular, spreadsheet-type representation of aggregate data • Aggregate data transport option: cell content may be included in-line with the data item description
New / Extended Functionalities in DDI 3.0: Data Transport Versions 1/2: -None Version 3.0: -In-line inclusion enabled for both aggregate data and microdata
New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data, Comparability Versions 1/2: -None Version 3.0: -Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability
New / Extended Functionalities in DDI 3.0: Increased Multilingual Support Versions 1/2: • Limited: <anytag xml:lang=""> Version 3.0: • Support for multiple language use and translations:
<InternationalStringType xml:lang="" translated="" translatable="">
<Variable>
  <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
  <Label xml:lang="eng" translated="true">Year of Birth</Label>
</Variable>
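The language attributes on the labels above lend themselves to simple programmatic selection. The following is a minimal Python sketch, not part of the DDI specification: the `pick_label` helper and its fallback rule (prefer the label marked translated="false", read here as the source-language label) are assumptions made for illustration, and the markup is the simplified slide fragment rather than a schema-valid DDI 3.0 instance.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified fragment from the slide, not a schema-valid DDI 3.0 document.
SAMPLE = """
<Variable>
  <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
  <Label xml:lang="eng" translated="true">Year of Birth</Label>
</Variable>
"""


def pick_label(variable, preferred_lang):
    """Return the label text for preferred_lang, with a fallback.

    Hypothetical helper: if no label matches, fall back to the label marked
    translated="false" (assumed to be the original), then to the first label.
    """
    labels = variable.findall("Label")
    for label in labels:
        if label.get(XML_LANG) == preferred_lang:
            return label.text.strip()
    for label in labels:
        if label.get("translated") == "false":
            return label.text.strip()
    return labels[0].text.strip() if labels else None


variable = ET.fromstring(SAMPLE)
print(pick_label(variable, "eng"))  # English label
print(pick_label(variable, "fra"))  # no French label, falls back to the original
```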
DDI 3.0 Specification: Schema-based Versions 1/2: • DTD-based Version 3.0: • Schema-based: Data typing supports machine actionability Use of namespaces supports • Modularity • Extensibility and reuse • Alignment with / use of other standards
DDI 3.0 Specification: Machine-actionable Versions 1/2: • Machine-readable Version 3.0: • Machine-actionable: 1. Data typing: increased use of controlled vocabularies and standard codes 2. Larger set of required elements Predictable content = a more consistent base for programming
DDI 3.0: Modular Structure Versions 1/2: • Single file, hierarchical design Version 3.0: • Modular design: • Facilitates reuse • Facilitates versioning and maintenance • Supports life cycle model • Allows flexibility in organizing the DDI Instance • Supports grouping and comparing studies • Supports creation of metadata registries
DDI 3.0: Alignment with other metadata standards Versions 1/2: • MARC, Dublin Core (bibliographic standards) Version 3.0: • MARC, DC, but also… • SDMX (Statistical Data and Metadata Exchange) • ISO 11179 (Metadata Registries) • FGDC (Digital Geospatial Metadata) • ISO 19115 (Geographic Information Metadata)
DDI 1/2 or DDI 3.0? • DDI 3.0 will not supersede DDI 2.1. • Both versions will • coexist • continue to be maintained • be used according to specific needs. • Existing DDI 1/2 markup will not necessarily have to be migrated to Version 3.0.
DDI 3.0 Structure and Mechanisms
DDI 3.0 – Modular Structure Building blocks of DDI 3.0: • Modules • Schemes
DDI 3.0 – Modular Structure Modules: • Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.) Schemes: • Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.
DDI 3.0 – Modular Structure Modules: • Can live independently (have their own schemas) or connected to one another within a hierarchical structure. Schemes: • Can live semi-independently (need a higher-level wrapper as they do not have their own schemas) or in-line within a Study Unit or Group module.
DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy Module level: [Diagram of the module hierarchy. Node labels, in slide order: DDI Instance; Study Unit; Group; Resource Package; Conceptual Components; Data Collection; Archive; Study Unit; Subgroup; Study Unit; (Sub)group; Organizations; Study Unit; Subgroup]
DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy Within modules: Data Collection • Methodology: Sampling, Time Method • Question Scheme: Question Item, Question Item • Processing: Weighting, Coding
DDI 3.0 – Modular Structure Relationships are established through: • In-line inclusion (relational order is explicit) • Referencing, internal or external (relational order is implicit)
DDI 3.0 – Structural mechanisms Enable modular design and help actualize its benefits. • Inheritance • Referencing • Identification
DDI 3.0: Inheritance • Inheritance is based on the hierarchical structure of the model. • In DDI 3.0 a number of elements are reused at different levels of the hierarchy. • When the same element is present at multiple levels, lower levels inherit content from the upper levels, and only need to specify differences (=local overrides).
DDI 3.0: Inheritance Example • Instance: Coverage: Spatial: 50 US states • Study Unit A: no Spatial Coverage defined, so it is inherited from the Instance • Study Unit B: Coverage: Spatial: 48 coterminous states, which supersedes the definition in the Instance
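The inheritance rule in this example can be sketched in a few lines of Python. This is only an illustration of "inherit from the parent unless overridden locally": the `Node` class, its `resolve` method, and the `spatial_coverage` key are hypothetical names and do not come from the DDI 3.0 schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a level in the hierarchy (Instance, Study Unit)."""
    name: str
    parent: Optional["Node"] = None
    local: dict = field(default_factory=dict)  # values defined at this level only

    def resolve(self, key):
        """Walk up the hierarchy and return the nearest definition of key."""
        node = self
        while node is not None:
            if key in node.local:
                return node.local[key]  # a local value overrides upper levels
            node = node.parent
        return None  # not defined anywhere in the chain


instance = Node("Instance", local={"spatial_coverage": "50 US states"})
study_a = Node("Study Unit A", parent=instance)
study_b = Node(
    "Study Unit B",
    parent=instance,
    local={"spatial_coverage": "48 coterminous states"},
)

print(study_a.resolve("spatial_coverage"))  # inherited from the Instance
print(study_b.resolve("spatial_coverage"))  # local override
```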
DDI 3.0: Referencing • DDI 3.0 modular structure is dependent upon creating relationships by reference. • Referencing implies bringing up the content of a DDI object within, or in association with, another object, by specifying its Unique Identifier. • Identifiers are the key links between DDI objects.
DDI 3.0: Referencing Example
Data Collection Module: Question Scheme: Question: ID: “Q1”; Text: “How many days in the past week did you watch the national network news on TV?”
Conceptual Components Module: Concept Scheme: Concept: ID: “C1”; Description: “Exposure to national TV news”
Logical Product Module: Variable Scheme: Variable: ID: “V1”; Name: V043014; Label: Days past week watch natl news on TV; Question Reference: ID: “Q1”; Concept Reference: ID: “C1”
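To make the referencing example concrete, here is a minimal Python sketch that stores each scheme as a dictionary keyed by identifier and follows the variable's references to the question and the concept. The dictionary layout, the field names, and the `resolve_variable` function are assumptions for illustration; real DDI 3.0 references are expressed in XML and carry more than a bare ID.

```python
# Each scheme is modelled as a lookup table keyed by the object's identifier.
question_scheme = {
    "Q1": {
        "text": "How many days in the past week did you watch "
                "the national network news on TV?"
    },
}
concept_scheme = {
    "C1": {"description": "Exposure to national TV news"},
}
variable_scheme = {
    "V1": {
        "name": "V043014",
        "label": "Days past week watch natl news on TV",
        "question_ref": "Q1",  # points into the Question Scheme
        "concept_ref": "C1",   # points into the Concept Scheme
    },
}


def resolve_variable(var_id):
    """Return the variable with its referenced question and concept filled in.

    Raises KeyError if the variable or a referenced identifier is missing,
    which is how a dangling reference shows up in this sketch.
    """
    variable = variable_scheme[var_id]
    return {
        "name": variable["name"],
        "label": variable["label"],
        "question": question_scheme[variable["question_ref"]]["text"],
        "concept": concept_scheme[variable["concept_ref"]]["description"],
    }


resolved = resolve_variable("V1")
print(resolved["question"])
print(resolved["concept"])
```

Because the variable stores only identifiers, the same question or concept can be reused by other variables without copying its content.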
DDI 3.0: Identification Consistency in building and using identifiers is needed for: • Proper functioning of reference systems, enabling a smooth exchange and reuse of existing metadata. • Machine-actionability of DDI instances, allowing them to serve as a basis for running programs and processes.
DDI 3.0: Identification Element types used in the Identification system: • Non-identified elements • Identifiables • Versionables • Maintainables
DDI 3.0: Identification, Element Types Non-identified elements: • Require context, which is provided by containing parents. Example: codes within code schemes • Are not reusable. Example: variable and category statistics
DDI 3.0: Identification, Element Types Identifiables • Carry their own ID • May be referenced / reused • Cannot be versioned or maintained, except as part of a complex parent element (Example: Variable – a change implies a new version of the entire scheme).
DDI 3.0: Identification, Element Types Versionables • Carry their own ID • Carry their own Version: content changes are important to note (Example: Concept – may be independently versioned within a scheme).
DDI 3.0: Identification, Element Types Maintainables • Are higher level DDI objects • Are both identifiable and versionable • Can also be published and maintained as separate entities (Example: all modules, schemes, comparison maps)
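The distinction between identifiable, versionable, and maintainable elements can be sketched as a small Python class hierarchy. This is an illustrative model only: the class names mirror the slide terms, but the integer version counter, the `revise` method, and the `change_variable_label` helper are assumptions, not the DDI 3.0 identification scheme itself.

```python
from dataclasses import dataclass, field


@dataclass
class Identifiable:
    """Carries its own ID, so it can be referenced and reused."""
    id: str


@dataclass
class Versionable(Identifiable):
    """Carries its own version in addition to an ID."""
    version: int = 1  # simplified counter; an assumption for this sketch

    def revise(self):
        self.version += 1


@dataclass
class Maintainable(Versionable):
    """Higher-level object (module, scheme) that contains other elements."""
    items: dict = field(default_factory=dict)

    def add(self, item):
        self.items[item.id] = item


@dataclass
class Variable(Identifiable):  # identifiable only: no version of its own
    label: str = ""


@dataclass
class Concept(Versionable):  # versionable independently within its scheme
    description: str = ""


def change_variable_label(scheme, var_id, new_label):
    """A change to a Variable implies a new version of the whole scheme."""
    scheme.items[var_id].label = new_label
    scheme.revise()


variable_scheme = Maintainable(id="VS1")
variable_scheme.add(Variable(id="V1", label="Days watch natl news"))
change_variable_label(variable_scheme, "V1", "Days past week watch natl news on TV")

concept_scheme = Maintainable(id="CS1")
concept = Concept(id="C1", description="Exposure to national TV news")
concept_scheme.add(concept)
concept.revise()  # the concept gets a new version; its scheme does not

print(variable_scheme.version, concept.version, concept_scheme.version)
```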