
**"DDI 3.0 Workshop: Maximizing Data Benefits with XML Standard"**

Explore the benefits of DDI 3.0 as the XML-based standard for social science data documentation. Join us for a workshop session covering the development history, technical mechanisms, and new features. Learn how DDI 3.0 enhances data discovery, supports data analysis, and covers the complete data life cycle.

bettyej

Presentation Transcript


  1. Introduction to DDI 3.0. Sanda Ionescu, ICPSR. CESSDA Expert Seminar, September 2007

  2. DDI Version 3.0 • Radically different. • More complex… (…but certainly doable!) • Brings important benefits.

  3. Workshop Schedule • 14:30 – 15:10 Overview (40 min) • 15:10 – 15:35 Structure and Technical Mechanisms (25 min) • 15:35 – 15:45 Break (10 min) • 15:45 – 16:10 Study Unit: Modules Content (25 min) • 16:10 – 16:30 Variable Markup Example (20 min) • 16:30 – 16:40 Break (10 min) • 16:40 – 17:10 Grouping: Modules Content and Examples (30 min) • 17:10 – 17:30 Getting Started (20 min)

  4. DDI 3.0 Overview

  5. DDI Background: Development History • 1995 – A grant-funded project, initiated and organized by ICPSR, proposes to create a new standard for documenting social science data to replace OSIRIS tagged codebooks. • First drafts used SGML and were then converted to Web-friendly XML. • 2000 – DDI Version 1.0 is published as a mainly document- and codebook-centric standard.

  6. DDI Background: Development History • 2003 – DDI Version 2.0 is published with an extended scope: • Aggregate data coverage (based on a matrix structure) • Additional geographic representation to assist geographic search systems and GIS users • Versions 1.0 through 2.1 (the latest published) are backward compatible and based on the same structure.

  7. DDI Background: Development History • February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification. http://www.ddialliance.org/

  8. DDI Background: Development History Version 3.0: • 2004–2006: Planning and Development • November 2006: Internal Review • February 2007: Public Review • July 2007: Candidate Draft Release http://www.ddialliance.org/ddi3/index.html

  9. Benefits of using DDI as an XML-based standard • Interoperability: • Enables seamless exchange and reuse by other systems. • Repurposing: • Provides a core document from which different types of outputs can be generated. • Value-added documentation: • Tagging carries "intelligence" in the document by describing content. • Enhanced Data Discovery: • Increases precision and granularity of searches. • Support for Data Analysis: • Variable descriptions are accepted as input by online analysis systems. • Multiple presentation formats: • ASCII text; PDF; HTML; RTF. • Preservation-friendly: • Non-proprietary format.

  10. Why DDI 3.0? DDI 3.0 presents new features in response to: • Perceived needs of: -Data users -Data producers -Data archivists/librarians • Developments in documenting and archiving data • Advances in XML technology

  11. DDI 3.0 and the Data Life Cycle Model DDI Versions 1/2 were codebook-centric: • Closely followed the structure of traditional print codebooks. • Captured data documentation at a single, “frozen” point in time – archiving.

  12. DDI 3.0 and the Data Life Cycle Model Version 3.0 is Life Cycle oriented: -Designed to cover all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use.

  13. Life Cycle Coverage in DDI 3.0 • Planning for the Study: Proposal / Design • Study Purpose / Outline • Concepts • Study Population • Author(s) • Funding Sources • Version 3.1: Survey / Sample Design; Pre-testing

  14. Life Cycle Coverage in DDI 3.0 Proposal becomes reality… • Data collection methodology: sampling, time, etc. • Instrument characteristics • Questionnaire • Data cleaning, weighting, coding, etc.

  15. Life Cycle Coverage in DDI 3.0 Publishing the data… Physical representation: Data format, Record structure, Statistics. Intellectual content: Variables, Categories, Codes.

  16. Life Cycle Coverage in DDI 3.0 Archiving / (Re)Distributing the data collection… • Processing checks • Holdings, availability and access conditions

  17. Life Cycle Coverage in DDI 3.0 DDI becomes "visible" to the outside world… DDI Instance: • Pulls together all life cycle stages • Acquires its own identity as an object • Becomes a tool for data discovery and analysis

  18. Life Cycle Coverage in DDI 3.0 Secondary use of data – new conceptual framework… New DDI Instance: • New Purpose • New Logical Product • New Physical Description of Data

  19. DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: • Allows capture and preservation of metadata generated by different agents at different points in time. • Facilitates tracking changes and updates in both data and documentation.

  20. DDI 3.0 and the Data Life Cycle Model Advantages of Life Cycle orientation: • Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability. • Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.

  21. New / Extended Functionalities in DDI 3.0: Questionnaire Versions 1/2: • No instrument coverage. • Question text only as part of variable description. • No documentation for question flow / conditions. Version 3.0: • Full description of instrument as a separate entity. • Documents specific use of questions: flow, conditions, loops. • Compatible with Computer Assisted Interviewing software.

  22. New / Extended Functionalities in DDI 3.0: Complex Data Versions 1/2: • Inadequate representation of complex / hierarchical data Version 3.0: • Detailed documentation for complex / hierarchical data: • Logical structure of records: record types and relationships; relevant variables (key-link, case identification, record type locator) • Physical layout of records: a single "hierarchical" file for all records, multiple rectangular files, a relational database, etc.
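The role of the key-link and record type locator variables can be illustrated with a minimal Python sketch that reads a single "hierarchical" file. The fixed-width layout below (record type in column 1, household number in columns 2-5) and the function name `split_hierarchical` are hypothetical, chosen only to show how the two variables work together; they are not defined by DDI 3.0.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical fixed-width layout (illustration only, not defined by DDI 3.0):
#   column 1     record type locator: "H" = household, "P" = person
#   columns 2-5  case identification / key-link: household number
#   columns 6+   record-specific variables
RAW = """\
H0001URBAN
P0001F34
P0001M36
H0002RURAL
P0002F71
"""


def split_hierarchical(raw: str) -> Dict[str, Dict[str, List[str]]]:
    """Group household and person records by their shared key-link value."""
    cases: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"household": [], "persons": []}
    )
    for line in raw.splitlines():
        # The locator says which record type this line is; the key-link says which case.
        record_type, key_link, payload = line[0], line[1:5], line[5:]
        if record_type == "H":
            cases[key_link]["household"].append(payload)
        elif record_type == "P":
            cases[key_link]["persons"].append(payload)
        else:
            raise ValueError(f"unknown record type {record_type!r}")
    return dict(cases)


print(split_hierarchical(RAW)["0001"])  # {'household': ['URBAN'], 'persons': ['F34', 'M36']}
```

The same records could instead be stored as multiple rectangular files or relational tables joined on the key-link; only the physical layout changes, while the logical record structure stays the same.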

  23. New / Extended Functionalities in DDI 3.0: Aggregate Data Versions 1/2: • Initially designed for microdata only • Aggregate data section added in V 2.1 to support limited representation (Census-type data, delimited files) Version 3.0: • Adds support for tabular, spreadsheet-type representation of aggregate data • Aggregate data transport option: cell content may be included inline with the data item description

  24. New / Extended Functionalities in DDI 3.0: Data Transport Versions 1/2: -None Version 3.0: -In-line inclusion enabled for both aggregate data and microdata

  25. New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability Versions 1/2: -None Version 3.0: -Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability

  26. New / Extended Functionalities in DDI 3.0: Increased Multilingual Support Versions 1/2: • Limited `<anytag xml:lang="">` Version 3.0: • Support for multiple language use and translations `<InternationalStringType xml:lang="" translated="" translatable="">` `<Variable> <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label> <Label xml:lang="eng" translated="true">Year of Birth</Label> </Variable>`
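Because every label carries `xml:lang` and `translated` attributes, software can choose the appropriate label for a user. The Python sketch below, using only the standard library's `xml.etree.ElementTree`, picks the label in a requested language and otherwise falls back to the original, untranslated label. It parses the slide's snippet as-is; real DDI 3.0 instances also declare DDI namespaces, which are omitted here, and the helper name `pick_label` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SNIPPET = """<Variable>
  <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
  <Label xml:lang="eng" translated="true">Year of Birth</Label>
</Variable>"""


def pick_label(variable: ET.Element, lang: str) -> Optional[str]:
    """Return the label in `lang`, else the original (translated="false") label."""
    labels = variable.findall("Label")
    for label in labels:
        if label.get(XML_LANG) == lang:
            return label.text
    # No label in the requested language: fall back to the untranslated original.
    for label in labels:
        if label.get("translated") == "false":
            return label.text
    return None


variable = ET.fromstring(SNIPPET)
print(pick_label(variable, "eng"))  # Year of Birth
print(pick_label(variable, "fre"))  # Geburtsjahr (no French label, so the original is used)
```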

  27. DDI 3.0 Specification: Schema-based Versions 1/2: • DTD-based Version 3.0: • Schema-based: • Data typing supports machine-actionability. • Use of namespaces supports: • Modularity • Extensibility and reuse • Alignment with / use of other standards

  28. DDI 3.0 Specification: Machine-actionable Versions 1/2: • Machine-readable Version 3.0: • Machine-actionable: 1. Data typing: increased use of controlled vocabularies and standard codes 2. Larger set of required elements Predictable content = a more consistent base for programming

  29. DDI 3.0: Modular Structure Version 1/2: • Single file, hierarchical design Version 3.0: • Modular design: • Facilitates reuse • Facilitates versioning and maintenance • Supports life cycle model • Allows flexibility in organizing the DDI Instance • Supports grouping and comparing studies • Supports creation of metadata registries

  30. DDI 3.0: Alignment with other metadata standards Versions 1/2: • MARC, Dublin Core (bibliographic standards) Version 3.0: • MARC, DC, but also… • SDMX (Statistical Data and Metadata Exchange) • ISO 11179 (Metadata Registries) • FGDC (Digital Geospatial Metadata) - ISO 19115 (Geographic Information Metadata)

  31. DDI 1/2 or DDI 3.0? • DDI 3.0 will not supersede DDI 2.1. • Both versions will: • coexist • continue to be maintained • be used according to specific needs. • Existing DDI 1/2 markup will not have to be migrated to Version 3.0.

  32. DDI 3.0 Structure and Mechanisms

  33. DDI 3.0 – Modular Structure Building blocks of DDI 3.0: • Modules • Schemes

  34. DDI 3.0 – Modular Structure Modules: • Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.) Schemes: • Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.

  35. DDI 3.0 – Modular Structure Modules: • Can live independently (have their own schemas) or connected to one another within a hierarchical structure. Schemes: • Can live semi-independently (need a higher-level wrapper as they do not have their own schemas) or in-line within a Study Unit or Group module.

  36. DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy Module level (diagram): DDI Instance, Study Unit, Group, (Sub)group, Resource Package, Conceptual Components, Data Collection, Archive, Organizations

  37. DDI 3.0 – Modular Structure DDI 3.0 model = a multi-branched hierarchy Within modules (diagram), e.g. Data Collection: • Methodology (Sampling, Time Method) • Question Scheme (Question Items) • Processing (Weighting, Coding)

  38. DDI 3.0 – Modular Structure Relationships are established through: • In-line inclusion (relational order is explicit) • Referencing, internal or external (relational order is implicit)

  39. DDI 3.0 – Structural mechanisms Enable modular design and help actualize its benefits. • Inheritance • Referencing • Identification

  40. DDI 3.0: Inheritance • Inheritance is based on the hierarchical structure of the model. • In DDI 3.0 a number of elements are reused at different levels of the hierarchy. • When the same element is present at multiple levels, lower levels inherit content from the upper levels, and only need to specify differences (=local overrides).

  41. DDI 3.0: Inheritance Example • Instance: Coverage: Spatial: 50 US states -Study Unit A: no Spatial Coverage defined, so it is inherited from the Instance -Study Unit B: Coverage: Spatial: 48 coterminous states, which supersedes the definition in the Instance
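The inheritance rule (use a locally defined value if there is one, otherwise take the nearest value defined higher up) can be sketched in Python with the slide's Spatial Coverage example. The `Node` class and its `resolved_spatial_coverage` method are hypothetical illustrations of the rule, not part of the DDI 3.0 schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A level in the DDI hierarchy (e.g. Instance or Study Unit); hypothetical model."""
    name: str
    parent: Optional["Node"] = None
    spatial_coverage: Optional[str] = None  # None means "not defined at this level"

    def resolved_spatial_coverage(self) -> Optional[str]:
        """Return the nearest definition, starting locally and moving up the hierarchy."""
        node: Optional["Node"] = self
        while node is not None:
            if node.spatial_coverage is not None:
                return node.spatial_coverage  # a local override wins
            node = node.parent
        return None


instance = Node("Instance", spatial_coverage="50 US states")
study_a = Node("Study Unit A", parent=instance)
study_b = Node("Study Unit B", parent=instance, spatial_coverage="48 coterminous states")

print(study_a.resolved_spatial_coverage())  # 50 US states (inherited)
print(study_b.resolved_spatial_coverage())  # 48 coterminous states (local override)
```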

  42. DDI 3.0: Referencing • DDI 3.0 modular structure is dependent upon creating relationships by reference. • Referencing implies bringing up the content of a DDI object within, or in association with, another object, by specifying its Unique Identifier. • Identifiers are the key links between DDI objects.

  43. DDI 3.0: Referencing Example • Data Collection Module: Question Scheme: Question: ID: “Q1”, Text: “How many days in the past week did you watch the national network news on TV?” • Conceptual Components Module: Concept Scheme: Concept: ID: “C1”, Description: “Exposure to national TV news” • Logical Product Module: Variable Scheme: Variable: ID: “V1”, Name: V043014, Label: Days past week watch natl news on TV, Question Reference: ID: “Q1”, Concept Reference: ID: “C1”
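Resolving the references in this example amounts to looking up each referenced ID in the scheme that holds it. A minimal Python sketch follows; the in-memory dictionaries stand in for the Question and Concept Schemes, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Question:
    id: str
    text: str


@dataclass(frozen=True)
class Concept:
    id: str
    description: str


@dataclass(frozen=True)
class Variable:
    id: str
    name: str
    label: str
    question_ref: str  # stores the Question's ID, not a copy of the Question
    concept_ref: str   # stores the Concept's ID


# Stand-ins for the Question Scheme (Data Collection) and Concept Scheme (Conceptual Components).
QUESTIONS = {
    "Q1": Question("Q1", "How many days in the past week did you watch "
                         "the national network news on TV?"),
}
CONCEPTS = {"C1": Concept("C1", "Exposure to national TV news")}


def resolve(variable: Variable) -> Tuple[Question, Concept]:
    """Follow the variable's references; raises KeyError if an ID cannot be resolved."""
    return QUESTIONS[variable.question_ref], CONCEPTS[variable.concept_ref]


v1 = Variable("V1", "V043014", "Days past week watch natl news on TV",
              question_ref="Q1", concept_ref="C1")
question, concept = resolve(v1)
print(f"{v1.name}: {concept.description}")  # V043014: Exposure to national TV news
```

Because the variable stores only IDs, the same question or concept can be reused by other variables without duplicating its content.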

  44. DDI 3.0: Referencing Example

  45. DDI 3.0: Identification Consistency in building and using identifiers is needed for: • Proper functioning of reference systems, enabling a smooth exchange and reuse of existing metadata. • Machine-actionability of DDI instances, allowing them to serve as a basis for running programs and processes.

  46. DDI 3.0: Identification Element types used in the Identification system:

  47. DDI 3.0: Identification: Element Types Non-identified elements: • Require context, which is provided by containing parents. Example: codes within code schemes • Are not reusable. Example: variable and category statistics

  48. DDI 3.0: Identification: Element Types Identifiables • Carry their own ID • May be referenced / reused • Cannot be versioned or maintained, except as part of a complex parent element (Example: Variable – a change implies a new version of the entire scheme).

  49. DDI 3.0: Identification: Element Types Versionables • Carry their own ID • Carry their own Version: content changes are important to note (Example: Concept – may be independently versioned within a scheme).

  50. DDI 3.0: Identification: Element Types Maintainables • Are higher-level DDI objects • Are both identifiable and versionable • Can also be published and maintained as separate entities (Example: all modules, schemes, comparison maps)
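One way to see how the identified element types nest is as a class hierarchy in which each type adds one capability to the one before it. The Python sketch below is hypothetical: the class names mirror the slides, but the fields, default values, and the `add_variable` method are illustrative assumptions, not the DDI 3.0 schema. It shows that changing the set of identifiable Variables produces a new version of the maintainable scheme that contains them. Non-identified elements, such as statistics, would simply be plain fields with no ID.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identifiable:
    id: str  # carries its own ID, so it can be referenced and reused


@dataclass
class Versionable(Identifiable):
    version: str = "1.0.0"  # also carries its own version


@dataclass
class Maintainable(Versionable):
    agency: str = "example.agency"  # hypothetical maintaining agency; publishable on its own


@dataclass
class Variable(Identifiable):  # identifiable, but not versioned independently
    name: str = ""


@dataclass
class Concept(Versionable):  # may be versioned independently within its scheme
    description: str = ""


@dataclass
class VariableScheme(Maintainable):
    variables: List[Variable] = field(default_factory=list)

    def add_variable(self, variable: Variable, new_version: str) -> "VariableScheme":
        """Return a new version of the whole scheme; the original is left unchanged."""
        return VariableScheme(
            id=self.id,
            version=new_version,
            agency=self.agency,
            variables=[*self.variables, variable],
        )


scheme_v1 = VariableScheme(id="VS1")
scheme_v2 = scheme_v1.add_variable(Variable(id="V1", name="V043014"), new_version="1.1.0")
print(scheme_v1.version, len(scheme_v1.variables))  # 1.0.0 0
print(scheme_v2.version, len(scheme_v2.variables))  # 1.1.0 1
```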
