
Combining Metadata Standards: Approaches and Benefits

Arofan Gregory, Open Data Foundation.





Presentation Transcript


  1. Combining Metadata Standards: Approaches and Benefits • Arofan Gregory, Open Data Foundation

  2. Overview • Recent events of interest • The Standards: Comparison and Explanation • Emerging Implementation Approaches • DDI and SDMX • SDMX and the Semantic Web Technologies • Classifications & Multiple Standards • Ideas about Future Work

  3. Recent Events of Interest Note: Some of these events/implementations have been or will be described in detail in other papers – they are only mentioned here. • Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop) • SDMX 2.0 – DDI 3 field-level mapping work started • Topic: DDI and the Semantic Web???

  4. Recent Events of Interest (2) • Semantic Web and SDMX • ONS hosted a 2-day meeting in the UK, February 2009 (produced draft “SDMX-RDF”) • Banca d’Italia has a prototype project • New project launched at the University of Tilburg in the Netherlands (RDF expression of OECD SDMX data) • Australian Bureau of Statistics (ABS) starts looking at SDMX and DDI to support the data production lifecycle • Prototype implementations • Some other NSIs also very interested

  5. Recent Events of Interest (3) • Classifications and ISO/IEC 11179 • Australia: Government agencies looking to exchange classifications with ABS from existing ISO/IEC 11179 system, using SDMX, DDI • Statistics Canada: Evaluation of IMDB (ISO/IEC 11179-based metadata repository) for use in coordination with Canadian RDC Network (based on DDI 3)

  6. What Does This Mean? • Not a complete list of events/implementations, but… • Indicates the interest we are seeing in the combined use of standards! • These are not just experiments! • Organizations are looking at implementation in a serious way now

  7. Characterizing the Standards • SDMX: • Data structures and formats • Reference metadata structures and formats • Web-services architecture based on registry services • Content-oriented guidelines • ISO/IEC 11179: • Model for managing concepts and data elements • Metadata registries and lifecycle • ISO 19115: • Standard metadata model for geographies • Used by DDI as geographical model

  8. Characterizing the Standards (2) • Dublin Core: • Citation metadata • Widely used in the Semantic Web • Used natively by DDI for citations • Semantic Web/ “Linked Data” / RDF • See “Open Issues on the Semantic Web” • DDI 3: • Will give more detail, as it is not as familiar to the METIS community…

  9. Characterizing the Standards (3) • DDI 1.*/2.* was a standard used by archives and data libraries • Based on a “codebook” model • Used by some NSIs, especially in the developing world because of the IHSN Metadata Management Toolkit • Used by the European network of data archives, CESSDA • Used by many data archives in North America • Documentation of a single “Study” (survey) • Designed to help researchers find and use microdata • DDI 3 is more ambitious – capture and use of metadata throughout the entire data lifecycle

  10. DDI 3 Lifecycle Model Notice: This is very like a high-level view of the METIS model!

  11. Characterizing the Standards (4) • DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle • Focus is on upstream metadata capture and reuse • Describes tabulation/aggregation of microdata • Provides support for comparison across surveys, detailed geography, data processing, register data • Aggregate “NCube” model aligned with SDMX • No architecture/web services support (yet)

  12. An Observation… • It is easy to say that two standards are “aligned” • Many of these standards were intentionally aligned as they were developed • It is much more difficult to understand how to use them in combination effectively…

  13. Approaches and Benefits • SDMX and DDI • DDI microdata production/SDMX aggregate dissemination • Using SDMX data in DDI-based systems (combining aggregates and microdata) • Combined SDMX/DDI supporting the entire data lifecycle • DDI register data reported to SDMX collection system • SDMX and the Semantic Web • Classifications and the Standards

  14. [Diagram; labels only] DDI 3 Metadata • Surveys • Input data • Dissemination data • Registers • Cleaning, editing, estimation, aggregation, etc. • Website/Web Service • SDMX-ML Data, Metadata, Structure

  15. DDI – SDMX: Benefits • The benefits of this approach are those found by using the standards generally • Supports “metadata-driven” system for data production throughout the lifecycle (DDI) • Metadata-rich dissemination format, preferred by data collectors (SDMX) • Shared tools; SDMX registry services, Web Services for discovery and use of aggregates

  16. SDMX – DDI: Integrating Aggregates and Microdata • Scenario is common in some research • Economic data is often only available as aggregates • Challenge is to combine aggregates and other microdata

  17. [Diagram; labels only] SDMX Web Service • SDMX-to-DDI 3 Transform • Data archive/repository • Surveys (DDI 3) • Processing to produce integrated data and metadata (DDI 3) • Registers (DDI 3)

  18. SDMX – DDI: Benefits • Allows for easy use of official statistics by researchers • Solves problems of combining aggregates and microdata • Note: This does not involve dis-aggregation of published data • Structural transformation only, to allow DDI 3 systems to process aggregates easily
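
The "structural transformation only" point on slide 18 can be illustrated with a minimal sketch. It assumes a simplified, hypothetical in-memory layout for an SDMX-style series (a dimension key plus observations by time period) and flattens it into one record per cell, roughly in the spirit of an NCube cell listing. The field names (`key`, `obs`, `TIME_PERIOD`, `value`) and the sample figures are illustrative assumptions, not the SDMX-ML or DDI 3 schemas, and this is not the mapping discussed in the presentation.

```python
# Hypothetical, simplified stand-in for SDMX-style time series:
# each series has a dimension key and observations keyed by period.
def series_to_cells(series_list):
    """Flatten series into one record per cell.

    Values are copied unchanged: the shape of the data changes,
    but nothing is dis-aggregated or recomputed.
    """
    cells = []
    for series in series_list:
        for period, value in sorted(series["obs"].items()):
            cell = dict(series["key"])      # copy the dimension values
            cell["TIME_PERIOD"] = period    # time becomes one more dimension
            cell["value"] = value           # the published aggregate, as-is
            cells.append(cell)
    return cells


# Illustrative input only; the figures are made up.
example = [
    {
        "key": {"FREQ": "A", "REF_AREA": "NL", "INDICATOR": "GDP"},
        "obs": {"2007": 100.0, "2008": 102.5},
    },
]
cells = series_to_cells(example)
for cell in cells:
    print(cell)
```

Each output record carries the full set of dimension values, which is the shape a microdata-oriented system typically expects; the published aggregates themselves are untouched.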

  19. DDI + SDMX: The Data Lifecycle • Uses a metadata model capable of expression as either SDMX or DDI, depending on the task • Provides support for process management • Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.) • Uses SDMX architecture/services model • Designed to allow incorporation of other standards

  20. [Diagram; labels only] Process-management system (BPML) • All registry interactions use SDMX • Dissemination data store (SDMX) • Input data store • SDMX Registry • Surveys (DDI 3) • Web site/Print/Web Services • Registers (DDI 3) • Interactions between systems are DDI or SDMX Web Services, as appropriate • Data and metadata repositories/application databases (SDMX, DDI, etc.)

  21. SDMX + DDI: Benefits • Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, flexibility • Choice of tools is broad • Use the “best” format for any given task • All the benefits of DDI-SDMX case • Good support for process management as well as data management

  22. SDMX and the Semantic Web Technologies • Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.) • Note that Semantic Web technologies only apply to dissemination • Not designed to support data production • Terms: • “Raw data” in an SW context does not mean “raw data” in the statistical sense • “Data” in an SW context means “anything that can be described using RDF”, rather than numeric data specifically

  23. Assumptions • Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms) • Implementation of an “SDMX-RDF” in standard SDMX dissemination packages

  24. [Diagram; labels only] Internal (production environment) • External (dissemination to Web) • Triplestore (SDMX-RDF) • “SDMX-RDF” Transform • (SPARQL Queries) • (RDF) • (SDMX-driven production system) • SDMX Web Service (SDMX-ML) • Dissemination data store (SDMX)
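
As a rough illustration of the “SDMX-RDF” transform step on slide 24, the sketch below turns one observation into RDF statements serialized as N-Triples lines. The namespace `http://example.org/sdmx-rdf/`, the property names and the sample observation are all hypothetical placeholders; no agreed SDMX-RDF vocabulary is assumed here, and the draft mentioned in the presentation may look quite different.

```python
BASE = "http://example.org/sdmx-rdf/"  # hypothetical namespace, for illustration
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation_to_ntriples(obs_id, dimensions, value):
    """Return N-Triples lines describing one observation.

    One triple is emitted per dimension, plus one for the observed value.
    """
    subject = f"<{BASE}obs/{obs_id}>"
    lines = []
    for name, code in sorted(dimensions.items()):
        predicate = f"<{BASE}dimension/{name}>"
        obj = f"<{BASE}code/{name}/{code}>"
        lines.append(f"{subject} {predicate} {obj} .")
    lines.append(
        f'{subject} <{BASE}obsValue> "{value}"^^<{XSD_DECIMAL}> .'
    )
    return lines


# Made-up observation, for illustration only.
triples = observation_to_ntriples(
    "o1",
    {"FREQ": "A", "REF_AREA": "NL", "TIME_PERIOD": "2008"},
    102.5,
)
print("\n".join(triples))
```

A single cell with three dimensions already yields four full-URI statements, which gives some intuition for the size concerns raised about RDF later in the deck.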

  25. SDMX and the Semantic Web: Benefits • Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc. • Uses existing standards/models and best practices to do “heavy lifting” (data production) • Puts a lot of reliable, quality data into the “Linked Data Web” • Helps address issues of provenance

  26. Warning • RDF is verbose! • 4.5 Megs of GESMES/TS = 45 Megs of “compact” SDMX-ML XML = 420 Megs of RDF triples • This may encourage the on-demand production of RDF data from web services, rather than static files
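
The size comparison on slide 26 can be turned into expansion factors with a few lines of arithmetic. The three sizes are taken from the slide; the variable names are just for this sketch.

```python
# Sizes in megabytes, as quoted on slide 26.
sizes_mb = {
    "GESMES/TS": 4.5,
    "SDMX-ML (compact)": 45,
    "RDF triples": 420,
}

baseline = sizes_mb["GESMES/TS"]

# Expansion factor of each format relative to GESMES/TS.
ratios = {name: size / baseline for name, size in sizes_mb.items()}

# Expansion of RDF relative to compact SDMX-ML.
rdf_vs_sdmx = sizes_mb["RDF triples"] / sizes_mb["SDMX-ML (compact)"]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x GESMES/TS")
print(f"RDF vs compact SDMX-ML: {rdf_vs_sdmx:.1f}x")
```

On these figures, compact SDMX-ML is 10 times the size of GESMES/TS, and the RDF triples are roughly 93 times the size of GESMES/TS, or a little over 9 times the size of the SDMX-ML.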

  27. Standards and Classifications • Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI) • This is an easy thing to do • It is very useful: promotes re-use, comparability, etc. • Could apply to Semantic Web RDF expressions as well as XML-based standards
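
To give a feel for why publishing a classification in a machine-readable format is described as easy, here is a minimal sketch that serializes a small hierarchical code list to XML. The element and attribute names (`CodeList`, `Code`, `Label`, `parent`) and the sample codes are invented for illustration; they are not the SDMX-ML or DDI schemas, which define their own structures for code lists and category schemes.

```python
import xml.etree.ElementTree as ET


def classification_to_xml(scheme_id, items):
    """Serialize (code, label, parent_code) tuples to a simple XML string.

    The XML layout is a made-up stand-in, not a real standard schema.
    """
    root = ET.Element("CodeList", {"id": scheme_id})
    for code, label, parent in items:
        attrs = {"value": code}
        if parent is not None:
            attrs["parent"] = parent  # link a child code to its parent
        code_el = ET.SubElement(root, "Code", attrs)
        ET.SubElement(code_el, "Label").text = label
    return ET.tostring(root, encoding="unicode")


# Made-up classification entries, for illustration only.
items = [
    ("A", "Section A", None),
    ("A1", "Division A1", "A"),
    ("A2", "Division A2", "A"),
]
xml_text = classification_to_xml("EXAMPLE_CLASSIFICATION", items)
print(xml_text)
```

Once a classification is held as structured records like these, emitting it in one standard or another is mostly a matter of swapping the serializer, which is what makes re-use across standards practical.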

  28. Ideas for Future Work • Endorse SDMX – DDI mappings now being produced • Develop an “SDMX-RDF” (?) or… • Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?) • Encourage tools developers to implement it in standard dissemination packages • Publish standard classifications in standard formats

  29. Summary • Combined use of standards is becoming a reality • Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users
