Combining Metadata Standards: Approaches and Benefits Arofan Gregory Open Data Foundation
Overview • Recent events of interest • The Standards: Comparison and Explanation • Emerging Implementation Approaches • DDI and SDMX • SDMX and the Semantic Web Technologies • Classifications & Multiple Standards • Ideas about Future Work
Recent Events of Interest Note: Some of these events/implementations have been or will be described in detail in other papers – they are only mentioned here. • Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop) • SDMX 2.0 – DDI 3 field-level mapping work started • Topic: DDI and the Semantic Web???
Recent Events of Interest (2) • Semantic Web and SDMX • ONS hosted 2-day meeting in the UK, February 2009 (produced draft “SDMX-RDF”) • Banca d’Italia has a prototype project • New project launched at University of Tilburg in the Netherlands (RDF expression of OECD SDMX data) • Australian Bureau of Statistics (ABS) starts looking at SDMX and DDI to support data production lifecycle • Prototype implementations • Some other NSIs also very interested
Recent Events of Interest (3) • Classifications and ISO/IEC 11179 • Australia: Government agencies looking to exchange classifications with ABS from existing ISO/IEC 11179 system, using SDMX, DDI • Statistics Canada: Evaluation of IMDB (ISO/IEC 11179-based metadata repository) for use in coordination with Canadian RDC Network (based on DDI 3)
What Does This Mean? • Not a complete list of events/implementations, but… • Indicates the interest we are seeing in the combined use of standards! • These are not just experiments! • Organizations are looking at implementation in a serious way now
Characterizing the Standards • SDMX: • Data structures and formats • Reference metadata structures and formats • Web-services architecture based on registry services • Content-oriented guidelines • ISO/IEC 11179: • Model for managing concepts and data elements • Metadata registries and lifecycle • ISO 19115: • Standard metadata model for geographies • Used by DDI as geographical model
Characterizing the Standards (2) • Dublin Core: • Citation metadata • Widely used in the Semantic Web • Used natively by DDI for citations • Semantic Web/ “Linked Data” / RDF • See “Open Issues on the Semantic Web” • DDI 3: • Will give more detail, as it is not as familiar to the METIS community…
Characterizing the Standards (3) • DDI 1.*/2.* was a standard used by archives and data libraries • Based on a “codebook” model • Used by some NSIs, especially in the developing world because of the IHSN Metadata Management Toolkit • Used by the European network of data archives, CESSDA • Used by many data archives in North America • Documentation of a single “Study” (survey) • Designed to help researchers find and use microdata • DDI 3 is more ambitious – capture and use of metadata throughout the entire data lifecycle
DDI 3 Lifecycle Model Notice: This is very similar to a high-level view of the METIS model!
Characterizing the Standards (4) • DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle • Focus is on upstream metadata capture and reuse • Describes tabulation/aggregation of microdata • Provides support for comparison across surveys, detailed geography, data processing, register data • Aggregate “NCube” model aligned with SDMX • No architecture/web services support (yet)
An Observation… • It is easy to say that two standards are “aligned” • Many of these standards were intentionally aligned as they were developed • It is much more difficult to understand how to use them in combination effectively…
Approaches and Benefits • SDMX and DDI • DDI microdata production/SDMX aggregate dissemination • Using SDMX data in DDI-based systems (combining aggregates and microdata) • Combined SDMX/DDI supporting the entire data lifecycle • DDI register data reported to SDMX collection system • SDMX and the Semantic Web • Classifications and the Standards
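[Diagram: DDI 3 metadata covers the production side. Surveys and registers supply input data, which passes through cleaning, editing, estimation, aggregation, etc. to become dissemination data. A website/web service publishes it as SDMX-ML (data, metadata, structure).]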
DDI – SDMX: Benefits • The benefits of this approach are those found by using the standards generally • Supports “metadata-driven” system for data production throughout the lifecycle (DDI) • Metadata-rich dissemination format, preferred by data collectors (SDMX) • Shared tools; SDMX registry services, Web Services for discovery and use of aggregates
SDMX – DDI: Integrating Aggregates and Microdata • Scenario is common in some research • Economic data is often only available as aggregates • Challenge is to combine aggregates and other microdata
[Diagram: An SDMX web service feeds an SDMX-to-DDI 3 transform. The data archive/repository holds surveys (DDI 3) and registers (DDI 3). Processing produces integrated data and metadata (DDI 3).]
SDMX – DDI: Benefits • Allows for easy use of official statistics by researchers • Solves problems of combining aggregates and microdata • Note: This does not involve dis-aggregation of published data • Structural transformation only, to allow DDI 3 systems to process aggregates easily
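As a rough illustration of what a "structural transformation only" might look like, the sketch below flattens one aggregate series into record-like rows, which is closer to the layout a microdata-oriented system expects. The function name `flatten_series` and the column names `TIME_PERIOD` and `OBS_VALUE` are assumptions made for this example; the slides do not specify the actual SDMX-to-DDI 3 transform.

```python
def flatten_series(series_key, observations):
    """Turn one aggregate series into flat rows without changing any value.

    series_key   -- dict of dimension name -> code (hypothetical layout)
    observations -- dict of time period -> observed value
    """
    return [
        {**series_key, "TIME_PERIOD": period, "OBS_VALUE": value}
        for period, value in sorted(observations.items())
    ]


# Illustrative values only, not real statistics.
rows = flatten_series(
    {"FREQ": "A", "REF_AREA": "CA"},
    {"2009": 1.7, "2008": 1.5},
)
for row in rows:
    print(row)
```

The values are carried over untouched; only the shape changes, which is consistent with the note that no dis-aggregation is involved.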
DDI + SDMX: The Data Lifecycle • Uses a metadata model capable of expression as either SDMX or DDI, depending on the task • Provides support for process management • Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.) • Uses SDMX architecture/services model • Designed to allow incorporation of other standards
[Diagram: A process-management system (BPML) and an SDMX registry coordinate the lifecycle; all registry interactions use SDMX. Surveys (DDI 3) and registers (DDI 3) feed an input data store; a dissemination data store (SDMX) feeds the web site, print, and web services. Data and metadata repositories/application databases hold SDMX, DDI, etc. Interactions between systems are DDI or SDMX web services, as appropriate.]
SDMX + DDI: Benefits • Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, flexibility • Choice of tools is broad • Use the “best” format for any given task • All the benefits of DDI-SDMX case • Good support for process management as well as data management
SDMX and the Semantic Web Technologies • Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.) • Note that Semantic Web technologies only apply to dissemination • Not designed to support data production • Terms: • “Raw data” in an SW context does not mean “raw data” as statisticians use the term • “Data” in an SW context means “anything that can be described using RDF” – not numeric data
Assumptions • Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms) • Implementation of an “SDMX-RDF” in standard SDMX dissemination packages
[Diagram: Internal (production environment): an SDMX-driven production system feeds a dissemination data store (SDMX). External (dissemination to Web): an SDMX web service serves SDMX-ML, and an “SDMX-RDF” transform loads a triplestore (SDMX-RDF), which answers SPARQL queries with RDF.]
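To make the idea of an “SDMX-RDF” transform more concrete, here is a minimal sketch that turns one observation into N-Triples-style statements. The namespace `http://example.org/sdmx-rdf#` and the property names are hypothetical; no agreed “SDMX-RDF” vocabulary is assumed here, and a real transform would follow whatever harmonized model is eventually adopted.

```python
# Hypothetical namespace, used only for illustration.
EX = "http://example.org/sdmx-rdf#"


def observation_to_triples(obs_id, dimensions, value):
    """Express one observation as a list of N-Triples-style strings."""
    subject = f"<{EX}obs/{obs_id}>"
    # One triple for the observed value itself.
    triples = [f'{subject} <{EX}obsValue> "{value}" .']
    # One further triple per dimension, pointing at a code resource.
    for dim, code in sorted(dimensions.items()):
        triples.append(
            f"{subject} <{EX}dimension/{dim}> <{EX}code/{dim}/{code}> ."
        )
    return triples


# Illustrative values only.
triples = observation_to_triples(
    "o1", {"FREQ": "A", "REF_AREA": "AU", "TIME_PERIOD": "2009"}, 3.2
)
print("\n".join(triples))
```

A single observation with three dimensions already produces four statements, each repeating the full subject identifier.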
SDMX and the Semantic Web: Benefits • Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc. • Uses existing standards/models and best practices to do “heavy lifting” (data production) • Puts a lot of reliable, quality data into the “Linked Data Web” • Helps address issues of provenance
Warning • RDF is verbose! • The same data takes 4.5 MB as GESMES/TS, 45 MB as “compact” SDMX-ML XML, and 420 MB as RDF triples • This may encourage the on-demand production of RDF data from web services, rather than static files
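The size figures quoted above can be turned into expansion factors with a few lines of arithmetic. The sizes are taken from the slide; the function name `expansion_factors` is just a label for this sketch.

```python
# Sizes quoted on the slide, in megabytes.
sizes_mb = {
    "GESMES/TS": 4.5,
    "SDMX-ML (compact)": 45.0,
    "RDF triples": 420.0,
}


def expansion_factors(sizes, baseline):
    """Return each format's size as a multiple of the baseline format."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


factors = expansion_factors(sizes_mb, "GESMES/TS")
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x the GESMES/TS size")
```

By these figures, the RDF form is roughly 93 times the size of the GESMES/TS form and a little over 9 times the size of the compact SDMX-ML form.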
Standards and Classifications • Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI) • This is an easy thing to do • It is very useful: promotes re-use, comparability, etc. • Could apply to Semantic Web RDF expressions as well as XML-based standards
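The sketch below illustrates why publishing one classification in several formats can be simple: the same in-memory code list is serialized once as generic XML and once as RDF-style triples. The element names (`CodeList`, `Code`), the identifier `CL_EXAMPLE`, the sample codes, and the base URI are all invented for this example and are not the SDMX-ML or DDI schemas. Using SKOS properties for the RDF side is a choice made here, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical two-level classification: (code, label, parent code or None).
classification = {
    "id": "CL_EXAMPLE",
    "codes": [("A", "Agriculture", None), ("A01", "Crop production", "A")],
}


def to_generic_xml(cl):
    """Serialize the code list as simple, schema-free XML."""
    root = ET.Element("CodeList", id=cl["id"])
    for code, label, parent in cl["codes"]:
        attrs = {"value": code}
        if parent is not None:
            attrs["parent"] = parent
        ET.SubElement(root, "Code", attrs).text = label
    return ET.tostring(root, encoding="unicode")


def to_triples(cl, base="http://example.org/class/"):
    """Serialize the same code list as N-Triples-style strings using SKOS."""
    out = []
    for code, label, parent in cl["codes"]:
        subject = f"<{base}{cl['id']}/{code}>"
        out.append(f'{subject} <{SKOS}prefLabel> "{label}" .')
        if parent is not None:
            out.append(f"{subject} <{SKOS}broader> <{base}{cl['id']}/{parent}> .")
    return out


xml_text = to_generic_xml(classification)
rdf_lines = to_triples(classification)
print(xml_text)
print("\n".join(rdf_lines))
```

Both outputs are derived from the same source structure, so the codes and hierarchy stay consistent across formats.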
Ideas for Future Work • Endorse SDMX – DDI mappings now being produced • Develop an “SDMX-RDF” (?) or… • Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?) • Encourage tools developers to implement it in standard dissemination packages • Publish standard classifications in standard formats
Summary • Combined use of standards is becoming a reality • Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users