
WP.5 - DDI-SDMX Integration

METIS Work Session, 6-8 May 2013. WP.5 - DDI-SDMX Integration. ESS cross-cutting project on Information Models and Standards. Marco Pellegrino, Denis Grofils (Eurostat).


Presentation Transcript


  1. METIS Work Session, 6-8 May 2013 • WP.5 - DDI-SDMX Integration • ESS cross-cutting project on Information Models and Standards • Marco Pellegrino, Denis Grofils (Eurostat)

  2. Outline • ESS VIP programme • Cross-cutting project on Information Models and Standards • SDMX-DDI Integration: open points for discussion

  3. ESS.VIP programme • A transformation programme for the modernisation of the production systems in the European Statistical System (ESS) through: • moving towards more common solutions, shared services and a shared environment • economies of scale and efficiency gains from sharing costs

  4. ESS.VIP business and information principles • Maximum reuse of existing process components and segments • Metadata driven processes allowing adaptation and extension to other contexts • New business process built as a sequence of modular process steps / services • Information objects structured according to available information models and stored in corporate registries/repositories in view of reuse • Adherence to industry and open standards as available (e.g. Plug & Play)

  5. ESS.VIP Programme components

  6. Information Models and Standards: objectives • To ensure that ESS.VIP has access to a set of agreed-upon standards supporting the modernisation of statistical production processes. • To increase coherence between standards, at the same time ensuring that these are consistent with best practices and recommendations from the international community. • To define information models that can be used across the ESS to model structural metadata for micro-data and aggregated data. • To set up guidelines for designing and documenting business processes. • To provide support mechanisms (e.g., capacity-building and training).

  7. Which standards and models? • Re-use existing resources • Link to new initiatives (e.g. Sponsorship on Standardisation, GSIM)

  8. The SDMX-DDI approach • Informal meetings (2010-2013) between members of the SDMX and DDI communities • Initiative of the SDMX Secretariat through its Technical Working Group • Approach to using SDMX and DDI interchangeably • Now, we are at the stage where implementations are being investigated and prototyped • Not “if”, but “how” • Most often, this is done in the context of the Generic Statistical Business Process Model (GSBPM) • Idea of “industrialised” statistical production • Strong emphasis on process management

  9. Generic Statistical Business Process Model [diagram with DDI and SDMX labels]

  10. GSBPM, DDI and SDMX: towards a complete system? [diagram with SDMX and DDI labels]

  11. Characterizing the Standards: DDI • DDI Lifecycle can provide a very detailed set of metadata, covering: • The study or series of studies • Many aspects of data collection, including surveys and processing of microdata • The structure of data files, including hierarchical files and those with complex relationships • The lifecycle events and archiving of data files and their metadata • The tabulation and processing of data into tables (Ncubes) • It allows for a link between microdata variables and the resulting aggregates

  12. Characterizing the Standards: SDMX • Describes the structure of aggregate/dimensional data (“structural metadata”) • Provides formats for the dimensional data • Provides a model of data reporting and dissemination • Provides a way of describing and formatting stand-alone metadata sets (“reference metadata”) • Provides standard registry interfaces, providing a catalogue of resources • Provides guidelines for deploying standard web services for SDMX resources • Provides a way of describing statistical processes

  13. SDMX Data validation and editing, SDMX Registry, DSD and data set, MSD, metadata set, Web services Process Metadata

  14. DDI and SDMX • DDI offers a very rich model for the documentation of micro-data • SDMX offers a very integrated exchange platform for statistical outputs (IT architectures, tools, web services) • The combined use of both standards could allow a higher level of integration of the complete production process • But: The devil is in the detail!

  15. Analysis of use cases • The SDMX TWG has been defining a set of relevant use cases where the two standards could be compared and, if possible, used together: • Survey data collection • Administrative and register data • Combined use of DDI and SDMX • Micro-data access and on-demand tabulation of micro-data • Metadata and quality reporting

  16. Survey • A Survey is targeted at a specific Population and comprises Questions • Questions may be linked to a Variable, which specifies: • its conceptual meaning (Concept) • the valid set of responses that are allowed (Category Scheme and contained Category) • Output from the Survey is a Unit Record Data Set
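
The relationships on this slide can be sketched as a small data model. This is only an illustration in Python: the class and field names follow the terms on the slide, but they are assumptions and not the actual DDI or SDMX object model, and the example survey is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    code: str
    label: str


@dataclass
class CategoryScheme:
    name: str
    categories: List[Category] = field(default_factory=list)

    def is_valid(self, code: str) -> bool:
        # A response is valid if it matches one of the contained categories.
        return any(c.code == code for c in self.categories)


@dataclass
class Variable:
    name: str
    concept: str             # conceptual meaning (Concept)
    scheme: CategoryScheme   # valid set of responses


@dataclass
class Question:
    text: str
    variable: Optional[Variable] = None  # a Question "may be" linked to a Variable


@dataclass
class Survey:
    title: str
    population: str
    questions: List[Question] = field(default_factory=list)

    def variables(self) -> List[Variable]:
        # Only questions linked to a Variable contribute to the unit record layout.
        return [q.variable for q in self.questions if q.variable is not None]


# Hypothetical example survey.
sex_scheme = CategoryScheme("SEX", [Category("M", "Male"), Category("F", "Female")])
sex = Variable("sex", "Sex of respondent", sex_scheme)
survey = Survey(
    "Example household survey",
    "Residents aged 15 and over",
    [Question("What is your sex?", sex), Question("Any other comments?")],
)

# One row of the resulting Unit Record Data Set.
unit_record = {"sex": "F"}
```

Keeping the Question-to-Variable link optional mirrors the "may be linked" wording: free-text questions exist in the survey without adding a coded variable to the unit record.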

  17. The Proposed Approach • The full set of information includes: • The unit record data • Structural information about the variables and representations • Additional information about how the data has been generated/collected/processed • In DDI, this set of information can be expressed as a DDI instance and a data file • Both the structural and processing metadata can be expressed as a single DDI instance

  18. Data Process and Cleaning • Editing process can include: • Validation • Outlier Trimming • Recodes • Editing for Non Response • Editing process consists of: • Description of the process (Process Description) • Software environment (Executable Code)
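
As a minimal sketch of such an editing process, the Python below pairs each step's description (the "Process Description") with a function (the "Executable Code"). The variable names, the thresholds and the mean-imputation rule are all assumptions chosen for illustration; the slides do not prescribe any particular method.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]

INCOME_CAP = 100_000.0  # hypothetical trimming threshold


def validate(records: List[Record]) -> List[Record]:
    # Validation: drop records whose age is missing or outside a plausible range.
    return [r for r in records if r["age"] is not None and 0 <= r["age"] <= 120]


def trim_outliers(records: List[Record]) -> List[Record]:
    # Outlier trimming: cap reported income at INCOME_CAP, leaving missing values alone.
    return [
        {**r, "income": None if r["income"] is None else min(r["income"], INCOME_CAP)}
        for r in records
    ]


def recode(records: List[Record]) -> List[Record]:
    # Recode: derive a coarse age group from the exact age.
    return [{**r, "age_group": "Y_LT65" if r["age"] < 65 else "Y_GE65"} for r in records]


def edit_non_response(records: List[Record]) -> List[Record]:
    # Editing for non-response: fill missing income with the mean of observed values.
    observed = [r["income"] for r in records if r["income"] is not None]
    fill = mean(observed) if observed else None
    return [{**r, "income": fill if r["income"] is None else r["income"]} for r in records]


# Each entry pairs a process description with its executable code.
PROCESS: List[Tuple[str, Step]] = [
    ("Validation of age range", validate),
    ("Outlier trimming of income", trim_outliers),
    ("Recode age into age group", recode),
    ("Mean imputation for income non-response", edit_non_response),
]


def run(records: List[Record]) -> List[Record]:
    for _description, step in PROCESS:
        records = step(records)
    return records


raw = [
    {"age": 34, "income": 30000.0},
    {"age": 70, "income": None},
    {"age": 45, "income": 900000.0},
    {"age": -3, "income": 20000.0},
]
cleaned = run(raw)
```

Each step returns new records rather than modifying its input, so the raw unit records remain available alongside the documented sequence of edits.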

  19. Tabulation • The result of a Tabulation is an Aggregate Data Set • Structured according to a Dimensional Structure Definition (SDMX DSD) • Comprising Dimensions, Attributes, Measures • Each takes its semantics and representation from a Variable • Data Set comprises statistical series: • Key • Attributes • Observations
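
The tabulation step can be illustrated with a short Python sketch that aggregates unit records into cells identified by a key of dimension values. The structure definition here is a plain Python stand-in, not an actual SDMX DSD, and the dimension names, the measure, the attribute and the choice of a mean as the aggregate are assumptions. The example is cross-sectional, so there is no time dimension.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical structure definition: each component is taken from a variable.
DIMENSIONS = ("sex", "age_group")
MEASURE = "income"
ATTRIBUTES = {"UNIT_MEASURE": "EUR"}

Record = Dict[str, Any]


def tabulate(records: List[Record]) -> Dict[str, Any]:
    # Group the measure values by key (the tuple of dimension values).
    cells: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for r in records:
        key = tuple(r[d] for d in DIMENSIONS)
        cells[key].append(r[MEASURE])
    # One observation per key: here, the mean of the measure in that cell.
    return {
        "attributes": ATTRIBUTES,
        "observations": {key: sum(v) / len(v) for key, v in cells.items()},
    }


unit_records = [
    {"sex": "F", "age_group": "Y_LT65", "income": 30000.0},
    {"sex": "F", "age_group": "Y_LT65", "income": 50000.0},
    {"sex": "M", "age_group": "Y_GE65", "income": 20000.0},
]
dataset = tabulate(unit_records)
```

Because the key is built only from the declared dimensions, any other microdata variable is dropped during aggregation, and the link back to the source variables is kept through the dimension and measure names.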

  20. Output Tables

  21. Concepts

  22. Metadata Set Unit Record Data DDI Instance ASCII Data File SDMX Data Set SDMX Structural Metadata SDMX Metadata Report

  23. The challenge • Is not about which flavor of XML we use (XML doesn’t really matter) • It’s about data and metadata! • If I want to use DDI to describe my data, and you want to use SDMX, how can we ensure that we are getting the same data and metadata?

  24. The challenge (2) • If I am using SDMX, but I am sent DDI, a simple transformation must give me the same payload of data and metadata • Vice versa for DDI users who are sent SDMX • Conventions will need to be established regarding identifiers and the way the unit record files are structured • There will need to be agreed models for each business case

  25. Combined DDI-SDMX approaches • Mixing the two standards within an implementation, allowing for the expression of the same metadata in both standards, so that the information could be transformed from one format to the other. • This way, it would become possible to select either DDI or SDMX for a particular operation, similar to what we discussed above regarding application functionality. • Metadata stored and indexed in such a fashion that it can be expressed either as SDMX or DDI on an as-needed basis. • Metadata Repository and Registry project at ABS. • The actual format used for metadata storage may be neither SDMX nor DDI, so long as it can be expressed using both standards. • GSIM to be implemented through a combination of SDMX and DDI?
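
The idea of a neutral store that can be expressed in either standard on an as-needed basis can be sketched as follows. The dictionary shapes below are loosely inspired by SDMX and DDI terminology but are deliberately simplified stand-ins, not the real schemas of either standard. The function names and the `CL_` identifier convention are assumptions made for this example.

```python
from typing import Any, Dict

# Neutral storage format: neither SDMX nor DDI.
neutral: Dict[str, Any] = {
    "id": "SEX",
    "concept": "Sex of respondent",
    "codes": [{"id": "M", "label": "Male"}, {"id": "F", "label": "Female"}],
}


def to_sdmx_like(n: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed identifier convention: the code list id is "CL_" + the variable id.
    return {
        "Concept": {"id": n["id"], "Name": n["concept"]},
        "Codelist": {
            "id": "CL_" + n["id"],
            "Code": [{"id": c["id"], "Name": c["label"]} for c in n["codes"]],
        },
    }


def from_sdmx_like(s: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "id": s["Concept"]["id"],
        "concept": s["Concept"]["Name"],
        "codes": [{"id": c["id"], "label": c["Name"]} for c in s["Codelist"]["Code"]],
    }


def to_ddi_like(n: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "Variable": {
            "name": n["id"],
            "Concept": n["concept"],
            "Categories": [{"Value": c["id"], "Label": c["label"]} for c in n["codes"]],
        }
    }


def from_ddi_like(d: Dict[str, Any]) -> Dict[str, Any]:
    v = d["Variable"]
    return {
        "id": v["name"],
        "concept": v["Concept"],
        "codes": [{"id": c["Value"], "label": c["Label"]} for c in v["Categories"]],
    }


# A DDI-like document received from a partner, converted for an SDMX-based system.
received = to_ddi_like(neutral)
converted = to_sdmx_like(from_ddi_like(received))
```

The check that matters is the round trip: if converting to either representation and back returns the neutral record unchanged, then both parties receive the same payload whichever format they choose, which is the requirement stated in "The challenge (2)".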

  26. Generic Statistical Information Model (GSIM) SDMX DDI ISO 11179 Etc.

  27. Feedback is welcome Thank you! Marco Pellegrino Denis Grofils
