
EDDI: Introduction to SDMX


Presentation Transcript


  1. EDDI: Introduction to SDMX. Arofan Gregory, Open Data Foundation

  2. What is SDMX?
     The problem space:
     • Statistical collection, processing, and exchange are time-consuming and resource-intensive
     • Various international and national organisations have individual approaches for their constituencies
     • Uncertainties about how to proceed with new technologies (XML, web services …)

  3. [Diagram of the statistical reporting chain across 180+ countries. Levels shown: Individual Households; Banks, Corporates (transactions, accounts); National Statistical Organisations (accounts, statistics); Regional Organisations and International Organisations (accounts, statistics). Access layer: Internet, Search, Navigation, with sites www.x.org, www.y.org, www.z.org, www.hub.org.]

  4. What is SDMX?
     The Statistical Data and Metadata Exchange (SDMX) initiative is taking steps to address the challenges and opportunities just mentioned:
     • By focusing on business practices in the field of statistical information
     • By identifying more efficient processes for exchange and sharing of data and metadata using modern technology

  5. Historical Note
     • SDMX uses an approach based on the 10-year-long success of an earlier standard, GESMES/TS
     • GESMES/TS began as an initiative and is used today in many countries for collecting, exchanging, and updating statistical databases
     • GESMES/TS is now SDMX-EDI
     • Its focus is on time series, and it is mostly used by central banks

  6. Who is SDMX?
     SDMX is an initiative made up of seven international organizations:
     • Bank for International Settlements
     • European Central Bank
     • Eurostat
     • International Monetary Fund
     • Organisation for Economic Cooperation and Development
     • United Nations
     • World Bank
     The initiative was launched in 2002

  7. SDMX Products
     Technical standards for the formatting and exchange of aggregate statistics:
     • SDMX Technical Specifications version 1.0 (now ISO/TS 17369 SDMX)
     • SDMX Technical Specifications version 2.0 (submitted to ISO)
     • SDMX Technical Specifications version 2.1 under review (will be forwarded to ISO)
     Content-Oriented Guidelines:
     • Common Metadata Vocabulary
     • Cross-Domain Statistical Concepts
     • Statistical Subject-Matter Domains

  8. Detailed SDMX Goals
     • Reducing the national reporting burden to international institutions
     • Fostering consistency, accuracy, and timeliness between data and metadata disseminated by national and international institutions, relying on what is decentrally released via national websites
     • Enhancing national statistical processing efficiency, especially through internationally recognised standard formats for exchanges between statistical silos within institutions and with other national statistical agencies
     • Providing standards for web-based dissemination formats that are computer-readable and facilitate updating of databases
     • Enhancing comparison of data and metadata analysis through standard formats and content-oriented guidelines

  9. Official Recommendations
     SDMX has been officially recommended:
     • February 2007: SDMX endorsed by the European Union’s Statistical Programme Committee
     • March 2008: UN Statistical Commission declares SDMX to be the preferred standard for data and metadata

  10. Exchange Patterns
     • Bilateral: Institutions exchange data according to bilateral agreements regarding format, timing, protocols, etc.
     • Gateway: Institutions share the data they collect with their peers, in agreed formats among counterparty communities
     • Data-sharing: Standard exchange of data using standard formats and protocols

  11. Bilateral Exchange

  12. Gateway Exchange

  13. Data-Sharing Exchange

  14. Notes About Data-Sharing
     • Data-sharing only works if there are standard formats
     • Data-sharing works only if the data themselves are decentralized
     • One big database doesn’t work!
     • Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges
     • Data consumers discover the data they need, and its location, and then go and get it
     • Data producers don’t have to send data
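The pull pattern described on this slide can be sketched in a few lines. This is only an illustration: the `Registry` class, its method names, the dataflow identifier `EXT_DEBT`, and the `example.org` URL are all hypothetical and are not part of any SDMX specification. The point is that producers register where their data lives, and consumers discover that location and fetch the data themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    """Hypothetical in-memory registry: it stores locations, never the data."""

    _locations: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, dataflow: str, url: str) -> None:
        # A data producer announces where a data set for this dataflow lives.
        self._locations.setdefault(dataflow, []).append(url)

    def discover(self, dataflow: str) -> List[str]:
        # A data consumer asks where to pull the data from.
        return list(self._locations.get(dataflow, []))


registry = Registry()
registry.register("EXT_DEBT", "https://example.org/data/ext_debt.xml")
sources = registry.discover("EXT_DEBT")  # the consumer then fetches each URL
```

Because the registry holds only pointers, the data stay decentralized with their producers, which is the property the slide insists on.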

  15. SDMX View
     • SDMX products support all types of exchange
     • One major requirement is to work well with existing systems, to protect technology investments
     • SDMX promotes an incremental movement toward the data-sharing model

  16. Exchange with Peer Organizations
     • SDMX-EDI and SDMX-ML are both able to exchange databases between peer organizations
     • Structural metadata is also exchanged and can be read by counterparty systems
     • Incremental updating is possible
     • Increases the degree of automation for exchange and reduces reliance on bilateral, verbal agreement
     • Can use “pull” instead of “push” if a registry is deployed

  17. Integration within an Organization
     • SDMX standard formats are also useful within an organization
     • Many organizations have several disparate databases
     • Differences in database structure and content can make it difficult to use another system’s data
     • SDMX-ML provides a way to loosely couple such databases, while facilitating exchange
     • An SDMX registry can allow visibility into other databases, while not affecting control or ownership of data

  18. Data Collection and Warehousing
     • When data is collected from many different sources, it can be in a wide variety of formats
     • These formats are typically metadata-poor
     • SDMX allows for a single, metadata-rich reporting format for each type of data
     • Existing counterparty systems can be “wrappered” to support SDMX for exchange only

  19. Adoption of SDMX
     • SDMX has been aggressively adopted, as compared to other international technology standards
     • Many important data sets are available in SDMX-ML today
     • There are many prototypes and planned projects at the national and international level
     • Increasing numbers of tools are available which support SDMX

  20. Adopters/Interest
     The following are known adopters (or planning to adopt):
     • US Federal Reserve Board and Bank of New York
     • European Central Bank
     • Joint External Debt Hub (WB, IMF, OECD, BIS)
     • UN/TRADECOM at UN Statistical Division
     • NAAWE (National Accounts from OECD/Eurostat)
     • European Statistical System (Eurostat and National Statistical Institutes)
     • Mexican Federal System
     • Vietnamese Ministry of Planning and Investment
     • Qatar Information Exchange
     • IMF (BOP, SNA, SDDS/GDDS)
     • Food and Agriculture Organization
     • Millennium Development Goals (UN System, others)
     • International Labor Organization
     • Bank for International Settlements
     • OECD
     • World Bank World Development Indicators (WDI)
     • Marchioness Islands (Spanish/Portuguese Statistical Region)
     • UNESCO (Education)
     • Australian Bureau of Statistics
     • WHO (SDMX-HD)
     • Statistics Canada
     There are many others!

  21. SDMX and Domains
     • SDMX is organized as a central standard, created and supported by the SDMX Initiative
     • Each statistical domain creates its own domain standard
     • Example: WHO has created SDMX-HD (“Health Domain”) for monitoring disease outbreaks/epidemiology
     • Example: UNESCO and Eurostat have developed standard SDMX applications for Education Statistics
     • You should look at the work in the different domains when applying SDMX to different national-level statistics collection

  22. US Federal Reserve Board
     • Several important data sets are available, and searchable at a granular level, using SDMX
     • SDMX-ML is both a web-delivery format and an internal exchange format for production of data
     • http://www.federalreserve.gov/datadownload/default.htm

  23. Federal Reserve Bank of New York
     • Historical data, once stored in huge CSV files, is now available as SDMX-ML
     • This increased the use of the site
     • The “typical user” is now a machine
     • http://www.newyorkfed.org/xml/index.html

  24. European Central Bank
     • ECB uses SDMX-EDI to exchange data with European Central Banks
     • SDMX-ML is used for web dissemination
     • Simultaneous release on many CB sites
     • Each site can use its own language and look & feel
     • Data warehouse now available in SDMX-ML
     • Built and maintained using SDMX standards
     • http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
     • http://stats.ecb.europa.eu/stats/sdmx/visualisation/icp/dashboard/rc1/
     • ECB’s Statistical Data Warehouse/web service

  25. OECD
     • Data structures are specified using SDMX standards
     • Data sets are held in SDMX-ML format and navigated “on the fly”
     • OECD.Stat: http://stats.oecd.org/WBOS/index.aspx
     • Experimenting with graphical presentation of data
     • Serves all OECD data as SDMX through the OECD.Stat web service

  26. Eurostat
     • Builds on long experience of using GESMES for data transmission (GESMES is the main format for transmission of data in several important domains, e.g. national accounts, balance of payments, short-term statistics)
     • More than 50 Data Structure Definitions for GESMES developed and maintained (in partnership with ECB)
     • Software components developed and made available as open-source software (see Tools page of SDMX website)
     • Now creating a portal for all European Census data, collected as SDMX

  27. SDMX Specifications and Products

  28. SDMX Information Model: High-Level Schematic
     [Diagram. Boxes: Category Scheme, Category, Data or Metadata Flow, Data or Metadata Structure Definition, Data or Metadata Set, Data Provider, Provision Agreement, Registered Data or Metadata Set. Relationship labels:]
     • Category Scheme: comprises subject or reporting categories
     • Category: can have child categories
     • Data or Metadata Flow: can be linked to categories in multiple category schemes; uses specific data/metadata structure; can get data/metadata from multiple data/metadata providers
     • Data or Metadata Set: conforms to business rules of the data/metadata flow
     • Data Provider: publishes/reports data/metadata sets; can provide data/metadata for many data/metadata flows using agreed data/metadata structure
     • Registered Data or Metadata Set: is registered for Provision Agreement; registers existence of data and metadata

  29. SDMX Technical Specs v. 1.0
     • Information Model (data structure definitions and data formats)
     • SDMX-ML: XML formats for data structure definitions and data
     • SDMX-EDI: EDI formats for data structure definitions and data
     • Web-Services Guidelines
     • User Guide

  30. Technical Notes on Version 1.0
     • Only numeric observations were supported
     • Only coded key values were supported
     • Intended to provide an XML version of the existing GESMES/TS data model
     • GESMES/TS became SDMX-EDI
     • XML extended the data model to provide for more types of groups and cross-sectional data
     • Hierarchical codelists not supported

  31. SDMX Technical Spec v. 2.0
     Expanded data model includes:
     • Registry interfaces
     • Metadata structures and formats
     • Data and metadata provisioning
     • Other advanced features (process flow, reporting taxonomy, structure mapping, etc.)
     Data formats now include uncoded dimensions, hierarchical codelists, and non-numeric observations

  32. Technical Notes on Version 2.0
     • A very large expansion of scope
     • Model covers the process of statistical exchange, not just the data formats
     • Many cases which version 1.0 could not support were included in version 2.0 as a result of implementations
     • Full support for the “data sharing” pattern of exchange, resulting from the inclusion of the registry

  33. Changes for Version 2.1
     • Expanded Web Services Guidelines
        • Standard WSDL functions
        • Standard RESTful syntax (URL-based API)
        • Standard error codes
        • Will allow for interoperable web services for SDMX, so generic clients can use multiple sources
     • Simplified Data Formats
        • All data formats will be more consistent
        • Cross-sectional and time-series formats are more similar
     • SDMX Query has been improved
     • Note: SDMX 2.1 is available for public review now!
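To give a feel for what a URL-based API means for a generic client, here is a rough sketch of a data query URL builder. It assumes a path layout of `data/{flow}/{key}` with dot-separated dimension values and optional `startPeriod`/`endPeriod` query parameters; that layout is an assumption to be checked against the SDMX 2.1 web services guidelines, and the base URL and the dataflow identifier `EXT_DEBT` are purely hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_data_url(
    base_url: str,
    flow_ref: str,
    key_values: List[str],
    start_period: Optional[str] = None,
    end_period: Optional[str] = None,
) -> str:
    """Build a data query URL (assumed layout: {base}/data/{flow}/{key})."""
    # Assumption: one value per dimension, joined with dots, in key order.
    key = ".".join(key_values)
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}"

    # Optional time filters, added only when supplied.
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


url = build_data_url(
    "https://example.org/sdmx", "EXT_DEBT", ["Q", "ZA", "B", "1"],
    start_period="1999-Q1",
)
```

If every provider exposes the same URL pattern, the same function works against any of them by changing only `base_url`, which is what lets generic clients use multiple sources.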

  34. SDMX Content-Oriented Guidelines
     Four documents:
     • Overview
     • Metadata Common Vocabulary
     • Cross-Domain Concepts
     • Statistical Subject-Matter Domains
     These will not become ISO specifications, but will evolve as publications of the SDMX Initiative

  35. Metadata Common Vocabulary
     • A set of terms and definitions for the different parts of the SDMX technical standards, and many common concepts used in data and metadata structures
     • Does not replace other major vocabularies in this space (such as the OECD glossary) but references these other works

  36. Cross-Domain Concepts
     • Includes concepts which are common across many statistical domains
        • Names & definitions
        • Representations
     • These are concepts which support both data and metadata structures

  37. Statistical Subject-Matter Domains
     • Based on the UN/ECE classification of statistical activities
     • Provides a classification system for use in exchanging statistics across domain boundaries
     • Provides a breakdown of the various domains within official statistics

  38. SDMX and Data Formats

  39. Data Set

  40. We have a dataset, what do we need to know?
     Version 1.0:
     • What it is and how it is structured
     Version 2.0:
     • Who reports/disseminates it
     • How a specific data set fits into the overall collection framework and which organisation is responsible for reporting which parts
     • The reporting/publication schedule
     • That it has been reported/published

  41. Data Set: Structure

  42. First: Identify the Concepts A concept is a unit of knowledge created by a unique combination of characteristics (SDMX Information Model)

  43. Data Set Structure: Concepts
     Concepts shown: Stock/Flow, Country, Unit Multiplier, Unit, Time/Frequency, Topic
     • Computers need structure of data
        • Concepts
        • Code lists
        • Data values
        • How these fit together

  44. Data Set Structure: Code Lists
     Concepts: Topic, Country, Flow
     Code lists:
     • TOPIC: A = Brady Bonds, B = Bank Loans, C = Debt Securities
     • COUNTRY: AR = Argentina, MX = Mexico, ZA = South Africa
     • STOCK/FLOW: 1 = Stock, 2 = Flow
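The code lists on this slide map naturally onto lookup tables. The sketch below uses the codes and labels from the slide; the dictionary layout, the `STOCK_FLOW` key spelling, and the `decode` helper are illustrative assumptions, not an SDMX format.

```python
# Code lists from the slide; the nested-dict layout is an assumption.
CODE_LISTS = {
    "TOPIC": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
    "COUNTRY": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
    "STOCK_FLOW": {"1": "Stock", "2": "Flow"},
}


def decode(concept: str, code: str) -> str:
    """Return the label for a code, rejecting codes outside the code list."""
    try:
        return CODE_LISTS[concept][code]
    except KeyError:
        raise ValueError(f"{code!r} is not a valid code for {concept!r}") from None


label = decode("COUNTRY", "ZA")
```

Rejecting unknown codes is the practical benefit of a code list: a value such as `"XX"` for `COUNTRY` fails immediately instead of travelling further down the exchange.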

  45. Data Makes Sense
     16457
     Q,ZA,B,1,1999-06-30=16547

  46. Data Set Structure: Defining Multi-dimensional Structures
     • Comprises
        • Concepts that identify the observation value (Dimensions)
        • Concepts that add additional metadata about the observation value (Attributes)
        • Concept that is the observation value (Measure)
     • Any of these may be (Representation)
        • coded
        • text
        • date/time
        • number
        • etc.

  47. Data Set Structure: Concept Usage
     • Stock/Flow (Dimension)
     • Country (Dimension)
     • Unit Multiplier (Attribute)
     • Unit (Attribute)
     • Time (Dimension)
     • Frequency (Dimension)
     • Topic (Dimension)
     • Observation (Measure)
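The concept usage on this slide can be written down as a small data structure. This is a minimal sketch, assuming the eight roles pair with the concepts in slide order (with Time/Frequency read as two concepts); the class and function names are hypothetical and do not come from the SDMX Information Model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    DIMENSION = "dimension"   # identifies the observation
    ATTRIBUTE = "attribute"   # adds metadata about the observation
    MEASURE = "measure"       # is the observation value itself


@dataclass(frozen=True)
class Component:
    concept: str
    role: Role


# Concept usage as read from the slide (pairing order is an assumption).
STRUCTURE = [
    Component("Stock/Flow", Role.DIMENSION),
    Component("Country", Role.DIMENSION),
    Component("Unit Multiplier", Role.ATTRIBUTE),
    Component("Unit", Role.ATTRIBUTE),
    Component("Time", Role.DIMENSION),
    Component("Frequency", Role.DIMENSION),
    Component("Topic", Role.DIMENSION),
    Component("Observation", Role.MEASURE),
]


def concepts_with_role(structure: List[Component], role: Role) -> List[str]:
    """List the concepts that play the given role, in declaration order."""
    return [c.concept for c in structure if c.role is role]


key_concepts = concepts_with_role(STRUCTURE, Role.DIMENSION)
```

Only the dimensions form the key of an observation; attributes and the measure hang off that key.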

  48. Data Structure Definition
     [Diagram. Relationship labels:]
     • Key (Dimensions): concepts that identify the observation
     • Group Key: concepts that identify groups of keys
     • Measures: concepts that are observed phenomenon
     • Attributes: concepts that add metadata
     • Dimensions, Measures, and Attributes each take semantic from a Concept (CONCEPTS: Topic, Country, Flow) and have a format given by a Representation
     • Representation: Coded (has code list) or Non-coded (has format)
     • Example Code List, TOPIC: A = Brady Bonds, B = Bank Loans, C = Debt Securities

  49. Data Makes Sense
     16457
     Frequency,Country,Topic,Stock/Flow,Time=Observation
     Q,ZA,B,1,1999-06-30=16547
     Quarterly, South Africa, Bank Loans, Stocks, 2nd quarter 1999
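The decoding shown on this slide can be reproduced mechanically once the dimension order and code lists are known. The sketch below parses the slide's example line; the comma-and-equals layout is simply the notation used on the slide (not an official SDMX format), and the function name and dictionary layout are illustrative assumptions.

```python
from datetime import date

# Dimension order as given on the slide.
DIMENSIONS = ["Frequency", "Country", "Topic", "Stock/Flow", "Time"]

# Code lists from the slides; "Q" = Quarterly is taken from the decoded line.
CODE_LISTS = {
    "Frequency": {"Q": "Quarterly"},
    "Country": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
    "Topic": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
    "Stock/Flow": {"1": "Stock", "2": "Flow"},
}


def parse_observation(line: str) -> dict:
    """Split 'key=value' and decode each key position via its code list."""
    key_part, _, value_part = line.partition("=")
    codes = key_part.split(",")
    if len(codes) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} key values, got {len(codes)}")

    record = {}
    for name, code in zip(DIMENSIONS, codes):
        if name == "Time":
            record[name] = date.fromisoformat(code)  # uncoded: parsed as a date
        else:
            record[name] = CODE_LISTS[name][code]    # coded: looked up
    record["Observation"] = int(value_part)
    return record


record = parse_observation("Q,ZA,B,1,1999-06-30=16547")
```

The bare number carries no meaning by itself; the key, read against the structure definition, is what turns it into quarterly stock of bank loans for South Africa at 30 June 1999.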

  50. Identifying Concepts: Sources
     • Existing data set tables
        • From website
        • From applications
     • Data collection instruments
        • Questionnaires
        • Excel spreadsheets
     • Regulations, handbooks, user guides
        • Labour Statistics Convention, 1985 (No. 160), Recommendation, 1985 (No. 170)
        • Council Regulation No: 311/76/EEC of 09/02/1976; OJ: L039 of 14/02/1976; Compilation of statistics on foreign workers
     • Database tables
     • Existing Data Structure Definitions
        • From other organisations
