EDDI: Introduction to SDMX. Arofan Gregory Open Data Foundation. What is SDMX?. The problem space: Statistical collection, processing, and exchange is time-consuming and resource-intensive Various international and national organisations have individual approaches for their constituencies
EDDI: Introduction to SDMX Arofan Gregory Open Data Foundation
What is SDMX? The problem space: Statistical collection, processing, and exchange is time-consuming and resource-intensive Various international and national organisations have individual approaches for their constituencies Uncertainties about how to proceed with new technologies (XML, web services …)
[Figure: reporting chain of Individual Households and Banks/Corporates (transactions, accounts), National Statistical Organisations (accounts, statistics), Regional Organisations and International Organisations (accounts, statistics), covering 180+ countries; web sites www.x.org, www.y.org, www.z.org and www.hub.org reached via Internet, Search, Navigation]
What is SDMX? The Statistical Data and Metadata Exchange (SDMX) initiative is taking steps to address these challenges and opportunities that have just been mentioned: By focusing on business practices in the field of statistical information By identifying more efficient processes for exchange and sharing of data and metadata using modern technology
Historical Note SDMX uses an approach based on the 10-year-long success of an earlier standard – GESMES/TS GESMES/TS was an initiative that is used today in many countries for collecting, exchanging, and updating statistical databases GESMES/TS is now SDMX-EDI Focus is on time-series, and is mostly used by central banks
Who is SDMX? SDMX is an initiative made up of seven international organizations: Bank for International Settlements European Central Bank Eurostat International Monetary Fund Organisation for Economic Cooperation and Development United Nations World Bank The initiative was launched in 2002
SDMX Products Technical standards for the formatting and exchange of aggregate statistics: SDMX Technical Specifications version 1.0 (now ISO/TS 17369 SDMX) SDMX Technical Specifications version 2.0 (submitted to ISO) SDMX Technical Specifications version 2.1 under review (will be forwarded to ISO) Content-Oriented Guidelines Common Metadata Vocabulary Cross-Domain Statistical Concepts Statistical Subject-Matter Domains
Detailed SDMX Goals • Reducing the national reporting burden to international institutions • Fostering consistency, accuracy, and timeliness between data and metadata disseminated by national and international institutions, relying on what is decentrally released via national websites • Enhancing national statistical processing efficiency, especially through internationally-recognised standard formats for exchanges between statistical silos within institutions and with other national statistical agencies • Providing standards for web-based dissemination formats that are computer readable and facilitate updating of databases • Enhancing comparison and analysis of data and metadata through standard formats and content-oriented guidelines
Official Recommendations SDMX has been officially recommended: February 2007: SDMX endorsed by the European Union’s Statistical Programme Committee March 2008: UN Statistical Commission declares SDMX to be the preferred standard for data and metadata
Exchange Patterns Bilateral: Institutions exchange data according to bilateral agreements regarding format, timing, protocols, etc. Gateway: Institutions share the data they collect with their peers, in agreed formats among counterparty communities Data-sharing: standard exchange of data using standard formats and protocols
Notes About Data-Sharing Data-sharing only works if there are standard formats Data-sharing works only if the data themselves are decentralized One big database doesn’t work! Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges Data consumers discover the data they need, and its location, and then go and get it Data producers don’t have to send data
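The pull model described above can be illustrated with a minimal Python sketch. The `Registry` class, the topic name, and the example URLs are all hypothetical and are not part of any SDMX specification; the sketch only shows that producers announce where data lives while consumers discover the location and fetch the data themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    """Hypothetical in-memory registry: maps a topic to data set locations."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, topic: str, url: str) -> None:
        # A producer announces where its data lives; no data is sent.
        self.entries.setdefault(topic, []).append(url)

    def discover(self, topic: str) -> List[str]:
        # A consumer looks up the locations and then pulls from each source.
        return list(self.entries.get(topic, []))


registry = Registry()
registry.register("external-debt", "https://stats.example.org/debt.xml")
registry.register("external-debt", "https://data.example.net/debt.xml")

# The consumer now knows where to go; fetching is a separate, consumer-driven step.
locations = registry.discover("external-debt")
```

The data stay decentralized in this sketch: the registry stores only locations, never the data sets.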
SDMX View SDMX products support all types of exchange One major requirement is to work well with existing systems, to protect technology investments SDMX promotes an incremental movement toward the data-sharing model
Exchange with Peer Organizations SDMX-EDI and SDMX-ML are both able to exchange databases between peer organizations Structural metadata is also exchanged and can be read by counterparty systems Incremental updating is possible Increases degree of automation for exchange – lowers degree of bilateral, verbal agreement Can use “pull” instead of “push” if registry is deployed
Integration within an Organization SDMX standard formats are also useful within an organization Many organizations have several disparate databases Differences in database structure and content can make it difficult to use other systems’ data SDMX-ML provides a way to loosely couple such databases, while facilitating exchange An SDMX registry can allow visibility into other databases, while not affecting control or ownership of data
Data Collection and Warehousing When data is collected from many different sources, it can be in a wide variety of formats Typically metadata-poor SDMX allows for a single, metadata-rich reporting format for each type of data Existing counterparty systems can be “wrappered” to support SDMX for exchange only
Adoption of SDMX SDMX has been aggressively adopted, as compared to other international technology standards Many important data sets are available in SDMX-ML today There are many prototypes and planned projects at the national and international level Increasing numbers of tools are available which support SDMX
Adopters/Interest The following are known adopters (or planning to adopt): US Federal Reserve Board and Federal Reserve Bank of New York European Central Bank Joint External Debt Hub (WB, IMF, OECD, BIS) UN/TRADECOM at UN Statistical Division NAAWE (National Accounts from OECD/Eurostat) European Statistical System (Eurostat and National Statistical Institutes) Mexican Federal System Vietnamese Ministry of Planning and Investment Qatar Information Exchange IMF (BOP, SNA, SDDS/GDDS) Food and Agriculture Organization Millennium Development Goals (UN System, others) International Labour Organization Bank for International Settlements OECD World Bank World Development Indicators (WDI) Macaronesia Islands (Spanish/Portuguese Statistical Region) UNESCO (Education) Australian Bureau of Statistics WHO (SDMX-HD) Statistics Canada There are many others!
SDMX and Domains • SDMX is organized as a central standard, created and supported by the SDMX Initiative • Each statistical domain creates its own domain standard • Example: WHO has created SDMX-HD (“Health Domain”) for monitoring disease outbreaks/epidemiology • Example: UNESCO and Eurostat have developed standard SDMX applications for Education Statistics • You should look at the work in the different domains when applying SDMX to different national-level statistics collection
US Federal Reserve Board Several important data sets are available – and searchable at a granular level – using SDMX SDMX-ML is both a web-delivery format and an internal exchange format for production of data http://www.federalreserve.gov/datadownload/default.htm
Federal Reserve Bank of New York Historical data – once stored in huge CSV files – is now available as SDMX-ML Increased the use of the site The “typical user” is now a machine http://www.newyorkfed.org/xml/index.html
European Central Bank ECB uses SDMX-EDI to exchange data with European Central Banks SDMX-ML is used for web dissemination Simultaneous release on many CB sites Each site can use its own language and look & feel Data warehouse now available in SDMX-ML Built and maintained using SDMX standards http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html http://stats.ecb.europa.eu/stats/sdmx/visualisation/icp/dashboard/rc1/ ECB’s Statistical Data Warehouse/web service
OECD Data structures are specified using SDMX standards Data sets are held in SDMX-ML format and navigated “on the fly” OECD.Stat http://stats.oecd.org/WBOS/index.aspx Experimenting with graphical presentation of data Serves all OECD data as SDMX through the OECD.Stat web service
Eurostat Builds on long experience of using GESMES for data transmission (GESMES is main format for transmission of data in several important domains e.g. national accounts, balance of payments, short-term statistics) More than 50 Data Structure Definitions for GESMES developed and maintained (in partnership with ECB) Software components developed and made available as open-source software (see Tools page of SDMX website) Now creating a portal for all European Census data, collected as SDMX
SDMX Information Model: High-level Schematic [Figure: a Category Scheme comprises subject or reporting Categories; a Category can have child categories. A Data or Metadata Flow can be linked to categories in multiple category schemes, uses a specific Data or Metadata Structure Definition, and can get data/metadata from multiple data/metadata providers. A Data or Metadata Set conforms to the business rules of the data/metadata flow. A Data Provider publishes/reports data/metadata sets, can provide data/metadata for many data/metadata flows using the agreed data/metadata structure, and registers the existence of data and metadata. A Registered Data or Metadata Set is registered for a Provision Agreement.]
SDMX Technical Specs v 1.0 Information Model (data structure definitions and data formats) SDMX-ML: XML formats for data structure definitions and data SDMX-EDI: EDI formats for data structure definitions and data Web-Services Guidelines User Guide
Technical Notes on Version 1.0 Only numeric observations were supported Only coded key values were supported Intended to provide an XML version of the existing GESMES/TS data model GESMES/TS became SDMX-EDI XML extended the data model to provide for more types of groups and cross-sectional data Hierarchical codelists not supported
SDMX Technical Spec v. 2.0 Expanded data model includes Registry interfaces Metadata structures and formats Data and metadata provisioning Other advanced features (process flow, reporting taxonomy, structure mapping, etc.) Data formats now include uncoded dimensions, hierarchical codelists, and non-numeric observations
Technical Notes on Version 2.0 A very large expansion of scope Model covers the process of statistical exchange, not just the data formats Many cases which version 1.0 could not support were included in version 2.0 as a result of implementations Full support for the “data sharing” pattern of exchange Resulting from the inclusion of the registry
Changes for Version 2.1 • Expanded Web Services Guidelines • Standard WSDL Functions • Standard RESTful syntax (URL-based API) • Standard Error Codes • Will allow for interoperable web services for SDMX – so generic clients can use multiple sources • Simplified Data Formats • All data formats will be more consistent • Cross-sectional and time-series formats are more similar • SDMX Query has been improved • Note: SDMX 2.1 is available for public review now!
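As an illustration of what a URL-based query can look like, here is a small Python sketch that assembles a data request. The `/data/{flow}/{key}` path layout and the `startPeriod`/`endPeriod` parameters follow the pattern used by the SDMX 2.1 RESTful guidelines; the endpoint, the dataflow identifier `EXT_DEBT`, and the helper name `build_data_url` are assumptions made for this example only.

```python
from urllib.parse import quote, urlencode


def build_data_url(base, flow, key_values, start=None, end=None):
    """Build a URL-based data query.

    key_values holds one entry per dimension, in the order the data
    structure defines them; None means "any value" for that dimension.
    """
    # Dimension values are joined with dots; an empty slot acts as a wildcard.
    key = ".".join("" if v is None else quote(v, safe="+") for v in key_values)
    params = {k: v for k, v in (("startPeriod", start), ("endPeriod", end)) if v}
    url = f"{base.rstrip('/')}/data/{quote(flow)}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


url = build_data_url(
    "https://stats.example.org/rest",  # hypothetical endpoint
    "EXT_DEBT",                        # hypothetical dataflow identifier
    ["Q", "ZA", "B", "1"],             # Frequency, Country, Topic, Stock/Flow
    start="1999-Q1",
    end="1999-Q4",
)
```

Because every conforming service would accept the same URL shape, a generic client could switch sources by changing only the base address.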
SDMX Content-Oriented Guidelines Four documents: Overview Metadata Common Vocabulary Cross-Domain Concepts Statistical Subject-Matter Domains These will not become ISO specifications, but will evolve as publications of the SDMX Initiative
Metadata Common Vocabulary A set of terms and definitions for the different parts of the SDMX technical standards, and many common concepts used in data and metadata structures Does not replace other major vocabularies in this space (such as the OECD glossary) but references these other works
Cross-Domain Concepts Includes concepts which are common across many statistical domains Names & Definitions Representations These are concepts which support both data and metadata structures
Statistical Subject-Matter Domains Based on the UN/ECE classification of statistical activities Provides a classification system for use in exchanging statistics across domain boundaries Provides a breakdown of the various domains within official statistics
We have a dataset, what do we need to know? Version 1.0 What it is and how it is structured Version 2.0 Who reports/disseminates it How a specific data set fits into the overall collection framework and which organisation is responsible for reporting which parts The reporting/publication schedule That it has been reported/published
First: Identify the Concepts A concept is a unit of knowledge created by a unique combination of characteristics (SDMX Information Model)
Data Set Structure: Concepts • Computers need structure of data • Concepts • Code lists • Data values • How these fit together Example concepts: Stock/Flow, Country, Unit Multiplier, Unit, Time/Frequency, Topic
Data Set Structure: Code Lists Concepts: Topic, Country, Flow Code lists: TOPIC (A Brady Bonds, B Bank Loans, C Debt Securities); COUNTRY (AR Argentina, MX Mexico, ZA South Africa); STOCK/FLOW (1 Stock, 2 Flow)
Data Makes Sense 16457 Q,ZA,B,1,1999-06-30=16547
Data Set Structure: Defining Multi-dimensional Structures • Comprises • Dimensions: concepts that identify the observation value • Attributes: concepts that add additional metadata about the observation value • Measure: the concept that is the observation value • Representation: any of these may be • coded • text • date/time • number • etc.
Data Set Structure: Concept Usage • Stock/Flow (Dimension) • Country (Dimension) • Unit Multiplier (Attribute) • Unit (Attribute) • Time (Dimension) • Frequency (Dimension) • Topic (Dimension) • Observation (Measure)
Data Structure Definition [Figure: Concepts (Topic, Country, Flow) supply the semantics for the components of a Data Structure Definition. Dimensions are concepts that identify the observation and form the Key; the Group Key holds concepts that identify groups of keys; Measures are concepts that are the observed phenomenon; Attributes are concepts that add metadata. Each component has a Representation, either Coded (has a code list, e.g. TOPIC: A Brady Bonds, B Bank Loans, C Debt Securities) or Non-coded (has a format).]
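The roles in a data structure definition can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the `Component` class and `validate` function are hypothetical names, the Frequency code list contains only the one code shown in the slides, and treating Unit and Unit Multiplier as optional non-coded attributes is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Component:
    concept: str                                # concept that supplies the meaning
    role: str                                   # "dimension", "attribute" or "measure"
    code_list: Optional[Dict[str, str]] = None  # None means a non-coded representation


TOPIC = {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"}
COUNTRY = {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"}
STOCK_FLOW = {"1": "Stock", "2": "Flow"}

DSD = [
    Component("Frequency", "dimension", {"Q": "Quarterly"}),  # partial list (assumption)
    Component("Country", "dimension", COUNTRY),
    Component("Topic", "dimension", TOPIC),
    Component("Stock/Flow", "dimension", STOCK_FLOW),
    Component("Time", "dimension"),             # non-coded date
    Component("Unit", "attribute"),
    Component("Unit Multiplier", "attribute"),
    Component("Observation", "measure"),
]


def validate(dsd, record):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for comp in dsd:
        value = record.get(comp.concept)
        if value is None:
            # Dimensions and the measure are required; attributes are optional here.
            if comp.role != "attribute":
                problems.append(f"missing {comp.role}: {comp.concept}")
        elif comp.code_list is not None and value not in comp.code_list:
            problems.append(f"unknown code {value!r} for {comp.concept}")
    return problems


record = {"Frequency": "Q", "Country": "ZA", "Topic": "B",
          "Stock/Flow": "1", "Time": "1999-06-30", "Observation": "16547"}
ok = validate(DSD, record)
bad = validate(DSD, dict(record, Country="XX"))
```

Keeping the code lists separate from the components mirrors the schematic: a component takes its meaning from a concept and its allowed values from a representation.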
Data Makes Sense 16457 Frequency,Country,Topic,Stock/Flow,Time=Observation Q,ZA,B,1,1999-06-30=16547 Quarterly, South Africa, Bank Loans, Stocks, 2nd quarter 1999
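The decoding step shown on this slide can be sketched in Python. The code lists repeat the values given earlier in the deck; the Frequency list holds only the single code that appears in the example, and the function name `decode` and the comma-separated layout handling are assumptions for illustration rather than an SDMX format definition.

```python
CODE_LISTS = {
    "Frequency": {"Q": "Quarterly"},  # only the code shown in the example
    "Country": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
    "Topic": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
    "Stock/Flow": {"1": "Stock", "2": "Flow"},
}

# Order of the dimensions in the key, as given on the slide.
DIMENSIONS = ["Frequency", "Country", "Topic", "Stock/Flow", "Time"]


def decode(line):
    """Turn 'key=value' into a dictionary of readable labels plus the observation."""
    key, _, value = line.partition("=")
    codes = key.split(",")
    if len(codes) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} key values, got {len(codes)}")
    decoded = {}
    for dim, code in zip(DIMENSIONS, codes):
        labels = CODE_LISTS.get(dim)
        # Coded dimensions are looked up; non-coded ones (Time) pass through.
        decoded[dim] = labels[code] if labels else code
    decoded["Observation"] = float(value)
    return decoded


result = decode("Q,ZA,B,1,1999-06-30=16547")
```

Without the key and the code lists the number is just a number; with them it becomes a quarterly stock of bank loans for South Africa at a given date.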
Identifying Concepts - Sources Existing data set tables From website From applications Data Collection Instruments Questionnaires Excel spreadsheets Regulations, Handbooks, User Guides Labour Statistics Convention, 1985 (No. 160), Recommendation, 1985 (No. 170) Council Regulation No: 311/76/EEC of 09/02/1976; OJ: L039 of 14/02/1976; Compilation of statistics on foreign workers Database Tables Existing Data Structure Definitions From other organisations