1. SDMX Background and purpose 2. An overview of SDMX 3. ESTAT Data & Metadata Exchange. David Matic, Eurostat Unit B5: “Central data and metadata services”. Author: Edward Cook. ESTP course European Business Statistics & FRIBS with STS as focus module, 3-4 October 2017.
2001: time to assess the way exchanges were made. New developments were offering opportunities to overcome issues.
WHAT ARE THE ISSUES?
• Data being collected in multiple ways (surveys, files, web queries, metadata etc.);
• Data being transmitted in various formats (paper, Excel, flat files etc.);
• Data being transmitted via various media (email, CD-ROMs, file uploads etc.);
• Data being stored in various places (USB, hard drive, servers, cloud etc.).
WHAT ARE THE ISSUES?
• Multiple organisations collecting similar or the same data;
• Similarly worded concepts can have different content;
• Increasing burden on organisations (collection, maintenance and management);
• The intensive and manual nature of data collection;
• Errors and inconsistencies.
WHAT WERE THE NEW DEVELOPMENTS? AN IDEA GERMINATES
• Faster exchanges possible;
• More frequent / bigger exchanges:
  - increasing demand for data (ease of use of the Internet);
• Growing types of information exchange:
  - between businesses;
  - between businesses and customers;
  - between individuals.
WHAT OPPORTUNITIES?
New technological developments afforded opportunities:
• For the process to be more efficient;
• To improve trust and reliability;
• To improve web dissemination.
WHAT OPPORTUNITIES?
• Simplification (streamlining data flows, central management);
• Standardisation (software tools, data sharing);
• Harmonisation (data structures, concepts and code lists).
'A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty.' Winston Churchill
So what is SDMX? The 'Statistical Data and Metadata eXchange' is an international initiative aimed at developing and employing more efficient processes for the exchange and sharing of statistical data and metadata among international organisations and member countries. It consists of technical and statistical standards, guidelines, an IT service infrastructure and IT tools.
What is the business case for SDMX?
• SDMX is a global response: 7 international organisations as sponsors, in collaboration with countries throughout the world;
• SDMX is an ISO International Standard (ISO 17369):
  - a document, established by consensus;
  - approved by a recognised body;
  - providing rules and guidelines;
  - for common and repeated use;
  - for an optimum degree of order;
  - viewed as safe, reliable and of good quality.
• SDMX improves timeliness:
  - faster access to data;
  - move towards automation.
• SDMX improves accessibility:
  - bilateral, gateway and data-sharing;
  - push and pull modes.
• SDMX improves interpretability:
  - standardises structural metadata (the identifiers and descriptors of data);
  - standardises reference metadata (the content and quality of data).
• SDMX improves coherence:
  - uses cross-domain concepts;
  - uses shared code lists;
  - uses content-oriented guidelines;
  - reuse across domains and agencies;
  - aims for single-figure dissemination.
• SDMX can reduce data errors:
  - some automated validation;
  - agreed structures for transmission;
  - time saved on conversion and mapping;
  - less manual intervention.
• SDMX can reduce the reporting burden:
  - pre-validated content;
  - automated publication;
  - possible 'pull' by collecting agencies.
• SDMX can reduce IT development and maintenance costs:
  - open source approach;
  - no licensing costs;
  - shared toolbox;
  - improved interoperability between systems and applications.
SDMX is well suited to supporting a data-sharing process: reporting every number only once.
SDMX is about changing from a multiple, diverse and complex exchange system, to a common, harmonised and standardised exchange system.
What are the downsides?
• SDMX is not investment free: it means training; it means changes.
• SDMX is not a magic wand: it is suited to aggregated data and is more complex to apply to microdata (ongoing improvements).
• SDMX is dynamic: software versions are updated to add functionality and fix bugs.
KEY messages:
• SDMX responds to a business need;
• SDMX improves quality in data and metadata exchanges;
• SDMX is an international standard based on shared experiences;
• SDMX offers cost-efficiencies.
A typical production chain
Data collection is little different from the production of other goods.
What are the key features?
[Diagram: MANUFACTURER, PRODUCER, CONTRACT]
SPECIFICATIONS
• Type of fruit (oranges)
• Dimensions of the box
• Number of fruits per box
What are the key features?
[Diagram: MANUFACTURER, PRODUCER, CONTRACT, SPECIFICATIONS, COMPANY OFFICE]
All the details of the contract are stored in the company offices, to be checked by both parties.
In SDMX …
[Diagram: DATA PRODUCER, DATA CONSUMER, PROVISION AGREEMENT, DATAFLOW, DATA STRUCTURE DEFINITION, SDMX REGISTRY]
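The analogy between the fruit contract and the SDMX artefacts can be sketched in a few lines of Python. This is a minimal illustration, not the SDMX information model itself: the class names mirror the labels on the slide, the reading of the provision agreement as the "contract", the data structure definition as the "specifications" and the registry as the "company office" is one interpretation of the diagram, and all identifiers (`DSD_STS`, `STS_IND`, `NSI_A`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStructureDefinition:
    """Read here as the 'specifications': which concepts describe the data."""
    dsd_id: str
    dimensions: tuple


@dataclass(frozen=True)
class Dataflow:
    """A stream of data that follows one data structure definition."""
    flow_id: str
    dsd_id: str


@dataclass(frozen=True)
class ProvisionAgreement:
    """Read here as the 'contract': a producer agrees to supply a dataflow."""
    provider_id: str
    flow_id: str


@dataclass
class Registry:
    """Read here as the 'company office': one place both parties can consult."""
    structures: dict = field(default_factory=dict)
    dataflows: dict = field(default_factory=dict)
    agreements: list = field(default_factory=list)

    def register(self, artefact):
        if isinstance(artefact, DataStructureDefinition):
            self.structures[artefact.dsd_id] = artefact
        elif isinstance(artefact, Dataflow):
            self.dataflows[artefact.flow_id] = artefact
        elif isinstance(artefact, ProvisionAgreement):
            self.agreements.append(artefact)
        else:
            raise TypeError(f"unsupported artefact: {artefact!r}")

    def structure_for(self, provider_id, flow_id):
        """Return the structure a provider agreed to follow for a dataflow."""
        agreed = any(
            a.provider_id == provider_id and a.flow_id == flow_id
            for a in self.agreements
        )
        if not agreed:
            raise LookupError(f"no provision agreement for {provider_id}/{flow_id}")
        return self.structures[self.dataflows[flow_id].dsd_id]


# Hypothetical identifiers, for illustration only.
registry = Registry()
registry.register(DataStructureDefinition("DSD_STS", ("FREQ", "REF_AREA", "INDICATOR")))
registry.register(Dataflow("STS_IND", "DSD_STS"))
registry.register(ProvisionAgreement("NSI_A", "STS_IND"))

dsd = registry.structure_for("NSI_A", "STS_IND")
print(dsd.dimensions)
```

Either party can ask the registry which structure applies to a given exchange, which is the role the company office plays in the fruit example.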
What is SDMX?
• A model to describe statistical data and metadata: statisticians agree to use common descriptions and guidelines.
• A standard for automated communication from machine to machine: driven by these common descriptors, for all to reuse.
• A technology supporting standardised IT tools: developed as wide-ranging open source software.
Presentation of SDMX
• The SDMX Information Model: What is the information model underlying the data and metadata exchange between the partners?
• Content-oriented guidelines: How to increase interoperability and statistical harmonisation?
• IT architecture for data exchange: How to exchange the data?
The Information Model:
… is a representation of concepts, relationships, constraints, rules and operations.
… is a formal way to:
  - express and design information needs;
  - communicate with IT people;
  - give specifications to reporting agents;
  - document the system;
  - drive the software.
What things does SDMX need to model?
• Statistical data
  - Through descriptor concepts. These concepts can be further classified into dimensions, attributes and measures.
• Metadata
  - Structural metadata (such as concept names etc.)
  - Reference (or explanatory) metadata
• Data exchange processes
Modelling structural metadata
Data Structure Definition (DSD)
• Identification of dimensions, attributes and measures
• Use of common code lists
• Integration into concept schemes
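As a rough illustration of what a DSD captures, the sketch below classifies each concept as a dimension, attribute or measure, ties coded concepts to a shared code list, and derives a dot-separated series key from the dimension values. The concept identifiers, the code lists and the `series_key` helper are assumptions made for this example; a real DSD carries more detail (for instance a time dimension and attachment levels for attributes).

```python
from dataclasses import dataclass

# Hypothetical shared code lists (illustrative codes only).
CODE_LISTS = {
    "CL_FREQ": {"A": "Annual", "Q": "Quarterly", "M": "Monthly"},
    "CL_AREA": {"DE": "Germany", "FR": "France"},
    "CL_OBS_STATUS": {"A": "Normal", "P": "Provisional"},
}


@dataclass(frozen=True)
class Component:
    concept: str         # concept identifier, e.g. taken from a concept scheme
    role: str            # "dimension", "attribute" or "measure"
    code_list: str = ""  # empty string means the component is uncoded


# A toy DSD: two dimensions, one attribute, one measure.
DSD = (
    Component("FREQ", "dimension", "CL_FREQ"),
    Component("REF_AREA", "dimension", "CL_AREA"),
    Component("OBS_STATUS", "attribute", "CL_OBS_STATUS"),
    Component("OBS_VALUE", "measure"),
)


def series_key(dsd, record):
    """Join the dimension values in DSD order; reject codes not in the list."""
    parts = []
    for comp in dsd:
        if comp.role != "dimension":
            continue
        value = record[comp.concept]
        if value not in CODE_LISTS[comp.code_list]:
            raise ValueError(f"{value!r} is not in {comp.code_list}")
        parts.append(value)
    return ".".join(parts)


record = {"FREQ": "M", "REF_AREA": "DE", "OBS_STATUS": "A", "OBS_VALUE": 104.2}
key = series_key(DSD, record)
print(key)
```

Because every reporter uses the same dimensions in the same order with the same codes, two organisations describing the same series arrive at the same key.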
Modelling reference metadata
• Quality descriptions
• Process descriptions
• Methodological descriptions
• Administrative descriptions
So much descriptive information. It needs to be expressed in a common, standard way.
The standard way is the Metadata Structure Definition (MSD)
A Metadata Structure Definition describes how metadata sets containing reference metadata are organised. In particular, it defines:
• which metadata concepts are being exchanged;
• how these concepts relate to each other;
• how they are represented (free text or coded values);
• with which object types (agencies, data flows, data providers, subsets of data flows, or others) they are associated.
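The four things an MSD defines can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class names, the `check_report` helper, the identifier `DEMO_MSD` and the concepts and codes used below are hypothetical and are not taken from any published MSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataAttribute:
    concept: str                    # which metadata concept is exchanged
    parent: str = ""                # how it relates to another concept
    codes: frozenset = frozenset()  # allowed codes; empty means free text


@dataclass(frozen=True)
class MetadataStructureDefinition:
    msd_id: str
    targets: frozenset              # object types a report may be attached to
    attributes: tuple


def check_report(msd, target_type, report):
    """Return a list of problems found in a reference metadata report."""
    problems = []
    if target_type not in msd.targets:
        problems.append(f"target type {target_type!r} not allowed")
    known = {a.concept: a for a in msd.attributes}
    for concept, value in report.items():
        attr = known.get(concept)
        if attr is None:
            problems.append(f"unknown concept {concept!r}")
        elif attr.codes and value not in attr.codes:
            problems.append(f"{concept}: {value!r} is not an allowed code")
    return problems


# Hypothetical structure, for illustration only.
MSD = MetadataStructureDefinition(
    msd_id="DEMO_MSD",
    targets=frozenset({"dataflow", "data provider"}),
    attributes=(
        MetadataAttribute("CONTACT"),
        MetadataAttribute("CONTACT_ORGANISATION", parent="CONTACT"),
        MetadataAttribute("FREQ_DISS", codes=frozenset({"A", "Q", "M"})),
    ),
)

good = check_report(MSD, "dataflow",
                    {"CONTACT_ORGANISATION": "Example NSI", "FREQ_DISS": "M"})
bad = check_report(MSD, "code list",
                   {"FREQ_DISS": "weekly", "COLOUR": "blue"})
print(good)
print(bad)
```

The first report passes; the second is rejected on three counts: it is attached to an object type the structure does not allow, it uses a code outside the agreed list, and it contains a concept the structure does not know.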
The Euro SDMX Metadata Structure (ESMS)
• is the main report structure for reference (explanatory) metadata;
• uses concepts taken from the SDMX Glossary;
• is a metadata structure definition which is used across all statistical domains, with more than 300 files disseminated on the website;
• is SDMX compliant.
The ESS Standard Quality Report Structure (ESQRS)
• is the main report structure for reference metadata related to data quality;
• uses concepts taken from the SDMX Glossary plus more detailed sub-concepts measuring data quality;
• is a metadata structure definition which is used across all statistical domains;
• is SDMX compliant.
The relation between the ESS standards ESMS and ESQRS
• ESMS is more oriented to the USERS of statistics:
  - to understand the statistical data released;
  - there is no need for very detailed information on data quality;
  - 19 SDMX cross-domain concepts are used.
• ESQRS is more oriented to the PRODUCERS of statistics:
  - to monitor the quality of the statistics produced in detail;
  - concentrating on the main quality concepts (which are also part of the ESS Statistics Regulation No 223/2009).
However, some information on data quality is common to both ESMS and ESQRS.
Content-oriented guidelines
• The content-oriented guidelines are a set of recommendations within the scope of the SDMX standard, intended to produce maximum interoperability.
• The SDMX standards:
  - provide essential support to statisticians;
  - maximise the amount of information passed through to users;
  - allow automation of the process;
  - allow web-service queries.
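To give an idea of what a web-service query looks like, the sketch below assembles a data query URL in the style of the SDMX RESTful pattern (`/data/{flow}/{key}` with `startPeriod` and `endPeriod` parameters). The base URL, the dataflow identifier and the codes are hypothetical, the `data_query_url` helper is an assumption made for this example, and the exact syntax supported should be checked against the documentation of the service being queried. No request is sent; the function only builds the string.

```python
from urllib.parse import urlencode


def data_query_url(base_url, flow, key_parts, start_period=None, end_period=None):
    """Build a data query of the form {base}/data/{flow}/{key}?params.

    key_parts holds one list of codes per dimension, in DSD order:
    one code selects it, several codes are OR'ed with '+', and an
    empty list leaves that dimension unrestricted.
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint, dataflow and codes, for illustration only.
url = data_query_url(
    "https://example.org/sdmx/",
    "STS_IND",
    [["M"], ["DE", "FR"], []],
    start_period="2016",
    end_period="2017",
)
print(url)
```

Because the dimensions and codes are shared, the same query shape can be reused across domains and providers; only the dataflow and the key change.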
There are three main areas in the content-oriented guidelines:
• Statistical subject-matter domains.
• Cross-domain concepts (and code lists).
• A Metadata Common Vocabulary.
SDMX tools
• Eurostat tools at our SDMX Info Space: http://ec.europa.eu/eurostat/web/sdmx-info-space/sdmx-it-tools
• SDMX Data Structure Wizard (used to create, edit and test SDMX artefacts).
• SDMX Converter (converts data files between SDMX formats and other file formats).
• ESS Metadata Handler.
• SDMX Reference Infrastructure (SDMX-RI) (a set of tools for connecting your IT systems to the SDMX world).
• SDMX Mapping Assistant (mapping and transcoding of the contents of an existing database to SDMX data structures).
• SDMX official website: https://www.sdmx.org
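The idea of "mapping and transcoding" can be illustrated independently of any tool. The sketch below is not how the SDMX Mapping Assistant works internally; it only shows the two steps involved, with hypothetical local column names, concept identifiers and codes: local columns are mapped to SDMX concepts, and local values are transcoded to the agreed codes.

```python
def transcode(rows, column_map, code_map):
    """Rename local columns to SDMX concepts and translate local codes.

    column_map: local column name -> SDMX concept identifier
    code_map:   SDMX concept -> {local value: SDMX code}
    Values with no entry in code_map are passed through unchanged.
    """
    converted_rows = []
    for row in rows:
        converted = {}
        for local_col, value in row.items():
            concept = column_map[local_col]
            converted[concept] = code_map.get(concept, {}).get(value, value)
        converted_rows.append(converted)
    return converted_rows


# Hypothetical local database extract and mappings, for illustration only.
local_rows = [
    {"country": "Germany", "periodicity": "monthly", "period": "2017-08", "value": 104.2},
    {"country": "France", "periodicity": "monthly", "period": "2017-08", "value": 101.7},
]
column_map = {
    "country": "REF_AREA",
    "periodicity": "FREQ",
    "period": "TIME_PERIOD",
    "value": "OBS_VALUE",
}
code_map = {
    "REF_AREA": {"Germany": "DE", "France": "FR"},
    "FREQ": {"monthly": "M"},
}

sdmx_rows = transcode(local_rows, column_map, code_map)
print(sdmx_rows[0])
```

Keeping the mappings as data rather than code means the local database can stay as it is while the exchange uses the shared structures.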
EDAMIS: Single Entry Point
Electronic Data file Administration and Management Information System
Tools
• EWA (EDAMIS Web Application): installed at the NSI; sending statistical data to Eurostat, notifications.
• EWP (EDAMIS Web Portal): installed at the EC; sending statistical data to Eurostat, notifications, dataset inventory, user rights management, traffic monitoring.
• EWF (EDAMIS Web Forms): part of EWP; sending statistical data to Eurostat for low data volumes, spreadsheet-like data grid, inline validation.