SDMX basics
Marco Pellegrino, Eurostat, Directorate B
Purpose of this training session
At the end of this session you will:
• Know the basics of the SDMX model
• Understand the techniques to identify the structure of data
• Identify the concepts in a simple data set
• Be able to develop simple data structure definitions using SDMX tools
• Be familiar with the main IT architecture and tools used by Eurostat for SDMX implementation projects
Statistical Data and Metadata eXchange (SDMX), ISO IS 17369
Organisation logos shown: UNSD, World Bank
What is a standard?
According to ISO, a standard is
• a document,
• established by consensus and
• approved by a recognized body,
• that provides rules, guidelines or characteristics
• for common and repeated use,
• for activities or their results,
• aimed at the achievement of the optimum degree of order in a given context.
Lack of standardisation in data exchanges or across organisations
• Different formats of data and metadata: EDIFACT, XML, structured files, paper
• Different places to store data and metadata
• Different media: paper form, email, web form, dial-up, removable media, file upload
WHAT SDMX IS
• A model to describe statistical data and metadata
• A standard for automated communication from machine to machine
• A technology supporting standardised IT tools
In order to take advantage of all this:
• Statisticians agree to use a common description for data and metadata
• The data exchange process is then driven by the common description
• Data descriptions are made available for everybody who wants to understand and reuse the data
This is what SDMX provides and enables.
From version 1.0 to version 2.1 to…?
• September 2004: Version 1.0 (ISO/TS 17369), with GESMES/TS (SDMX-EDI) and SDMX-ML
• November 2005: Version 2.0, with the SDMX Registry
• February 2008: SDMX accepted at UN level; SDMX recognised and supported as the preferred standard
• April 2011: SDMX 2.1
All good standards change…
• All standards change over time, and are released in a series of versions
• Changes always have some impact on users
• Users are not expected to always use the latest version of a standard
• Standards organisations (like SDMX) have to provide support for several versions of the standard, all of which are in use
Change management
• Danger (1): too much change may discourage adoption
• Danger (2): not giving users the functionalities they want will discourage adoption
• Need to find a balance
THE SDMX COMPONENTS
• Technical Specifications: the SDMX Information Model
• Guidelines to Harmonise Content: Content-oriented Guidelines (COG)
• Tools: IT architectures for data exchange, SDMX-compliant tools
SDMX is not just a data transmission format…
A model is a partial analogy of a system
René Magritte, “This is not a pipe”
• The analogy between the model and the represented reality is partial.
• The properties of the model are not identical to the properties of the reality.
• I can’t smoke with this pipe!
The four meta-modelling levels
• SDMX metamodel
• Data model: concepts, codes, DSD
• Real data (e.g. BOP, ESA)
A model represents a system and conforms to a metamodel.
Information Model (diagram)
• Process steps: Design, Build, Collect, Process, Disseminate, Use
• Other labels: Administrator, User, DEFINITIONS, DATA, Software, Services
Overall integration of methods and techniques
The role of the Information model
A user-level formal language to:
• express, agree and design information needs
• give specifications to reporting agents
• communicate with IT people
• drive the software (which doesn’t change)
• document the system
Flexible information system, evolving fast and cheaply
User autonomy
SDMX Information Model (“metamodel”)
Provides a way of modelling data, metadata and exchange processes.
Structural metadata: the Data Structure Definition (DSD) describes the dataset structure and identifies/describes the data through:
• Dimensions (e.g. country, variable/topic, year)
• Attributes (e.g. unit of measure): metadata about an individual value, a time series or a group of time series
• Code lists
Describing the data exchange Who? Who? When? How? Where? What? What?
Data Structure Definition: Concept Usage
• Stock/Flow: Dimension
• Country: Dimension
• Unit Multiplier: Attribute
• Unit: Attribute
• Time: Dimension
• Frequency: Dimension
• Topic: Dimension
• Observation: Measure
Data Structure Definition: Defining Multi-dimensional Structures
• Comprises
  • Dimensions: concepts that identify the observation value
  • Attributes: concepts that add additional metadata about the observation value
  • Measure: the concept that is the observation value
• Representation: any of these may be coded, text, date/time, number, etc.
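The split into dimensions, attributes and a measure can be sketched in a few lines of Python. This is a minimal illustration, not SDMX-ML and not the API of any SDMX tool: the class names, the concept identifiers (FREQ, COUNTRY, STOCK_FLOW, ...), the code lists and the sample observation value are all assumptions invented for the example. It shows how the dimensions together identify an observation and how coded representations can be checked against their code lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """One concept used in a data structure definition (hypothetical class)."""
    concept: str
    role: str                           # "dimension", "attribute" or "measure"
    codes: Optional[frozenset] = None   # None means uncoded (text, number, date/time)


@dataclass
class DataStructureDefinition:
    """A toy DSD: an ordered list of components (hypothetical class)."""
    components: list

    def by_role(self, role):
        return [c for c in self.components if c.role == role]

    def series_key(self, record):
        """Join the dimension values, in DSD order, into one identifying key."""
        return ".".join(str(record[c.concept]) for c in self.by_role("dimension"))

    def validate(self, record):
        """Return a list of problems found in one record (empty if none)."""
        problems = []
        for c in self.components:
            if c.concept not in record:
                # Assumption of this sketch: attributes are optional.
                if c.role != "attribute":
                    problems.append(f"missing {c.role} {c.concept}")
                continue
            if c.codes is not None and record[c.concept] not in c.codes:
                problems.append(
                    f"{c.concept}: code {record[c.concept]!r} not in code list"
                )
        return problems


# Illustrative DSD; identifiers and code lists are invented, not official ones.
dsd = DataStructureDefinition([
    Component("FREQ", "dimension", frozenset({"A", "Q", "M"})),
    Component("COUNTRY", "dimension", frozenset({"IT", "FR", "DE"})),
    Component("STOCK_FLOW", "dimension", frozenset({"S", "F"})),
    Component("TOPIC", "dimension", frozenset({"GDP", "POP"})),
    Component("TIME", "dimension"),                        # uncoded: date/time
    Component("UNIT", "attribute", frozenset({"EUR", "PERSONS"})),
    Component("UNIT_MULT", "attribute", frozenset({"0", "3", "6"})),
    Component("OBS_VALUE", "measure"),                     # uncoded: number
])

# A made-up observation, used only to exercise the sketch.
record = {
    "FREQ": "A", "COUNTRY": "IT", "STOCK_FLOW": "F", "TOPIC": "GDP",
    "TIME": "2010", "UNIT": "EUR", "UNIT_MULT": "6", "OBS_VALUE": 123.4,
}

key = dsd.series_key(record)
problems = dsd.validate(record)
```

Here `key` is built from the dimensions only, which reflects the point that dimensions identify the observation while attributes merely describe it. Calling `dsd.validate` on a record with an unknown country code would return one problem naming the COUNTRY concept.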
Cross-domain concepts and code lists
Each domain (Domain 1, Domain 2) has its own set of used concepts. The cross-domain concepts are those shared between domains, for example FREQ, REF. AREA, COMPARABILITY.
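One way to picture this is as a set intersection: the cross-domain concepts are the ones that appear in every domain's set of used concepts. The sketch below is only an illustration. FREQ, REF_AREA (written "REF. AREA" on the slide) and COMPARABILITY come from the slide; the remaining concept names and the function name are hypothetical.

```python
def cross_domain_concepts(*domains):
    """Return the concepts that occur in every domain's set of used concepts."""
    if not domains:
        return set()
    return set.intersection(*(set(d) for d in domains))


# Hypothetical sets of used concepts for two statistical domains.
domain_1 = {"FREQ", "REF_AREA", "COMPARABILITY", "BOP_ITEM"}
domain_2 = {"FREQ", "REF_AREA", "COMPARABILITY", "AGE", "SEX"}

shared = cross_domain_concepts(domain_1, domain_2)
domain_specific_1 = domain_1 - shared   # concepts only Domain 1 needs
```

In practice the direction is the reverse of this computation: organisations pick their concepts from an agreed cross-domain list first, so that the overlap exists by design rather than by accident.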
Statistical subject-matter domains Based on the UNECE Classification of International Statistical Activities
Content-Oriented Guidelines
Recommendations to harmonise implementations, for interoperability between organisations (Organisation 1, Organisation 2, Organisation 3):
• Cross-domain concepts and code lists
• Statistical subject-matter domains
• Metadata common vocabulary
Some benefits from SDMX standards
SDMX provides support for things that are essential to statisticians, but are often difficult for them to achieve:
• An international standard for holding all of the elements involved in the statistical process together in a clear information model
• An approach that maximises the amount of information on the statistical context that can be passed through to users, and the capacity of linking statistics from similar or different sources
• Automation of processes: SDMX enables the development of common tools that can be used by all statistical organisations to improve their activities
Benefits from SDMX standards (2)
SDMX is also an advanced standard for data discovery using web-based services.
• SDMX Reference Infrastructure: web services enable query, visualisation, and automated loading of data and metadata between statistical organisations.
• SDMX tools allow querying a database, or a file system, for the creation of tables, charts, and graphs from the results of the query.
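To make the idea of querying through web services concrete, the sketch below builds a data query URL in the general style of the SDMX 2.1 RESTful interface, where the key lists one value per dimension separated by dots, "+" combines alternative codes, and an empty position acts as a wildcard. The slides do not describe this syntax, so treat it as an assumption to verify against the documentation of the service being used. The base URL, the dataflow identifier and the codes are hypothetical, and the function only builds a string: it does not contact any server.

```python
from urllib.parse import urlencode


def build_data_query(base_url, flow, key_parts, start_period=None, end_period=None):
    """Build a data query URL in the style of the SDMX 2.1 REST interface.

    key_parts holds one entry per dimension, in DSD order:
    a single code, a list of codes (any of them), or None (wildcard).
    """
    def part(p):
        if p is None:
            return ""               # empty position = all values of the dimension
        if isinstance(p, (list, tuple)):
            return "+".join(p)      # "+" = any of these codes
        return p

    key = ".".join(part(p) for p in key_parts)
    url = f"{base_url.rstrip('/')}/data/{flow}/{key}"

    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint, dataflow and codes, for illustration only.
url = build_data_query(
    "https://example.org/sdmx/rest",
    "DEMO_FLOW",
    ["A", ["IT", "FR"], None, "GDP"],
    start_period="2005",
    end_period="2010",
)
```

Because the key follows the dimension order of the data structure definition, a client needs the DSD before it can build a meaningful query, which is one reason the structural metadata is published alongside the data.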
SDMX describes the data and metadata exchange
Diagram labels: “by end of June”, Provision Agreement, Organisation scheme, SDMX Registry, maintainer, Concept Schemes, Codelists, DSD, Concepts
Data Repository (Warehousing) Architecture
Diagram labels:
• PULL: NSI, register, SDMX Registry, query, Eurostat Pull Requestor, Received data in SDMX-ML
• PUSH: eDAMIS, Verification / Conversion to SDMX, XSL for SDMX-ML
• Storage and dissemination: Data Input, Intermediate storage, Loader, Warehouse storage, Eurobase, Dissemination
SDMX progress, 2011 to 2015
• Standards Development: April 2011, SDMX 2.1 Technical Standards released at sdmx.org
• May 2011: SDMX Global Conference in Washington, D.C. Next: 11-13 September 2013 (OECD, Paris)
• Self-learning tutorials comprising video, textbook and self-test
• Governance: Creation of two SDMX Working Groups (Technical Working Group and Statistical Working Group)
• Action Plan 2011 to 2015
http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/news/ess_news_detail?id=112774074&pg_id=2417&cc=ESTAT_EUROSTAT
Training courses on SDMX
• SDMX basics (for statisticians and IT staff): held at Eurostat. Aimed at people in charge of managing SDMX-based transmission and dissemination of data and metadata.
• SDMX advanced course (for IT developers): held at Eurostat. Targeted at IT developers and proposed in two versions: Java programmers and .NET programmers.
• ESTP course on “Advanced technologies for data collection and transmission” (external)
For more information
• http://www.sdmx.org (SDMX web site)
• https://webgate.ec.europa.eu/fpfis/mwikis/sdmx (Eurostat Info Space)
• Estat-SDMX@ec.europa.eu (General info on SDMX)
• Estat-support-sdmx@ec.europa.eu (Eurostat implementation projects)
• Marco.Pellegrino@ec.europa.eu