SDMX Basics Core Elements Information Model Data Structure Definition (DSD) SDMX-ML Messages Major changes in SDMX v 2.1
THE SDMX COMPONENTS Technical Specifications: The SDMX Information Model. Guidelines to Harmonise Content: The Content Oriented Guidelines (COG). Tools: IT architectures for data exchange, SDMX-compliant tools. SDMX is not just a data transmission format…
THE SDMX INFORMATION MODEL
The SDMX Information Model is a meta-model describing the objects involved in:
• The collection
• The dissemination
• The publication
of aggregated statistics and related metadata.
The abstract model is like a structured set of containers.
• Everything in SDMX is model-driven
• All messages and interfaces are implementations of the information model
SDMX INFORMATION MODEL – SCOPE Structure Definition Category Scheme DATA & METADATA FLOWS Data & Metadata set Category Data Provider Provision Agreement Constraint
STATISTICAL DATA & METADATA Statistical Data (Figures): time series data representation, cross-sectional data representation. Statistical Metadata (Identifiers, Descriptors): structural metadata. Statistical Metadata (Methodology, Quality): reference metadata.
STATISTICAL DATA & METADATA Two different ways to represent data: as a time series or as a cross-section. [Figure: statistical data cube with axes Tourism activity (A100, B010, B020), Country (AT, FR, ES, IT) and Time (2005, 2006, 2007), highlighting one time series and the cross-section for 2006.]
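The two views of the cube can be sketched in a few lines of Python. This is only an illustration: the dimension codes come from the figure, but the values are made-up counters (the figure's numbers cannot be reliably mapped to cells), and the function names `time_series` and `cross_section` are hypothetical.

```python
from itertools import product

ACTIVITIES = ["A100", "B010", "B020"]
COUNTRIES = ["AT", "ES", "FR", "IT"]
YEARS = [2005, 2006, 2007]

# Hypothetical cube: (activity, country, year) -> value.
# Values are a running counter, NOT the figures from the slide.
cube = {
    key: 100 + i
    for i, key in enumerate(product(ACTIVITIES, COUNTRIES, YEARS))
}


def time_series(cube, activity, country):
    """Fix every dimension except time: one value per period."""
    return {
        year: value
        for (a, c, year), value in sorted(cube.items())
        if a == activity and c == country
    }


def cross_section(cube, activity, year):
    """Fix the period and let one other dimension (country) vary."""
    return {
        c: value
        for (a, c, y), value in sorted(cube.items())
        if a == activity and y == year
    }


ts = time_series(cube, "A100", "AT")
cs = cross_section(cube, "A100", 2006)
```

Both functions read the same cells; they differ only in which dimension is left free, which is the point of the slide.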
STRUCTURAL METADATA Introduction: from a number to statistical data. Example number: 11353511
STRUCTURAL METADATA Identify and describe data. CONCEPTS act as Dimension, Attribute or Measure in a DSD to define a data set's structure, and as Attributes in an MSD (Metadata Structure Definition) to define the structure of a metadata set.
STRUCTURAL METADATA From a statistical table to its descriptor concepts
STRUCTURAL METADATA: DATA STRUCTURE DEFINITION To easily exchange and process data, we first define a standard container based on the structure of the real statistical table: the Data Structure Definition (DSD). The DSD can be seen as a "logical container" for a specific set of data that we want to exchange. It includes the concepts that represent the data, gives them roles (Dimension, Measure, Attribute) and links them to code lists. [Figure: a statistical table with COUNTRY, TIME_PERIOD, UNIT and OBSERVATIONS mapped to a DSD, whose Concepts are assigned to Dimensions, Measures and Attributes, each linked to Code lists.]
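As a rough sketch of the "logical container" idea, the snippet below models a DSD as a list of concepts, each with a role and an optional code list. It is not an SDMX library API: the class names, the `DEMO_DSD` identifier, the choice of role for each concept and the UNIT codes are all assumptions made for illustration. Only the concept names COUNTRY, TIME_PERIOD and UNIT and the country codes appear in the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    DIMENSION = "Dimension"
    MEASURE = "Measure"
    ATTRIBUTE = "Attribute"


@dataclass(frozen=True)
class Component:
    concept_id: str
    role: Role
    codelist: Optional[frozenset] = None  # None for non-coded concepts


@dataclass
class DataStructureDefinition:
    dsd_id: str
    components: list

    def by_role(self, role):
        """Concept ids that play the given role in this DSD."""
        return [c.concept_id for c in self.components if c.role is role]

    def invalid_codes(self, record):
        """Coded concepts in `record` whose value is not in the code list."""
        return {
            c.concept_id: record[c.concept_id]
            for c in self.components
            if c.codelist is not None
            and c.concept_id in record
            and record[c.concept_id] not in c.codelist
        }


dsd = DataStructureDefinition(
    dsd_id="DEMO_DSD",  # hypothetical identifier
    components=[
        Component("COUNTRY", Role.DIMENSION, frozenset({"AT", "ES", "FR", "IT"})),
        Component("TIME_PERIOD", Role.DIMENSION),
        Component("OBS_VALUE", Role.MEASURE),
        Component("UNIT", Role.ATTRIBUTE, frozenset({"PERS", "EUR"})),  # made-up codes
    ],
)

errors = dsd.invalid_codes(
    {"COUNTRY": "DE", "TIME_PERIOD": "2006", "OBS_VALUE": "1216", "UNIT": "PERS"}
)
```

Because the code lists travel with the structure, a receiver can check an incoming record without knowing anything else about the sender.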
SDMX INFORMATION MODEL - DATA SET SDMX does not introduce any new concept for statisticians. It just provides a framework for what statisticians already know. The SDMX dataset is a standard container in which statistical data are represented together with the structural metadata, according to the DSD. Now you have an easy way to exchange and process data and metadata automatically. [Figure labels: Table structure, Dataset, Code lists, DSD, Observations.]
SDMX INFORMATION MODEL - DATA SET [Figure: data set hierarchy. A DATA SET contains GROUP KEYs; each group contains KEYs with their KEY VALUES. A key identifies a time series or a cross-section made of TIME PERIOD, OBSERVATION VALUE and ATTRIBUTE VALUE. "Attribute attachment" is marked at two points in the diagram.]
SDMX INFORMATION MODEL - METADATA SET Reference Metadata Set; Concepts
SDMX INFORMATION MODEL – DATA & METADATA FLOW Structure Definition Category Scheme DATA & METADATA FLOWS Data & Metadata set Category Data Provider Provision Agreement Constraint
SDMX IM – DATA PROVIDERS & PROVISION AGREEMENT Production and dissemination of Statistical data Production and dissemination of Reference Metadata
SDMX IM - CONSTRAINTS DATA & METADATA FLOWS Provision Agreement Constraint
SDMX IM - CONSTRAINTS Example: A data provider can restrict his reporting of monthly data to only some months. Example: A data provider can restrict his reporting of data to subsets of statistical cubes.
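A constraint of this kind can be pictured as a set of permitted values per dimension. The sketch below is a simplified stand-in, not the SDMX-ML constraint syntax: the dictionary layout, the function names and the sample keys are assumptions chosen for illustration.

```python
def is_allowed(key, constraint):
    """True if every constrained dimension of `key` has a permitted value.

    Dimensions that the constraint does not mention are unrestricted.
    """
    return all(key.get(dim) in allowed for dim, allowed in constraint.items())


def filter_observations(observations, constraint):
    """Keep only the observations that fall inside the constrained region."""
    return [obs for obs in observations if is_allowed(obs, constraint)]


# Hypothetical provider: reports only for AT, and only the first quarter's months.
constraint = {
    "COUNTRY": {"AT"},
    "TIME_PERIOD": {"2006-01", "2006-02", "2006-03"},
}

observations = [
    {"FREQ": "M", "COUNTRY": "AT", "TIME_PERIOD": "2006-01"},
    {"FREQ": "M", "COUNTRY": "AT", "TIME_PERIOD": "2006-07"},  # month not covered
    {"FREQ": "M", "COUNTRY": "FR", "TIME_PERIOD": "2006-02"},  # country not covered
]

reportable = filter_observations(observations, constraint)
```

The first entry of the constraint corresponds to restricting the cube to a subset, the second to restricting monthly reporting to some months.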
THE SDMX COMPONENTS Technical Specifications: The SDMX Information Model. Guidelines to Harmonise Content: The Content Oriented Guidelines (COG). Tools: IT architectures for data exchange, SDMX-compliant tools. SDMX is not just a data transmission format…
SDMX REGISTRY
COMPLIANCE & IMPLEMENTATION
Generally, the following four steps are needed:
• Preparation: statisticians from the organisations involved in the data exchange describe the data and the different dataflows, datasets and provision agreements.
• Compliance: all the necessary objects are created according to the SDMX Technical Specifications.
• Implementation: the objects are put into practice. Standard software is installed and configured to use the DSDs. The exchange process is set up and tested.
• Production: the objects are used in the production process.
SDMX implementation is achieved when the data and metadata exchanges within the domain are carried out according to SDMX-compliant specifications.
CREATE all the necessary objects
• Define the DSD: list of concepts (Concept scheme), roles of concepts (Dimension, Attribute, Measure), code lists
• Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
The steps to build a Data Structure Definition
1 Identification of the descriptor concepts for the data
2 Choice of Cross Domain code lists or definition of specific code lists for coded concepts; definition of the text format for non coded concepts
3 Choose the type of data representation (Time Series and Cross-sectional)
4 Definition of the concept role (Dimension, Attribute or Measure):
• Define Dimensions for Time Series and Cross-sectional data representation
• Define Time Series primary measure and/or Cross-sectional measures with their measure concepts
• Define Attributes with the attachment levels for Time Series and Cross-sectional data representation
5 Create the defined artefacts in a SDMX Data Structure Definition tool (e.g. DSW)
3- Choose the type of data representation: Time Series (TS) / Cross-sectional (CS) [Figure: statistical data cube with axes Tourism activity (A100, B010, B020), Country (AT, ES, FR, IT) and Time (2005, 2006, 2007). Time series slice: 1250, 1216, 1220. Cross-sectional slice for 2006: 542, 1216, 8138, 2510.]
6 – DEFINE The view of the Data Structure
Eurostat Unit B5 – Statistical Information Technologies, SDMX Training for Statisticians – March 2010
EXAMPLE: STS Sample Dataset Primary Measure Dimensions Attributes Dimensions
EXAMPLE: STS Sample Dataset STS_INDICATOR TITLE STS_ACTIVITY REFERENCE_AREA ADJT FREQ STS_BASE_YEAR
EXAMPLE: STS Sample Dataset OBS_VALUE OBS_STATUS OBS_CONF REFERENCE_PERIOD STS_INSTITUTION
EXAMPLE: STS Sample Dataset IDENTIFYING CONCEPTS AND GROUPING SERIES IN CSV FILES
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F
M;GR;N;TOTV;NS5201;1;2000;200205;60.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200206;78.2;A;F
M;GR;N;TOTV;NS5201;1;2000;200207;89.9;A;F
Annotated row: M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
[Labels on the annotated row: Dimensions, Group, Reference Period, Primary Measure, Attributes]
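Reading such a file comes down to mapping each position in the row to a concept and its role. The sketch below assumes the column order FREQ, REF_AREA, ADJUSTMENT, STS_INDICATOR, STS_ACTIVITY, STS_INSTITUTION, STS_BASE_YEAR, reference period, observation value, OBS_STATUS, OBS_CONF. That order is inferred from the concept names on the neighbouring slides and is not stated with the CSV sample, so treat it as an assumption. The function name `parse_rows` is hypothetical.

```python
import csv
import io

# Assumed column order (see lead-in); the first seven columns are dimensions.
COLUMNS = [
    "FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR", "STS_ACTIVITY",
    "STS_INSTITUTION", "STS_BASE_YEAR",
    "TIME_PERIOD", "OBS_VALUE", "OBS_STATUS", "OBS_CONF",
]
DIMENSIONS = COLUMNS[:7]

SAMPLE = """M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
"""


def parse_rows(text):
    """Split each semicolon-separated row into key, period, value, attributes."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        records.append({
            "key": tuple(rec[d] for d in DIMENSIONS),
            "period": rec["TIME_PERIOD"],
            "value": float(rec["OBS_VALUE"]),
            "attributes": {
                "OBS_STATUS": rec["OBS_STATUS"],
                "OBS_CONF": rec["OBS_CONF"],
            },
        })
    return records


records = parse_rows(SAMPLE)
```

All three sample rows share the same key, so they are observations of one series that differ only in period, value and observation-level attributes.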
DSD of dataflow STSRTD_IND_M Footnotes Roles List of variables Codes Values
STRUCTURE OF THE DATASET FOR TIME SERIES Attributes and attachment level: group
Attributes can be attached to groups.
Group of series: REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS5201" STS_INSTITUTION="1" STS_BASE_YEAR="2000" DECIMAL="1" TITLE="Retail trade"
Series:
M;GR;N;TOTV;NS5201;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS5201;1;2000;200203;88.8;A;F
M;GR;N;TOTV;NS5201;1;2000;200204;93.0;A;F
Group of series: REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N15220" STS_INSTITUTION="1" STS_BASE_YEAR="2000" DECIMAL="1" TITLE="Retail sale of food"
Series:
M;GR;N;TOTV;N15220;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N15220;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N15220;1;2000;200203;89.9;A;F
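The group definitions above list every dimension except FREQ, which suggests that a group collects the series that differ only in frequency. Under that reading, a minimal sketch of group-level attachment looks like this. The function names and the quarterly series used in the usage lines are hypothetical; the group keys and attribute values are taken from the slide.

```python
DIMENSIONS = (
    "FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
    "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR",
)
# Assumption: the group key is the series key with FREQ left free.
GROUP_DIMENSIONS = tuple(d for d in DIMENSIONS if d != "FREQ")

# Group-level attributes, stored once per group rather than once per series.
group_attributes = {
    ("GR", "N", "TOTV", "NS5201", "1", "2000"): {"DECIMAL": "1", "TITLE": "Retail trade"},
    ("GR", "N", "TOTV", "N15220", "1", "2000"): {"DECIMAL": "1", "TITLE": "Retail sale of food"},
}


def group_key(series_key):
    """Project a full series key (a dict) onto the group dimensions."""
    return tuple(series_key[d] for d in GROUP_DIMENSIONS)


def attributes_for(series_key):
    """Look up the attributes that the series inherits from its group."""
    return group_attributes.get(group_key(series_key), {})


monthly = dict(zip(DIMENSIONS, ("M", "GR", "N", "TOTV", "NS5201", "1", "2000")))
quarterly = dict(monthly, FREQ="Q")  # hypothetical sibling series

title_monthly = attributes_for(monthly)["TITLE"]
title_quarterly = attributes_for(quarterly)["TITLE"]
```

Attaching TITLE and DECIMAL at group level means they are written once and shared by every series in the group.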
STRUCTURE OF THE DATASET FOR TIME SERIES Attributes and attachment level: series
Attributes can be attached to series.
Definition of Series 1: FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="NS0006" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M"
Series 1:
M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F
M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F
M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F
Definition of Series 2: FREQ="M" REF_AREA="GR" ADJUSTMENT="N" STS_INDICATOR="TOTV" STS_ACTIVITY="N14500" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M"
Series 2:
M;GR;N;TOTV;N14500;1;2000;200201;60.8;A;F
M;GR;N;TOTV;N14500;1;2000;200202;78.2;A;F
M;GR;N;TOTV;N14500;1;2000;200203;89.9;A;F
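Series-level attachment can be sketched by grouping the rows on their seven dimension values and storing TIME_FORMAT once per series, while OBS_STATUS and OBS_CONF stay with each observation. The rows below follow the two series definitions on this slide (STS_ACTIVITY NS0006 and N14500). The column positions and the function name `build_series` are assumptions for illustration.

```python
from collections import defaultdict

N_DIMS = 7  # assumed: FREQ ... STS_BASE_YEAR occupy the first seven columns

ROWS = [
    "M;GR;N;TOTV;NS0006;1;2000;200201;88.8;A;F",
    "M;GR;N;TOTV;NS0006;1;2000;200202;84.7;A;F",
    "M;GR;N;TOTV;NS0006;1;2000;200203;88.8;A;F",
    "M;GR;N;TOTV;N14500;1;2000;200201;60.8;A;F",
    "M;GR;N;TOTV;N14500;1;2000;200202;78.2;A;F",
    "M;GR;N;TOTV;N14500;1;2000;200203;89.9;A;F",
]


def build_series(rows, series_attributes):
    """Group rows by series key; attach `series_attributes` once per series."""
    series = defaultdict(lambda: {"attributes": {}, "observations": []})
    for line in rows:
        fields = line.split(";")
        key = tuple(fields[:N_DIMS])
        period, value, status, conf = fields[N_DIMS:]
        entry = series[key]
        entry["attributes"] = dict(series_attributes)  # series level
        entry["observations"].append(
            # observation-level attributes travel with each value
            (period, float(value), {"OBS_STATUS": status, "OBS_CONF": conf})
        )
    return dict(series)


dataset = build_series(ROWS, {"TIME_FORMAT": "P1M"})
series_2 = dataset[("M", "GR", "N", "TOTV", "N14500", "1", "2000")]
```

The six rows collapse into two series of three observations each, and TIME_FORMAT is stored twice in total instead of six times.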