SDMX training session on basic principles, data structure definitions and data file implementation

SDMX training session on basic principles, data structure definitions and data file implementation, 29 November 2007. A - Introduction. Provide an understanding of the basic SDMX principles (DSD and dataset implementation). Provide knowledge of the SDMX standard and its XML implementation.

Presentation Transcript


  1. SDMX training session on basic principles, data structure definitions and data file implementation 29 November 2007

  2. A - Introduction

  3. Purpose of the training session: provide an understanding of the basic SDMX principles (DSD and dataset implementation); provide knowledge of the SDMX standard and its XML implementation; present ESTAT tools as case studies illustrating their scope and usage.

  4. Current practices on data and metadata exchange: legal framework (Commission Regulations, Council Regulations, etc.); data and metadata files, questionnaires, quality reports, etc.; formats (paper forms, EDIFACT, XML, structured files, etc.); media (email, file upload, web forms, removable media, dial-up, etc.).

  5. The need for a standard: enhance electronic data and metadata exchange; enhance the availability of statistical data and metadata for users; promote interoperability between different systems; improve the quality of transmitted data (timeliness and punctuality, accessibility and clarity, accuracy, comparability).

  6. SDMX (Statistical Data and Metadata eXchange): an initiative on the standardisation of the statistical data and metadata exchange process, with 7 sponsors (BIS, ECB, ESTAT, IMF, OECD, UN, WB). It supports “push” and “pull” modes and uses XML technologies to promote interoperability. Basic principles: Data Structure Definitions (DSD) and Metadata Structure Definitions (MSD); SDMX registries; data on the Web using SDMX.

  7. SDMX (cont.): exchange and sharing of statistical information, covering statistical data and statistical metadata (structural and reference metadata). Emphasis on macro-data (aggregated statistics). Promotes a “data sharing” model: low cost, high quality of transmitted data, and interoperability between (otherwise) incompatible systems.

  8. SDMX Training 29 November 2007 B – SDMX Core Elements

  9. Example dataset 1

  10. Example dataset 2

  11. SDMX Information Model: the SDMX Information Model (SDMX-IM) is a conceptual model from which syntax-specific implementations are developed. The SDMX-IM provides for the structuring not only of data but also of “reference” metadata. The model is constructed as a set of structures which assist in the understanding, re-use and maintenance of the model: Data Structure Definition and Metadata Structure Definition; Dataflows and Datasets; Data Provisioning; …

  12. Structures in the SDMX-IM

  13. Structures in the SDMX-IM (cont.): Fundamental parts: structural metadata (DSD, concepts, code lists); observational data (an organised set of numeric observations); reference metadata. Definitions: Data Structure Definition (DSD): the set of structural metadata needed to understand the structure of a dataset. Dataflow Definition: a description of the dataset which identifies, categorises and constrains the allowable content of the dataset. Dataset: an organised collection of statistical data; the ‘container’ of a Dataflow Definition for an instance of the data.

  14. Structures in the SDMX-IM (cont.): Code lists – codes: a code list is a list of predefined values to be used within the DSD; code lists enumerate the values used in the representation of several structural components of SDMX. Concept schemes – concepts: a concept is a statistical characteristic used within a DSD. Additional properties can be defined for concepts: names/descriptions in various locales, a default representation (coded or uncoded) and semantic hierarchies of concepts. Category schemes – categories: a category scheme is made up of a hierarchy of categories (subject-matter domains), which in SDMX may include any type of classification useful for organising data and metadata. A Dataflow may be linked to many Categories.
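The category-scheme idea (a hierarchy of categories with Dataflows linked to them) can be sketched as a small tree walk. This is a minimal Python sketch, not an ESTAT or SDMX API: the `Category` class, the `dataflows_under` helper and the category IDs are hypothetical; only the dataflow IDs STSRTD_TURN_M and DEMOGRAPHY_RQ come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a category scheme (a subject-matter domain)."""
    id: str
    children: list["Category"] = field(default_factory=list)
    dataflows: list[str] = field(default_factory=list)  # IDs of linked dataflows


def dataflows_under(category: Category) -> list[str]:
    """Collect the dataflows linked to a category and to all of its descendants."""
    found = list(category.dataflows)
    for child in category.children:
        found.extend(dataflows_under(child))
    return found


# Hypothetical scheme: the category IDs are invented for illustration.
retail = Category("RETAIL_TRADE", dataflows=["STSRTD_TURN_M"])
short_term = Category("SHORT_TERM_STATS", children=[retail])
population = Category("POPULATION", dataflows=["DEMOGRAPHY_RQ"])
root = Category("ECONOMY_AND_SOCIETY", children=[short_term, population])

print(dataflows_under(root))        # ['STSRTD_TURN_M', 'DEMOGRAPHY_RQ']
print(dataflows_under(short_term))  # ['STSRTD_TURN_M']
```

Because a Dataflow may be linked to many Categories, the same dataflow ID could also appear in a category of a second, independent scheme.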

  15. DSD components: Dimension (e.g. frequency, reference area): a classificatory variable used to identify subsets or single observations; together, the dimensions define the key descriptor for reporting datasets. Attribute (e.g. title, observation status): adds further metadata about the observations and can be attached at four possible levels (observation, time series / cross-sectional data, group, dataset). Measure (e.g. turnover index, number of births, number of deaths): the data (uncoded / unclassified) that can be reported, i.e. the observation value; primary (time series) or cross-sectional (cross-sectional data). Groups: a grouping of dimensions used to attach group attributes (e.g. sibling group).
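To make the component roles concrete, here is a minimal Python sketch of a DSD with dimensions, attributes at different attachment levels, a primary measure and a group. The class names, the concept IDs (FREQ, REF_AREA, TIME_PERIOD, TITLE, OBS_STATUS) and the measure name OBS_VALUE are assumptions for illustration; the sibling group is assumed here to be every key dimension except frequency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    """The four attachment levels for attributes listed on the slide."""
    DATASET = "DataSet"
    GROUP = "Group"
    SERIES = "Series"            # time series or cross-sectional section
    OBSERVATION = "Observation"


@dataclass(frozen=True)
class Dimension:
    concept: str
    codelist: Optional[str] = None   # None = uncoded representation
    is_time: bool = False


@dataclass(frozen=True)
class Attribute:
    concept: str
    level: Level
    codelist: Optional[str] = None


@dataclass(frozen=True)
class Group:
    id: str
    dimensions: tuple[str, ...]      # a subset of the DSD's dimensions


@dataclass(frozen=True)
class DSD:
    id: str
    dimensions: tuple[Dimension, ...]
    attributes: tuple[Attribute, ...]
    groups: tuple[Group, ...] = ()
    measure: str = "OBS_VALUE"       # primary measure: the observation value

    def key_dimensions(self) -> list[str]:
        """Dimensions that identify a series: all of them except time."""
        return [d.concept for d in self.dimensions if not d.is_time]

    def attributes_at(self, level: Level) -> list[str]:
        """Attributes attached at the given level."""
        return [a.concept for a in self.attributes if a.level is level]


dsd = DSD(
    id="EXAMPLE_DSD",                # hypothetical identifier
    dimensions=(
        Dimension("FREQ", "CL_FREQ"),
        Dimension("REF_AREA", "CL_AREA"),
        Dimension("TIME_PERIOD", is_time=True),
    ),
    attributes=(
        Attribute("TITLE", Level.SERIES),
        Attribute("OBS_STATUS", Level.OBSERVATION, "CL_OBS_STATUS"),
    ),
    groups=(Group("SIBLING", ("REF_AREA",)),),  # assumed: key dimensions minus FREQ
)

print(dsd.key_dimensions())                  # ['FREQ', 'REF_AREA']
print(dsd.attributes_at(Level.OBSERVATION))  # ['OBS_STATUS']
```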

  16. Data Structure Definition examples: a time-series dataset from the STS domain (Turnover Index for Retail Trade and Repair DSD) and a cross-sectional dataset from the Demography domain (Rapid Questionnaire DSD).

  17. STS sample dataset (annotated figure marking the dimensions, attributes and measure)

  18. STS DSD components Dataflow: STSRTD_TURN_M

  19. Demography sample dataset (annotated figure marking the dimensions, attributes and measures)

  20. Demography DSD components Dataflow: DEMOGRAPHY_RQ

  21. Data Provisioning • A Data Provider can provide data/metadata for many Dataflows using an agreed data structure. • Dataflows may incorporate data coming from more than one Data Provider. • A Provision Agreement defines which data providers are supplying what data to which Dataflows. • A Dataflow may be linked to one or more Categories (subject-matter domains) from different Category Schemes.
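A Provision Agreement links one Data Provider to one Dataflow, which is enough to answer both questions raised on the slide: who supplies a given dataflow, and which dataflows a provider supplies. A minimal Python sketch, assuming a plain in-memory list of agreements; the provider IDs EE_STAT and FI_STAT are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionAgreement:
    """Links one data provider to one dataflow it has agreed to supply."""
    provider: str
    dataflow: str


# Hypothetical agreements; provider IDs are illustrative only.
agreements = [
    ProvisionAgreement("EE_STAT", "STSRTD_TURN_M"),
    ProvisionAgreement("FI_STAT", "STSRTD_TURN_M"),
    ProvisionAgreement("EE_STAT", "DEMOGRAPHY_RQ"),
]


def providers_for(dataflow: str, agreements: list[ProvisionAgreement]) -> list[str]:
    """All providers that supply data to a dataflow (several are allowed)."""
    return sorted({a.provider for a in agreements if a.dataflow == dataflow})


def dataflows_for(provider: str, agreements: list[ProvisionAgreement]) -> list[str]:
    """All dataflows a provider supplies (several are allowed)."""
    return sorted({a.dataflow for a in agreements if a.provider == provider})


print(providers_for("STSRTD_TURN_M", agreements))  # ['EE_STAT', 'FI_STAT']
print(dataflows_for("EE_STAT", agreements))        # ['DEMOGRAPHY_RQ', 'STSRTD_TURN_M']
```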

  22. Identification, Versioning & Maintenance: Identification: every structural element must have a semantic identifier (e.g. CL_UNIT). Versioning: a specific element may have different versions (updates of the element). Maintenance: some structures must be maintained by an organisation (agency). Unique identification: id + version + agency, so id CL_UNIT, version 1.0, agency ESTAT and id CL_UNIT, version 1.0, agency ECB identify two different structures. Internationalisation: multiple languages can be used to describe any element. The SDMX-IM covers aggregate data and metadata in all domains (it is not domain-specific).
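Because identity is the id + version + agency triple, two CL_UNIT code lists with the same version are still different artefacts when different agencies maintain them. A minimal Python sketch using a frozen dataclass, so the triple can serve directly as a dictionary key; the `ArtefactRef` name and its text form are illustrative assumptions, not SDMX-defined syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtefactRef:
    """Identity of a maintained structure: the agency, id and version triple."""
    agency: str
    id: str
    version: str

    def __str__(self) -> str:
        # Illustrative text form only; not an SDMX-defined syntax.
        return f"{self.agency}:{self.id}({self.version})"


estat_unit = ArtefactRef("ESTAT", "CL_UNIT", "1.0")
ecb_unit = ArtefactRef("ECB", "CL_UNIT", "1.0")
estat_unit_v2 = ArtefactRef("ESTAT", "CL_UNIT", "2.0")  # hypothetical later version

# Frozen dataclasses are hashable, so the triple works as a lookup key.
registry = {estat_unit: "ESTAT unit code list", ecb_unit: "ECB unit code list"}

print(estat_unit == ecb_unit)     # False: same id and version, different agency
print(len(registry))              # 2
print(estat_unit_v2 in registry)  # False: a new version is a distinct artefact
print(estat_unit)                 # ESTAT:CL_UNIT(1.0)
```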

  23. SDMX high-level view (diagram): a Category Scheme comprises subject or reporting Categories, which can have child categories; a Data or Metadata Flow uses a specific Data or Metadata Structure Definition, can be linked to categories in multiple category schemes and can get data from multiple Data Providers; a Data or Metadata Set conforms to the business rules of its flow; a Data Provider can provide data or metadata for many flows using the agreed structure. The diagram also shows Provision Agreements and Registered Data or Metadata Sets, linked by an “is registered for” relationship.

  24. Tools Demonstration

  25. SDMX Registry: a repository for keeping structural metadata (e.g. code lists, concept schemes, DSDs) and provisioning information (e.g. Dataflows, Provision Agreements). The repository is accessible via a Web Service accepting SDMX-ML messages, and through a graphical user interface (GUI) for user interaction over the Web.

  26. Data Structure Wizard (DSW): a “standalone” application (replacing the AccessDB tool). Main functionalities: manage data structures (create, modify, delete, query); import/export SDMX-ML structures (validate structure messages); import/export GESMES/TS structure files; create data messages; query the SDMX Registry; submit data structures to the SDMX Registry.

  27. Example - DSD creation using the DSW

  28. Example: Dimensions: Frequency (CL_FREQ); Reference Area (CL_AREA_EE); Time Period; Product (CL_PRODUCT). Attributes: Compilation (uncoded, attached at group level); Confidentiality (CL_OBS_CONF, observation level); Status (CL_OBS_STATUS, observation level); Availability (CL_AVAILABILITY, series level). Group.
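One way to see how the dimensions of this example work together is to build a series key: take the value of each non-time dimension in DSD order, check it against its code list and join the values. The Python sketch below assumes the dot-separated key convention (as used, for instance, for ECB series keys), hypothetical concept IDs FREQ, REF_AREA and PRODUCT, and abridged code lists; apart from the frequency codes A, Q and M, the codes are illustrative.

```python
# Key dimensions in DSD order; the time dimension is not part of the series key.
KEY_DIMENSIONS = ["FREQ", "REF_AREA", "PRODUCT"]

# Abridged code lists. Only the frequency codes are real SDMX codes; the others are illustrative.
CODELISTS = {
    "FREQ": {"A", "Q", "M"},        # CL_FREQ: annual, quarterly, monthly
    "REF_AREA": {"EE"},             # CL_AREA_EE
    "PRODUCT": {"P01", "P02"},      # CL_PRODUCT (invented codes)
}


def series_key(values: dict[str, str]) -> str:
    """Check each key dimension against its code list and join the codes with dots."""
    codes = []
    for dim in KEY_DIMENSIONS:
        if dim not in values:
            raise KeyError(f"missing dimension {dim}")
        code = values[dim]
        if code not in CODELISTS[dim]:
            raise ValueError(f"{code!r} is not a valid code for {dim}")
        codes.append(code)
    return ".".join(codes)


print(series_key({"FREQ": "M", "REF_AREA": "EE", "PRODUCT": "P01"}))  # M.EE.P01
```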

  29. SDMX Training 29 November 2007 C – SDMX-ML Data sets

  30. Syntaxes for SDMX data, both based on a common Information Model: SDMX-EDI (GESMES/TS): EDIFACT syntax; time-series oriented; one format for datasets. SDMX-ML: XML syntax; four different formats for datasets; easier validation (XML based); tools enable us to use the desired format.

  31. SDMX-ML data messages: equivalent representations for reporting datasets. Generic message: one schema, not domain-specific. Compact message: format for large-volume exchange of data; the schema is specific to a DSD. Utility message: format for advanced validation; the schema is specific to a DSD. Cross-Sectional message: format for non-time-series data; the schema is specific to a DSD.

  32. The SDMX-ML Time-Series format: used for representing time-series data; contains the related metadata as defined in DSDs; three different (equivalent) representations are available: Generic message, Compact message, Utility message.

  33. Generic Dataset

  34. Compact Dataset

  35. Utility Dataset

  36. The SDMX-ML Cross-Sectional data format: used for representing non-time-series data; contains the related metadata as defined in DSDs; two different representations are available: Generic message, Cross-Sectional message.

  37. Cross-Sectional Dataset

  38. Conversions: the SDMX-ML formats are equivalent because they are based on the same IM, so data can be converted from any SDMX-ML format to another. Exception: a Cross-Sectional DSD that does NOT contain a time dimension cannot be converted to the time-series formats. Conversions are possible between the SDMX-ML formats and can be extended to other formats (e.g. CSV, GESMES).
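Converting a Compact time-series message to a Generic one is mostly a change of layout: Compact carries concept values as DSD-specific XML attributes, while Generic lists them as concept/value pairs. The Python sketch below uses the standard-library `xml.etree.ElementTree` on a simplified, namespace-free fragment; the element names follow the SDMX-ML 2.0 layout only loosely, and the sample values are invented. It also shows why the DSD is needed: only the DSD says which series-level values are dimensions and which are attributes.

```python
import xml.etree.ElementTree as ET

# Simplified Compact-style fragment: no namespaces, no message header, invented values.
COMPACT = """<DataSet>
  <Series FREQ="M" REF_AREA="EE" PRODUCT="P01" AVAILABILITY="A">
    <Obs TIME_PERIOD="2007-01" OBS_VALUE="104.2" OBS_STATUS="A"/>
    <Obs TIME_PERIOD="2007-02" OBS_VALUE="105.0" OBS_STATUS="P"/>
  </Series>
</DataSet>"""

KEY_DIMENSIONS = {"FREQ", "REF_AREA", "PRODUCT"}  # in practice read from the DSD
TIME, VALUE = "TIME_PERIOD", "OBS_VALUE"


def compact_to_generic(compact_xml: str) -> ET.Element:
    """Rewrite Compact-style XML attributes as Generic-style concept/value elements."""
    source = ET.fromstring(compact_xml)
    dataset = ET.Element("DataSet")
    for series in source.iter("Series"):
        g_series = ET.SubElement(dataset, "Series")
        key = ET.SubElement(g_series, "SeriesKey")
        series_attrs = ET.SubElement(g_series, "Attributes")
        for concept, value in series.attrib.items():
            # Dimensions go to the series key, everything else is a series attribute.
            target = key if concept in KEY_DIMENSIONS else series_attrs
            ET.SubElement(target, "Value", concept=concept, value=value)
        for obs in series.findall("Obs"):
            g_obs = ET.SubElement(g_series, "Obs")
            ET.SubElement(g_obs, "Time").text = obs.get(TIME)
            ET.SubElement(g_obs, "ObsValue", value=obs.get(VALUE))
            obs_attrs = ET.SubElement(g_obs, "Attributes")
            for concept, value in obs.attrib.items():
                if concept not in (TIME, VALUE):
                    ET.SubElement(obs_attrs, "Value", concept=concept, value=value)
    return dataset


print(ET.tostring(compact_to_generic(COMPACT), encoding="unicode"))
```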

  39. SDMX Training 29 November 2007 D – Producing SDMX-ML Data sets

  40. Reporting and Dissemination Guidelines: define and classify all the underlying concepts of a dataset; provide the specification of the DSD (name and identifier, list of statistical concepts, list of metadata concepts, list of code lists); provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ); list the mandatory attributes (e.g. reference area, frequency) and the conditional ones.

  41. Message Implementation Guidelines (MIG): comprise the DSD details (id, version, agencyID); dimensions (concepts, representations, dimension types such as frequency, entity or count, and attachment level); the measure (primary or cross-sectional); attributes (concept, representation, assignment status, i.e. mandatory or conditional, attachment level, attribute type, attachment measure); groups (subsets of dimensions).
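The assignment status and attachment level recorded in a MIG are exactly what a validator needs to flag missing mandatory attributes. A minimal Python sketch, assuming a hypothetical MIG excerpt; the concept IDs and their statuses are illustrative, not taken from a real MIG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    concept: str
    mandatory: bool   # assignment status: mandatory (True) or conditional (False)
    level: str        # attachment level, e.g. "series" or "observation"


# Hypothetical MIG excerpt; the concept IDs and statuses are illustrative.
MIG_ATTRIBUTES = [
    AttributeSpec("AVAILABILITY", mandatory=True, level="series"),
    AttributeSpec("OBS_STATUS", mandatory=True, level="observation"),
    AttributeSpec("OBS_CONF", mandatory=False, level="observation"),
]


def missing_mandatory(level: str, supplied: dict[str, str]) -> list[str]:
    """Return the mandatory attributes for this level that were not supplied."""
    return [a.concept for a in MIG_ATTRIBUTES
            if a.level == level and a.mandatory and a.concept not in supplied]


print(missing_mandatory("observation", {"OBS_CONF": "F"}))  # ['OBS_STATUS']
print(missing_mandatory("series", {"AVAILABILITY": "A"}))   # []
```

Conditional attributes such as OBS_CONF are never reported as missing; checking them would need the extra conditions spelled out in the MIG.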

  42. Structure of a MIG document: DSD table; Dataflows table; referenced concept schemes; referenced code lists; detailed explanation of the Generic SDMX-ML sample dataset; detailed explanation of the Compact (or Cross-Sectional) SDMX-ML sample dataset.

  43. Example - Data Set creation using the DSW

  44. SDMX Converter: main functionality: Reading the input message: parsing the message and populating the tool's data model (based on the SDMX v2.0 information model). Writing the converted message: the data model is used to write the output message in the required target format. Information retrieved from the Registry: the Dataflow ID is used to retrieve the Dataflow definition from the Registry; the DSD reference in the Dataflow definition is then used to acquire the DSD.
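The read/model/write design and the Registry lookup can be sketched in a few lines of Python. The in-memory dictionaries stand in for the Registry, and the DSD ID STS_DSD, the toy `KEY=VALUE` input format and the reader/writer functions are all hypothetical; the real SDMX Converter works on complete SDMX messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataflow:
    id: str
    dsd_ref: str                      # reference to the DSD used by the dataflow


@dataclass(frozen=True)
class Structure:
    id: str
    key_dimensions: tuple[str, ...]


# In-memory stand-ins for the Registry (hypothetical contents, not the Registry web service).
REGISTRY_DATAFLOWS = {"STSRTD_TURN_M": Dataflow("STSRTD_TURN_M", "STS_DSD")}
REGISTRY_DSDS = {"STS_DSD": Structure("STS_DSD", ("FREQ", "REF_AREA", "PRODUCT"))}


def fetch_dsd(dataflow_id: str) -> Structure:
    """Dataflow ID -> dataflow definition -> DSD reference -> DSD."""
    dataflow = REGISTRY_DATAFLOWS[dataflow_id]
    return REGISTRY_DSDS[dataflow.dsd_ref]


def convert(message, dataflow_id, reader, writer):
    """Parse the input into a neutral model, then write that model in the target format."""
    dsd = fetch_dsd(dataflow_id)
    model = reader(message, dsd)
    return writer(model, dsd)


def read_pairs(message: str, dsd: Structure) -> dict[str, str]:
    """Toy reader for a hypothetical 'KEY=VALUE;KEY=VALUE' input format."""
    model = dict(part.split("=", 1) for part in message.split(";"))
    missing = [d for d in dsd.key_dimensions if d not in model]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return model


def write_csv_row(model: dict[str, str], dsd: Structure) -> str:
    """Toy writer: the key values as one CSV row, in DSD order."""
    return ",".join(model[d] for d in dsd.key_dimensions)


print(convert("PRODUCT=P01;FREQ=M;REF_AREA=EE", "STSRTD_TURN_M", read_pairs, write_csv_row))
# M,EE,P01
```

Because every reader targets the same model and every writer starts from it, adding a format means one new reader or writer rather than a separate converter for every pair of formats.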

  45. SDMX Converter (cont.): tool utility: you may already have data in a format other than SDMX-ML (e.g. CSV, GESMES/TS): CSV → Compact SDMX-ML. You may want further validation of your data: Compact SDMX-ML → Utility SDMX-ML. Conversions: from CSV to any type; from SDMX-ML to any type; from SDMX-EDI to any type.
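A CSV → Compact SDMX-ML conversion essentially groups observation rows by their series key. A Python sketch, assuming a flat CSV layout with one row per observation and namespace-free output; the column names and values are invented, and a real converter would take the key dimensions (and each attribute's attachment level) from the DSD.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flat CSV layout: one row per observation, key columns first.
CSV_DATA = """FREQ,REF_AREA,PRODUCT,TIME_PERIOD,OBS_VALUE,OBS_STATUS
M,EE,P01,2007-01,104.2,A
M,EE,P01,2007-02,105.0,P
M,EE,P02,2007-01,98.7,A
"""

KEY_DIMENSIONS = ["FREQ", "REF_AREA", "PRODUCT"]   # would come from the DSD


def csv_to_compact(text: str) -> ET.Element:
    """Group CSV rows into Compact-style Series/Obs elements (namespaces omitted)."""
    dataset = ET.Element("DataSet")
    series_by_key = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[d] for d in KEY_DIMENSIONS)
        series = series_by_key.get(key)
        if series is None:
            # First row for this key: open a new Series carrying the key values.
            series = ET.SubElement(dataset, "Series", {d: row[d] for d in KEY_DIMENSIONS})
            series_by_key[key] = series
        # Every column that is not a key dimension goes on the Obs element.
        obs_attrs = {k: v for k, v in row.items() if k not in KEY_DIMENSIONS}
        ET.SubElement(series, "Obs", obs_attrs)
    return dataset


print(ET.tostring(csv_to_compact(CSV_DATA), encoding="unicode"))
```

In this naive version every non-key column lands on Obs; a series-level attribute such as availability would have to be moved to the Series element using the DSD.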

  46. Conversion Example
