SDMX Information Model: Pedagogical Explanation
Arofan Gregory and Chris Nelson
OECD SDMX Expert Group Meeting, Geneva, April 6-7 2006
Data Set
We have a data set; what do we need to know?
• Its structure
• Who reports/disseminates it
• How a specific data set fits into the overall collection framework, and which organisation is responsible for reporting which parts
• The reporting/publication schedule
• That it has been reported/published
Data Set: Structure
[Diagram: the concepts Topic, Country, Stock/Flow, Time/Frequency, Unit and Unit Multiplier]
• Computers need to know the structure of the data:
  • Concepts
  • Code lists
  • Data values
  • How these fit together
Structural Definitions: Concepts and Code Lists
• TOPIC (Topic): A Brady Bonds, B Bank Loans, C Debt Securities
• COUNTRY (Country): AR Argentina, MX Mexico, ZA South Africa
• FLOW (Stock/Flow): 1 Stock, 2 Flow
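Data Makes Sense
On its own, a number such as 16457 means nothing. Combined with the structural definitions, the record ZA,B,1,1999-06-30=16547 reads as: Country ZA (South Africa), Topic B (Bank Loans), Stock/Flow 1 (Stock), time 1999-06-30, observation value 16547.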
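The decoding step above can be sketched in a few lines of Python. This is a minimal illustration, not SDMX tooling: the dictionary `CODE_LISTS`, the key order `KEY_ORDER` (inferred from the record itself, since ZA appears only in the Country list, B only in Topic and 1 only in Stock/Flow) and the function `decode` are hypothetical names chosen for this sketch.

```python
# Hypothetical in-memory code lists, copied from the example on the slides.
CODE_LISTS = {
    "COUNTRY": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
    "TOPIC": {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"},
    "FLOW": {"1": "Stock", "2": "Flow"},
}

# Assumed order of the coded dimensions in a key, inferred from ZA,B,1,...
KEY_ORDER = ["COUNTRY", "TOPIC", "FLOW"]


def decode(record: str) -> dict:
    """Turn 'ZA,B,1,1999-06-30=16547' into labelled concepts plus the observation value."""
    key_part, value_part = record.split("=")
    *codes, time_period = key_part.split(",")
    if len(codes) != len(KEY_ORDER):
        raise ValueError(f"expected {len(KEY_ORDER)} coded dimensions, got {len(codes)}")
    decoded = {}
    for concept, code in zip(KEY_ORDER, codes):
        labels = CODE_LISTS[concept]
        if code not in labels:  # the code list defines the permitted values
            raise ValueError(f"{code!r} is not in the code list for {concept}")
        decoded[concept] = labels[code]
    decoded["TIME_PERIOD"] = time_period
    decoded["OBS_VALUE"] = float(value_part)
    return decoded


print(decode("ZA,B,1,1999-06-30=16547"))
```

Without the code lists and the key order, the same string is just five opaque tokens; the structure is what turns it into South Africa, Bank Loans, Stock.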
Data Set: Structure
• Comprises:
  • Concepts that identify the observation value (Dimensions)
  • Concepts that add additional metadata about the observation value (Attributes)
  • The concept that is the observation value (Measure)
• Any of these may be represented (Representation) as:
  • coded
  • text
  • date/time
  • number
  • etc.
Data Set: Structure
[Diagram: each concept labelled with its role]
• Dimensions: Topic, Country, Stock/Flow, Time/Frequency
• Attributes: Unit, Unit Multiplier
• Measure: Observation
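Data Structure Definition
[Diagram: a Data Structure Definition with its Key, Group Key, Dimensions, Attributes and Measures; each component is linked to a Concept (e.g. TOPIC, COUNTRY, FLOW) and has a Representation, e.g. the Topic code list: A Brady Bonds, B Bank Loans, C Debt Securities]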
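A data structure definition can be sketched as a list of components, each pointing at a concept, carrying a role and optionally a coded representation. This is a simplified model under stated assumptions, not the normative SDMX classes: the names `Concept`, `Component`, `DataStructureDefinition`, `check_key` and the identifier `DEBT_EXAMPLE` are hypothetical, and Time/Frequency is collapsed into a single uncoded time dimension.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Concept:
    id: str
    name: str


@dataclass
class Component:
    concept: Concept
    role: str                          # "dimension", "attribute" or "measure"
    code_list: Optional[dict] = None   # coded representation; None = uncoded (text, number, date/time)


@dataclass
class DataStructureDefinition:
    id: str
    components: list = field(default_factory=list)

    def by_role(self, role: str) -> list:
        return [c for c in self.components if c.role == role]

    def check_key(self, key: dict) -> list:
        """List the problems with a series key given as {concept id: value}."""
        problems = []
        for dim in self.by_role("dimension"):
            value = key.get(dim.concept.id)
            if value is None:
                problems.append(f"missing dimension {dim.concept.id}")
            elif dim.code_list is not None and value not in dim.code_list:
                problems.append(f"{value!r} is not allowed for {dim.concept.id}")
        return problems


# Hypothetical DSD for the debt example; Time/Frequency is simplified to one time dimension.
DEBT_DSD = DataStructureDefinition("DEBT_EXAMPLE", [
    Component(Concept("TOPIC", "Topic"), "dimension",
              {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"}),
    Component(Concept("COUNTRY", "Country"), "dimension",
              {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"}),
    Component(Concept("FLOW", "Stock/Flow"), "dimension", {"1": "Stock", "2": "Flow"}),
    Component(Concept("TIME_PERIOD", "Time"), "dimension"),
    Component(Concept("UNIT", "Unit"), "attribute"),
    Component(Concept("UNIT_MULT", "Unit Multiplier"), "attribute"),
    Component(Concept("OBS_VALUE", "Observation"), "measure"),
])
```

Keeping the concept separate from the component mirrors the slide: the concept supplies the meaning, while the component adds the role and the representation.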
Data Set: Publishing/Reporting
• Publishing and collecting data sets is a process
• As a process, it must have metadata that enables organisations to control it:
  • what data it is
  • who publishes it
  • who collects it
  • when it is published/reported
[Diagram: Structure Definition, Data Flow, Data Set, Provision Agreement and Data Provider. The Data Flow uses a specific data/metadata structure and can get data from multiple data providers; through a Provision Agreement, a Data Provider can provide data for many data flows using the agreed data structure; the Data Provider publishes/reports data sets, which conform to the business rules of the data flow.]
• The data flow is the artefact that contains metadata about the provision of data
• In a data reporting scenario, the data flow is defined by the data collector, and there can be many data providers reporting data for the data flow
• A data provider may report data for many data flows (perhaps for many organisations)
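The many-to-many relationship between data providers and data flows, resolved through provision agreements, can be sketched as follows. The class `ProvisionAgreement`, the list `AGREEMENTS` and both lookup functions are hypothetical; the provider codes (1A, 2A) and flow codes (BOP, NAC) are borrowed from the ARC example in this presentation, but which agreements exist is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisionAgreement:
    """One data provider agrees to supply data for one data flow (minimal sketch)."""
    data_provider: str
    data_flow: str


# Hypothetical agreements; the codes are borrowed from the ARC example.
AGREEMENTS = [
    ProvisionAgreement("1A", "BOP"),
    ProvisionAgreement("1A", "NAC"),
    ProvisionAgreement("2A", "BOP"),
]


def providers_for(data_flow: str) -> set:
    """A data flow can get data from multiple data providers."""
    return {pa.data_provider for pa in AGREEMENTS if pa.data_flow == data_flow}


def flows_for(data_provider: str) -> set:
    """A data provider can provide data for many data flows."""
    return {pa.data_flow for pa in AGREEMENTS if pa.data_provider == data_provider}
```

Modelling the agreement as its own object, rather than as a list on either side, is what lets the same agreement later carry constraints, schedules and registrations.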
Organising Data Flows
• Organisations may wish to categorise the data flows
  • For convenience
  • To facilitate control (more about these later)
    • who reports what, and when (release calendar)
    • who has reported
  • To facilitate searching for data (more about this later)
Data Reporting
[Diagram: a Category Scheme comprises subject or reporting categories, which can have child categories; a Data Flow can be linked to categories in multiple category schemes, uses a specific Data Structure Definition, and can get data from multiple data providers; through a Provision Agreement, a Data Provider can provide data for many data flows using the agreed data structure and publishes/reports Data Sets, which conform to the business rules of the data flow; a Release Calendar is also shown.]
Metadata
We have metadata; what do we need to know?
• What the metadata is for (what it describes)
• Who reports it
• How a specific metadata set fits into the overall collection framework, and which organisation is responsible for reporting which parts
• The reporting schedule
• That it has been reported
Metadata: Controlling It
• What can be done for data can also be done for metadata:
  • Metadata has a structure
  • Metadata is reported/published
  • Metadata needs to be controlled
  • Metadata needs to be found
  • Metadata may need to be linked to data
What Sort of Metadata?
• Data values are limited in where they belong:
  • Series key (usually qualified by time)
• Data attribute values are limited in where they belong:
  • Observation value
  • Series key
  • Group key
  • Data set
• Metadata is not limited in this way
  • Metadata is everywhere
• Can we learn from the data side how to describe metadata structure definitions?
Identify Structure
[Diagram: Release Calendar metadata for a Provision Agreement, described by a Metadata Structure Definition]
A Metadata Structure Definition comprises:
• Concepts
• Hierarchies
• Representation (e.g. code list)
Metadata Structure Definition: Model
[Diagram: the Full Target Identifier defines the “keys” of the object types to which metadata can be “attached”; its Identifier Components each identify the Target Object Type of the component and the Item Scheme (code list) from which the value of the (key) component must be taken when metadata is reported. A Partial Target Identifier specifies a subset of the identifier components (“key”) of the target object. The Metadata Report specifies to which object types its concepts can be “attached” and uses defined concepts: its Metadata Attributes can have a hierarchy, take their semantics and context from a Concept defined in a Concept Scheme, and may have a Format and Permitted Value List that overrides the concept's core definition of format and permitted values.]
Release Calendar Metadata: Target
[Diagram: release calendar metadata is targeted at the Provision Agreement, which is identified by a Data Flow and a Data Provider]
Metadata Structure Definition: Example (ARC)
[Diagram: Metadata Structure Definition ARC.
• Full Target Identifier: Id = Provision_Agreement, with Identifier Components whose Target Object Types are Data Flow (Item Scheme CL_DATA_FLOW: BOP Balance.., NAC National..) and Data Provider (Item Scheme CL_DATA_PROVIDER: 1A ABS, 2A SNZ)
• Partial Target Identifier: can be used to identify just the Data Provider or just the Data Flow
• MetadataReport: Metadata Attributes Release Date (Date/Time) and Release Status (CL_Status: F Final, P Provisional), whose concepts come from the Concept Scheme Metadata_Concepts]
Metadata Structure Definition: Identifiers
Metadata Structure Definition = ARC_DATA
Full Target Identifier = Provision_Agreement
• Identifier Component: Target Object Type = Data Flow; Item Scheme = CL_DATA_FLOW (BOP Balance.., NAC National..)
• Identifier Component: Target Object Type = Data Provider; Item Scheme = CL_DATA_PROVIDER (1A ABS, 2A SNZ)
Metadata Structure Definition: Metadata Report
Metadata Report = ARC
Attachment = Provision_Agreement
• Metadata Attribute: Concept = Release Date; Representation = Date/Time
• Metadata Attribute: Concept = Release Status; Representation = CL_Status (F Final, P Provisional)
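How the ARC metadata structure definition constrains a reported value can be sketched as a validation function. The item scheme codes come from the slides; the attribute ids `RELEASE_DATE` and `RELEASE_STATUS`, the target keys `DataFlow` and `DataProvider`, and the function `validate_arc_report` are hypothetical, and reading the Date/Time representation as an ISO 8601 calendar date is an assumption.

```python
from datetime import date

# Item schemes from the ARC example (codes only).
CL_DATA_FLOW = {"BOP", "NAC"}
CL_DATA_PROVIDER = {"1A", "2A"}
CL_STATUS = {"F": "Final", "P": "Provisional"}

# Full Target Identifier "Provision_Agreement": one (target object type, item scheme) per identifier component.
FULL_TARGET_IDENTIFIER = [("DataFlow", CL_DATA_FLOW), ("DataProvider", CL_DATA_PROVIDER)]


def validate_arc_report(target: dict, attributes: dict) -> list:
    """Check one hypothetical ARC metadata report: its target key and its two metadata attributes."""
    problems = []
    for object_type, item_scheme in FULL_TARGET_IDENTIFIER:
        if target.get(object_type) not in item_scheme:
            problems.append(f"{object_type} value {target.get(object_type)!r} is not in its item scheme")
    # Release Date: Date/Time representation, assumed here to be an ISO 8601 calendar date.
    release_date = attributes.get("RELEASE_DATE")
    try:
        date.fromisoformat(release_date)
    except (TypeError, ValueError):
        problems.append(f"RELEASE_DATE {release_date!r} is not an ISO date")
    # Release Status: coded with CL_Status.
    if attributes.get("RELEASE_STATUS") not in CL_STATUS:
        problems.append(f"RELEASE_STATUS {attributes.get('RELEASE_STATUS')!r} is not in CL_Status")
    return problems
```

For example, a report targeting flow BOP from provider 1A with release date 2006-04-06 and status P yields no problems, while an unknown flow code or a status outside CL_Status is reported.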
Metadata Reporting
[Diagram: a Category Scheme comprises subject or reporting categories, which can have child categories; a Metadata Flow can be linked to categories in multiple category schemes and uses a specific Metadata Structure Definition; Constraints define a subset of the possibilities defined in the Structure Definition; through a Provision Agreement, a Data Provider can provide metadata for many metadata flows using the agreed metadata structure, and a metadata flow can get metadata from multiple metadata providers; the provider publishes/reports Metadata Sets, which conform to the business rules of the metadata flow.]
Information Model: Summary So Far
• Supports data and metadata reporting and exchange
  • Data and metadata structure definitions
  • Data and metadata sets
• Supports the process of reporting and exchange
  • Data/metadata providers
  • Data/metadata flows
  • Provision agreements
Data/Metadata Reporting/Exchange
[Diagram: a Category Scheme comprises subject or reporting categories, which can have child categories; a Data or Metadata Flow can be linked to categories in multiple category schemes, uses a specific data/metadata Structure Definition, and can get data/metadata from multiple data/metadata providers; Constraints define a subset of the possibilities defined in the Structure Definition; through a Provision Agreement, a Data Provider can provide data/metadata for many data/metadata flows using the agreed data/metadata structure and publishes/reports Data Sets or Metadata Sets, which conform to the business rules of the data/metadata flow.]
Controlling Data and Metadata
• How do we control data and metadata reporting?
• How do we find data and metadata?
• How do we share data and metadata?
SDMX Registry
• The Registry supports many of the artefacts in the Information Model
  • Holds indexes for data and metadata, and where these can be found on the web
    • Data and metadata set indexes
  • Stores structure definitions
    • Data and metadata structures
    • Code lists
    • Category schemes
    • Data flows
  • Stores provisioning metadata
    • Data providers
    • Provision agreements
• The Registry is used to store structural and provisioning definitions, to register data sets and metadata sets, and to store the links between them
• The Registry is a resource that can be queried by applications to find data, metadata, and the structural definitions supporting these
• The Registry specification defines the behaviour of an SDMX Registry and the Registry interfaces, which are specified as XML schemas
• The Registry “functions” are modelled in the Information Model, but its functionality is best explained in the context of the schematic already used for data and metadata (Data/Metadata Reporting and Exchange)
SDMX Registry/Repository
[Diagram: the SDMX Registry Interfaces (Register, Submit, Query, Subscription/Notification) in front of:
• the REGISTRY of data sets/metadata sets (Register, Query), which indexes data and metadata
• a REPOSITORY of provisioning metadata (Submit, Query), which describes data and metadata sources and reporting processes
• a REPOSITORY of structural metadata (Submit, Query), which describes data and metadata structures]
Data Set Registration
[Diagram: Structure Definition, Data Flow, Constraint (Keys), Provision Agreement, Data Provider, and the registered Data Set (URL, registration date, etc.)]
• The data is “registered” against the provision agreement
• The Constraint holds the indexes, such as the series keys or the list of dimension values
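A registration and its index can be sketched as below: the registration records where the data set lives and when it was registered, and the constraint index is derived in both forms named above (the series keys themselves, and the list of values per dimension). `Registration`, `register`, the provision agreement id `PA_EXAMPLE` and the example.org URL are hypothetical; this is not the SDMX registry interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Registration:
    """A data set registered against a provision agreement (hypothetical minimal record)."""
    provision_agreement: str
    url: str
    registration_date: date
    series_keys: set        # index form 1: the series keys themselves
    dimension_values: dict  # index form 2: the values that occur for each dimension


def register(provision_agreement, url, series_keys, dimension_ids, when=None):
    """Build both index forms from comma-separated series keys."""
    keys, values = set(), {dim: set() for dim in dimension_ids}
    for key in series_keys:
        parts = tuple(key.split(","))
        if len(parts) != len(dimension_ids):
            raise ValueError(f"key {key!r} does not have {len(dimension_ids)} dimensions")
        keys.add(parts)
        for dim, value in zip(dimension_ids, parts):
            values[dim].add(value)
    return Registration(provision_agreement, url, when or date.today(), keys, values)


reg = register("PA_EXAMPLE", "https://example.org/debt.xml",
               ["ZA,B,1", "ZA,B,2", "MX,A,1"], ["COUNTRY", "TOPIC", "FLOW"], date(2006, 4, 6))
```

The per-dimension form is smaller but coarser: it says that ZA, MX, A, B, 1 and 2 all occur, not which combinations do, which is why both forms are useful.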
Data Query
[Diagram: a Category Scheme with the categories Labor force statistics, Labor force earnings and Labor force employment, linked through Data Flows (with their Structure Definition and Constraints) and Provision Agreements (with their Data Providers) to the registered Data Sets]
• The query can start anywhere and navigate to the data
• In the registry, all navigation is bi-directional
• Category drill-down searches start at the Category and go via the Data Flows
• Fine-grained queries can be built using structural metadata (e.g. dimension names and possible values)
• Fine-grained searches are possible on the Constraints
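A category drill-down followed by a fine-grained filter on the constraint index can be sketched as follows. Everything here is hypothetical: the parent/child relationship between the three labor force categories is assumed, and the flow ids, URLs, the `REF_AREA` dimension and the function names are invented to show the navigation path (category, then data flows, then registered data sets), plus one step in the reverse direction.

```python
# Hypothetical registry contents, for illustration only.
CHILD_CATEGORIES = {
    "Labor force statistics": ["Labor force earnings", "Labor force employment"],
}
FLOWS_BY_CATEGORY = {
    "Labor force earnings": ["LF_EARNINGS"],
    "Labor force employment": ["LF_EMPLOYMENT"],
}
REGISTRATIONS = [
    {"flow": "LF_EARNINGS", "url": "https://example.org/earnings.xml", "index": {"REF_AREA": {"DE", "FR"}}},
    {"flow": "LF_EMPLOYMENT", "url": "https://example.org/employment.xml", "index": {"REF_AREA": {"FR"}}},
]


def categories_under(category):
    """The category itself plus all of its descendants (drill-down)."""
    found = [category]
    for child in CHILD_CATEGORIES.get(category, []):
        found.extend(categories_under(child))
    return found


def find_data_sets(category, **dimension_filters):
    """Category -> data flows -> registered data sets, then a filter on the constraint index."""
    flows = {f for c in categories_under(category) for f in FLOWS_BY_CATEGORY.get(c, [])}
    return sorted(
        reg["url"]
        for reg in REGISTRATIONS
        if reg["flow"] in flows
        and all(value in reg["index"].get(dim, set()) for dim, value in dimension_filters.items())
    )


def categories_for_flow(flow):
    """Navigation in the other direction: from a data flow back to its categories."""
    return sorted(c for c, flows in FLOWS_BY_CATEGORY.items() if flow in flows)
```

A call such as `find_data_sets("Labor force statistics", REF_AREA="DE")` drills down through both child categories but keeps only the earnings data set, because only its index contains DE.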
Metadata Set Registration
• Metadata that is reported regularly is registered against the (Metadata) Provision Agreement
• The metadata content (the metadata set) is linked to the object to which it relates
  • This link can be stored in the registry, e.g.
    • a link to the data set to which it relates
    • a link to the data provider to which it relates
• Registry/Repository operators could use the repository to store the metadata itself
  • This is not part of the Information Model, nor of the SDMX standards
Metadata Query
• The indexed metadata set itself can be searched
• Links to data can be discovered and followed, e.g.
  • Is there any metadata for a specific data set, or part of the data set?
  • If so, what sort of metadata?
  • Where is the metadata (URL)?
• More on this later
Information Model: Summary So Far
• Supports data and metadata reporting and exchange
  • Data and metadata structure definitions
  • Data and metadata sets
• Supports the process of reporting and exchange
  • Data/metadata providers
  • Data/metadata flows
  • Provision agreements
• Supports registration
  • Data and metadata sets
• Supports query
  • Categories linked to data and metadata
  • Constraints for finer-grained queries
Summary: Data/Metadata Reporting, Query
[Diagram: the Data/Metadata Reporting/Exchange schematic (Category Scheme and Category, Structure Definition, Data or Metadata Flow, Constraint, Provision Agreement, Data Provider, Data Set or Metadata Set), repeated as a summary]
Registry: What Else?
• Link metadata to parts of a data set or database contents
• Query for metadata linked to data
Registry: Link Metadata to Data
Parts of a data set can be described in terms of key sets, which are combined into an Attachment Constraint that is linked both to a specific data set and to a metadata set.
Constraints: Structure
• Support the specification of subsets of data or metadata structure definitions, or of data and metadata sets
  • In terms of allowable key values
  • In terms of allowable dimension, attribute, or measure values
• Constraints can apply to:
  • Data sets (so-called “cubes” or “cube regions”)
  • Entire databases
  • Data flows
  • Metadata sets
  • Entire metadata repositories
  • Metadata flows
  • Data providers
  • Provision agreements
• Two kinds of Constraint
  • Content: used to define the actual or allowable content
  • Attachment: used to define a subset of a data or metadata set for the purpose of attaching metadata to it
Constraints: Structure Schematic
[Diagram: a Constraint is either an AttachmentConstraint or a ContentConstraint, and is made of:
• Key Sets: sets of keys to be included in or excluded from the scope; each Key is a specification of a key, made of Key Values (a Concept and its value)
• Cube Regions: sets of values to be included in or excluded from the scope; each entry is a set of values for a concept, giving the identity of the Concept (e.g. Country) and its list of Values]
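The two building blocks of the schematic can be sketched as two small classes with a shared `covers` test. The class and function names are hypothetical, and so is the combination rule in `in_scope` (a key must be covered by some included part, if any exist, and by no excluded part); the slides say only that keys and values can be included in or excluded from the scope, not how the parts combine.

```python
from dataclasses import dataclass


@dataclass
class CubeRegion:
    """Sets of values per concept (e.g. COUNTRY -> {"ZA", "MX"}) included in or excluded from the scope."""
    values: dict
    include: bool = True

    def covers(self, key: dict) -> bool:
        # Covered when every listed concept takes one of its listed values.
        return all(key.get(concept) in allowed for concept, allowed in self.values.items())


@dataclass
class KeySet:
    """Sets of (possibly partial) keys, each a dict of concept -> key value, included in or excluded from the scope."""
    keys: list
    include: bool = True

    def covers(self, key: dict) -> bool:
        # Covered when the key matches every key value of at least one listed key.
        return any(all(key.get(c) == v for c, v in k.items()) for k in self.keys)


def in_scope(key: dict, parts: list) -> bool:
    """Assumed combination rule: covered by some included part (if any) and by no excluded part."""
    included = [p for p in parts if p.include]
    if included and not any(p.covers(key) for p in included):
        return False
    return not any(p.covers(key) for p in parts if not p.include)
```

With a cube region allowing countries ZA and MX and only Stock (FLOW 1), plus an excluded key set for MX Brady Bonds, the key ZA,B,1 is in scope while MX,A,1, AR,B,1 and ZA,B,2 are not.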
Constraints: Usage
• Data source registration
  • A data source can be a data set or a database
  • A Content Constraint is used to define the content of a data set or database
  • This supports fine-grained queries
• Attaching metadata to parts of a data set or other data source
  • The target object of a metadata set is an Attachment Constraint linked to a registered data set or database content
Attachment Constraint
[Diagram: the code lists Topic (A Brady Bonds, B Bank Loans, C Debt Securities), Country (AR Argentina, MX Mexico, ZA South Africa) and Stock/Flow (1 Stock, 2 Flow), with a Registered Data Set and a Registered Metadata Set]
• Metadata is linked to the Constraint
• The Constraint is linked to the Data Set
• The Key Sets of the Attachment Constraint define the subset of the Data Set, e.g. the keys:
  ZA,B,1,1999-03-31
  ZA,B,1,1999-06-30
  ZA,B,1,1999-09-30
  ZA,B,2,1999-03-31
  etc.
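The metadata query this enables ("is there any metadata for this part of the data set, and where is it?") can be sketched with the key set from the slide. The structure of `ATTACHMENTS`, both example.org URLs and the function `metadata_for` are hypothetical; only the four keys come from the source.

```python
# Hypothetical registry entry: a metadata set attached, via an Attachment Constraint,
# to a subset of the keys of a registered data set.
ATTACHMENTS = [
    {
        "metadata_set": "https://example.org/metadata/za-bank-loans.xml",
        "data_set": "https://example.org/data/debt.xml",
        "key_set": {
            ("ZA", "B", "1", "1999-03-31"),
            ("ZA", "B", "1", "1999-06-30"),
            ("ZA", "B", "1", "1999-09-30"),
            ("ZA", "B", "2", "1999-03-31"),
        },
    },
]


def metadata_for(data_set: str, key: str) -> list:
    """Is there any metadata for this part of the data set? If so, where is it (URL)?"""
    wanted = tuple(key.split(","))
    return [
        a["metadata_set"]
        for a in ATTACHMENTS
        if a["data_set"] == data_set and wanted in a["key_set"]
    ]
```

Because the link is held by the constraint rather than inside the data or the metadata, neither the data set nor the metadata set has to change when metadata is attached.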
Information Model: Support for Data Analysis
• Viewing, comparing and analysing data in different groupings
  • Hierarchical Code Lists
• Converting data and metadata from one coding and structure scheme to another
  • Structure and Code Mapping
Hierarchical Code Lists: Example
• France is a country
  • France is part of the continent of Europe
  • France is a member of NATO
  • France is a member of the EU
  • France is a member of the G10
• When I analyse statistics, I might want to see totals by:
  • continent
  • trading bloc
  • military alliance
  • financial grouping
• France will be grouped with different sets of countries depending on the “view” required
• How do we express these groupings?
[Diagram: the Reference Area code list, with the codes 6B NATO, B0 EU, B1 NAFTA, BE Belgium, BG Bulgaria, CA Canada, CH Switzerland, CZ Czech Republic, DE Germany, DK Denmark, E1 Europe, E8 North America, EE Estonia, ES Spain, FI Finland, FR France, GB United Kingdom, GR Greece, HU Hungary, JP Japan, I2 Euro 12, IT Italy, NE Netherlands, US United States, organised by Code Associations into Code Compositions: G10 countries, Europe, EU countries, NATO countries, NAFTA countries and North America]
[Diagram: Hierarchical Code Scheme model. A Hierarchical Code Scheme comprises Hierarchies; a Hierarchy is either value based (it has code groups) or level based (it has formal Levels). A Code Composition groups codes with the same parent and comprises code groups. A Code Association relates a Code (from the Code List it belongs to) to a parent code, and can carry Properties of the association.]
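The France example can be sketched as several value-based hierarchies over the same Reference Area codes, one per "view", with each parent code grouping its children. The view names, the function names and the parent code `G10` are hypothetical (the slide names G10 countries but gives no code for it), and only a few members of each group are listed.

```python
# One hypothetical hierarchy per "view"; each maps a parent code to its child codes
# (one Code Composition per parent). Only a few members of each group are listed.
HIERARCHIES = {
    "continent": {"E1": ["DE", "FR", "IT"], "E8": ["CA", "US"]},
    "trading bloc": {"B0": ["BE", "DE", "FR", "IT"]},
    "military alliance": {"6B": ["CA", "DE", "FR", "US"]},
    "financial grouping": {"G10": ["DE", "FR", "JP", "US"]},
}


def groups_containing(code: str) -> list:
    """Every (view, parent code) pair under which a code appears."""
    return sorted(
        (view, parent)
        for view, groups in HIERARCHIES.items()
        for parent, children in groups.items()
        if code in children
    )


def totals(values: dict, view: str) -> dict:
    """Aggregate country values to the parent codes of one view; missing children count as 0."""
    return {parent: sum(values.get(child, 0) for child in children)
            for parent, children in HIERARCHIES[view].items()}
```

`groups_containing("FR")` places France under E1, G10, 6B and B0 at once, which is the point of the example: the code list stays flat, and each hierarchy is a separate view over it.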
Item Scheme Maps
• Many types of “item scheme” use the same fundamental structure:
  • Code list
  • Category scheme
  • Concept scheme
• Two item schemes can be mapped
[Diagram: an Item Scheme Association links a source item scheme to a target item scheme and is specialised as Code List Map, Category Scheme Map and Concept Scheme Map (for Code Lists, Category Schemes and Concept Schemes). It has Item Associations, each linking a source item to a target item (Code, Category or Concept), with an Association Role and additional metadata held as Properties.]
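A code list map can be sketched as a set of item associations from source codes to target codes, used to convert coded values. The class names only echo the diagram and are not normative; the scheme ids `COUNTRY_ALPHA` and `COUNTRY_ISO_NUMERIC` are hypothetical, while the target codes are the ISO 3166-1 numeric codes for Argentina, Mexico and South Africa. Association roles and properties are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemAssociation:
    source_item: str
    target_item: str


@dataclass
class CodeListMap:
    """An Item Scheme Association specialised for code lists (minimal sketch)."""
    source_scheme: str
    target_scheme: str
    associations: list

    def translate(self, code: str) -> str:
        """Return the single target code associated with a source code."""
        targets = [a.target_item for a in self.associations if a.source_item == code]
        if len(targets) != 1:
            raise KeyError(f"{code!r} has {len(targets)} mappings in {self.source_scheme} -> {self.target_scheme}")
        return targets[0]


# Hypothetical map from the alphabetic country codes used earlier to ISO 3166-1 numeric codes.
COUNTRY_MAP = CodeListMap("COUNTRY_ALPHA", "COUNTRY_ISO_NUMERIC", [
    ItemAssociation("AR", "032"),
    ItemAssociation("MX", "484"),
    ItemAssociation("ZA", "710"),
])
```

Requiring exactly one target per source code is a design choice for conversion; a real map may need one-to-many or many-to-one associations, which is what association roles and properties can describe.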
Structure Maps
• Structures can also be mapped:
  • Data structures
  • Metadata structures
Information Model: Summary
• Supports data and metadata reporting and exchange
  • Data and metadata structure definitions
  • Data and metadata sets
• Supports the process of reporting and exchange
  • Data/metadata providers
  • Data/metadata flows
  • Provision agreements
• Supports registration
  • Data and metadata sets
  • Data and metadata can be linked
• Supports query
  • Categories linked to data and metadata
  • Constraints for finer-grained queries
  • Retrieval of metadata linked to data
• Supports data analysis, comparison and conversion
  • Hierarchical code schemes
  • Structure, Concept, Code and Category maps
Data/Metadata Reporting, Query, Analysis, Mapping
[Diagram: overview of Category Scheme and Category, Structure and Item Scheme Maps, Structure Definition, Data or Metadata Flow, Provision Agreement, Data Provider, Data Set or Metadata Set, Registered Data Set or Metadata Set, Attachment Constraint and Content Constraint]