EUROPEAN TECHNICAL ASSISTANCE PROGRAMME FOR VIETNAM (ETV2). Ministry of Planning and Investment, Ministry of Finance and Ministry of Science and Technology in partnership with the European Commission VNM/AIDCO/2002/0589.
SDMX in the Vietnam Ministry of Planning and Investment: A Data Model to Manage Metadata and Data • ETV2 Component 5 – Facilitating better decision-making in the Ministry of Planning and Investment
ETV2 Component 5 • Goals • Assist Ministry of Planning and Investment (MPI) to improve monitoring, analysis and decision-making capability • Focus on access to and quality of MPI information • Major data-related issues • Diverse data from multiple sources • from provinces, other ministries, businesses, etc • inconsistent formats, definitions • overlapping data in different areas of MPI • No facilities to share and reuse data • poor data and metadata management • no central storage or registration of data
We are evaluating an SDMX-based solution • To hold and manage the metadata • mostly “structural” metadata in SDMX terms • but “reference” metadata will be added • To link the data to the metadata • To provide an environment where the data can be managed in the context of the metadata • for capture and storage • for searching and browsing • for retrieval and access
Why SDMX? • This does not seem to be its home territory • data and metadata exchange • the data does come from lots of other organisations • but this is not the focus of the current work • MPI has a pool of unmanaged, mission-critical data • with very little management of structural metadata • and with virtually no reference metadata • although plenty of need for it
Why SDMX? • The SDMX Conceptual Model is essentially about linking metadata to data • this model provides • a framework for sharing and re-using structural metadata • a context within which data can be managed and used, via the structural metadata • a context within which reference metadata (eg quality information) can be introduced and managed • SDMX also provides basic tools to support use of this Conceptual Model • It is a considerable challenge to build these features into most other approaches • you have to devise and support your own SDMX-like model
The MPI Data • Regular tables (and other material) • supposed to come regularly from the various source organisations • mostly do not arrive from all of them in a timely fashion • little control over how they come • usually email, often paper! • may come as different versions at different times! • quality and version status is an issue • MPI works in Excel • data that comes electronically generally comes in Excel • where “metadata” exists it exists as Excel templates • mostly stored on local machines • hard to share within MPI departments, impossible to share across them • much of the data has security management requirements
SDMX from the Conceptual Model Perspective • SDMX is usually described from a data exchange perspective • the terminology is a bit abstruse • the UML makes developing a detailed understanding a challenge • I like to look at SDMX as a Conceptual Model • and to relate all the SDMX jargon to important concepts in the conceptual model • I have a few slides to explain the model • and to show how we might use it at MPI • still work in progress
The major SDMX artefacts • Artefacts shown in the diagram: Concept Scheme, Concept, Category Scheme, Category, Code List, Hierarchical Code Set (Classifications), Data Structure Definition, Metadata Structure Definition, Data Flow, Metadata Flow, Data Set, Metadata Set, Data Provider, Provision Agreement
Structural Metadata • Structural Metadata contains information used to structure data and metadata. This covers Concept and Category schemes, Code Lists and Hierarchical Classifications, and Structure Definitions for Data and Metadata Sets. The basic structural metadata components are the Item Scheme and the Structure Definition. • Concept Scheme: Concepts are organised into Concept Schemes • Code List: codes and their names and descriptions • Hierarchical Code Set (Classifications): organises simple code lists into hierarchies • Category Scheme: possibly multiple category schemes to categorise flows and allow searching and indexing of data and metadata flows • Structure Definitions: structures are built from Concepts and associated Code Lists that define the valid content
Flows, the heart of SDMX • Data and Metadata Flows are linked to a structure that defines the format of the corresponding Data Sets and Metadata Sets • potentially many data sets from many providers conforming to the structure of the data flow
Reference Metadata • Reference Metadata is non-structural metadata that gives more information about an object to make its interpretation more meaningful. Quality, methodological, and conceptual information are examples of Reference Metadata. SDMX allows Reference Metadata to be published, shared, and reused.
Data Provisioning Metadata • Provision Agreements are agreements by providers to deliver data to a schedule according to a flow structure
Data • The observed phenomena at specific points as identified by the values of concepts comprising the key of the observations.
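To make the link between structures, concepts and code lists concrete, here is a minimal Python sketch of how a structure definition can constrain the key of an observation. It is an illustration only, not the SDMX information model itself: the class names, the PROVINCE and INDICATOR concepts, and all codes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """A code list: valid codes and their names."""
    id: str
    codes: dict[str, str]  # code -> name


@dataclass(frozen=True)
class Dimension:
    """A concept used as a dimension, constrained by a code list."""
    concept_id: str
    code_list: CodeList


@dataclass(frozen=True)
class DataStructureDefinition:
    """A structure built from concepts and their associated code lists."""
    id: str
    dimensions: tuple[Dimension, ...]

    def validate_key(self, key: dict[str, str]) -> list[str]:
        """Return a list of problems found in an observation key."""
        errors = []
        for dim in self.dimensions:
            value = key.get(dim.concept_id)
            if value is None:
                errors.append(f"missing dimension {dim.concept_id}")
            elif value not in dim.code_list.codes:
                errors.append(f"{dim.concept_id}: unknown code {value!r}")
        return errors


# Hypothetical code lists and structure, for illustration only.
provinces = CodeList("CL_PROVINCE", {"NB": "Ninh Binh", "HN": "Ha Noi"})
indicators = CodeList("CL_INDICATOR", {"GDP": "Gross domestic product"})
dsd = DataStructureDefinition(
    "DSD_EXAMPLE",
    (Dimension("PROVINCE", provinces), Dimension("INDICATOR", indicators)),
)

print(dsd.validate_key({"PROVINCE": "NB", "INDICATOR": "GDP"}))
print(dsd.validate_key({"PROVINCE": "XX"}))
```

The point of the sketch is that the valid content of a data set is decided entirely by shared structural metadata, so the same code lists can be reused across many structures.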
The SDMX top-level model • Data or Metadata Flow: a description of a data or metadata “flow”, an abstracted data or metadata set that will potentially occur for many periods and from many providers (eg a regular table received by MPI from various sources) • Data or Metadata Set: an instance data or metadata set from a particular provider at a particular time, eg a particular table from Ninh Binh province for a particular period • Provision Agreements: indicate which Providers will provide which subset, when, how often, and how • Data Providers: this identifies the Data Providers, giving indicative and contact information and linking to Provision Agreements and actual data and metadata sets
The SDMX top-level model (cont) • Structure Definition: this describes the structure of the data or metadata flow, ie all the metadata needed to request and understand an instance of the flow (an actual data or metadata set); links to all other structural metadata • Category Scheme: this categorises all the defined data and metadata flows, providing a structuring framework and a basis for searching; links to other structural metadata
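The relationships in the top-level model can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the SDMX specification: the class fields, the flow identifier, the period format and the second province are hypothetical. It shows how Provision Agreements let us ask which expected data sets have not yet arrived for a period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    id: str
    structure_id: str  # the structure definition the flow conforms to


@dataclass(frozen=True)
class DataProvider:
    id: str
    name: str


@dataclass(frozen=True)
class ProvisionAgreement:
    """A provider agrees to deliver data for a flow to a schedule."""
    flow: DataFlow
    provider: DataProvider
    frequency: str  # hypothetical encoding, e.g. "Q" for quarterly


@dataclass(frozen=True)
class DataSet:
    """An instance of a flow from one provider for one period."""
    flow: DataFlow
    provider: DataProvider
    period: str
    version: int = 1


def missing_deliveries(agreements, data_sets, period):
    """Agreements with no registered data set for the given period."""
    received = {
        (ds.flow.id, ds.provider.id) for ds in data_sets if ds.period == period
    }
    return [
        pa for pa in agreements if (pa.flow.id, pa.provider.id) not in received
    ]


# Hypothetical flow, providers and deliveries.
flow = DataFlow("PROVINCIAL_INVESTMENT", "DSD_EXAMPLE")
ninh_binh = DataProvider("NB", "Ninh Binh province")
ha_noi = DataProvider("HN", "Ha Noi")
agreements = [
    ProvisionAgreement(flow, ninh_binh, "Q"),
    ProvisionAgreement(flow, ha_noi, "Q"),
]
data_sets = [DataSet(flow, ninh_binh, "2008-Q1")]

late = missing_deliveries(agreements, data_sets, "2008-Q1")
print([pa.provider.name for pa in late])
```

Because every data set is registered against a flow and a provider, late or missing deliveries and multiple versions become visible instead of living in individual mailboxes.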
SDMX at MPI • What we envisage is • an SDMX Registry/Repository • code sets and classifications • environment for standardising and harmonising • Data Structure Definitions for all the regular data sets received by MPI • the “Data Flows” • Categorisation schemes to index the Data Flows • a data storage environment to hold the Data Sets • initially probably a simple file store • possibly a database store • possibly a star-schema store • with star schema design generated automatically from structural metadata • provides options for different “cuts” through data flows
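As a rough idea of what generating a star schema from structural metadata could look like, the sketch below derives one dimension table per concept and one fact table per structure, then loads the code lists into an in-memory SQLite database. Everything here is an assumption for illustration: the dictionary layout, table naming, the TIME_PERIOD and OBS_VALUE columns, and the codes. It also assumes identifiers come from a trusted registry, since they are interpolated directly into the SQL.

```python
import sqlite3

# Hypothetical structural metadata: one structure with two coded dimensions.
DSD = {
    "id": "PROVINCIAL_INVESTMENT",
    "dimensions": {
        "PROVINCE": {"NB": "Ninh Binh", "HN": "Ha Noi"},
        "SECTOR": {"AGR": "Agriculture", "IND": "Industry"},
    },
}


def star_schema_ddl(dsd):
    """Build CREATE TABLE statements: one table per dimension, one fact table."""
    statements = []
    fact_columns = []
    for concept in dsd["dimensions"]:
        column = concept.lower()
        table = f"dim_{column}"
        statements.append(
            f"CREATE TABLE {table} (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        fact_columns.append(f"{column} TEXT NOT NULL REFERENCES {table}(code)")
    fact_columns.append("time_period TEXT NOT NULL")
    fact_columns.append("obs_value REAL")
    key = ", ".join([c.lower() for c in dsd["dimensions"]] + ["time_period"])
    statements.append(
        f"CREATE TABLE fact_{dsd['id'].lower()} ("
        + ", ".join(fact_columns)
        + f", PRIMARY KEY ({key}))"
    )
    return statements


def build(conn, dsd):
    """Create the tables and populate dimension tables from the code lists."""
    for statement in star_schema_ddl(dsd):
        conn.execute(statement)
    for concept, codes in dsd["dimensions"].items():
        conn.executemany(
            f"INSERT INTO dim_{concept.lower()} (code, name) VALUES (?, ?)",
            list(codes.items()),
        )


conn = sqlite3.connect(":memory:")
build(conn, DSD)
```

Different “cuts” through a data flow then become ordinary queries that join the fact table to whichever dimension tables are of interest.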
SDMX at MPI • What we envisage (cont) • intelligent interfaces to Excel • using the structural metadata • to support • browsing and retrieval of data • automatic generation of Excel templates • data capture, registration, and management • structural metadata browsing and management • reference metadata definition and management • reference metadata attachment
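The template generation and data capture ideas can be illustrated with a small sketch. To keep it self-contained it writes CSV text with the Python standard library as a stand-in for an Excel template; a real interface would need a spreadsheet library. The dimension names, codes, and the TIME_PERIOD and OBS_VALUE columns are hypothetical.

```python
import csv
import io

# Hypothetical structural metadata: concept -> code list (code -> name).
DIMENSIONS = {
    "PROVINCE": {"NB": "Ninh Binh", "HN": "Ha Noi"},
    "SECTOR": {"AGR": "Agriculture", "IND": "Industry"},
}


def template_csv(dimensions):
    """Generate an empty capture template whose columns follow the structure."""
    header = list(dimensions) + ["TIME_PERIOD", "OBS_VALUE"]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(header)
    return buffer.getvalue()


def check_rows(text, dimensions):
    """Report (line number, concept, value) for codes not in the code lists."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for concept, codes in dimensions.items():
            if row.get(concept) not in codes:
                problems.append((line_no, concept, row.get(concept)))
    return problems


# A filled-in template with one invalid province code on the second data row.
filled = template_csv(DIMENSIONS) + "NB,AGR,2008,12.5\nXX,IND,2008,3.0\n"
print(check_rows(filled, DIMENSIONS))
```

The same structural metadata drives both steps: it decides what the template looks like and it checks what comes back, which is what we mean by managing the data in the context of the metadata.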
More Information • In the workshop sessions • BryanMFitzpatrick@yahoo.co.uk