An SDMX-based unified data catalogue (UDC)
MSIS – Meeting on the Management of Statistical Information Systems
Gabriele Becker / Massimo Bruschi
Statistical Information Systems, Monetary & Economic Department
Bank for International Settlements

The SDMX vision
• Need: up-to-date figures, data documentation, good-quality data
• Data can be offered by national statistical offices (NSOs), central banks (CBs) and international organisations (IOs)
• How do users choose between sources, filter out duplicates and get the “freshest” data?
• Data providers (originators) offer their data “in SDMX”
• Dissemination = reporting = data sharing… single storage!
• SDMX registries help users and organisations find data
• How “real” is this SDMX vision?
• What do we still need to learn?

The Unified Data Catalogue (UDC) concept
• Can we “implement” the vision?
• UDC: a single data catalogue that allows users to discover, select and retrieve statistical data from all registered data sources
• Discovery implies access to metadata:
  • DSDs (data structure definitions)
  • concepts and code lists
  • category schemes
• An SDMX registry is a natural repository for this metadata
• A Unified Data Catalogue feasibility study to analyse this
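
Discovery through metadata can be illustrated with a concept or code-list search over DSDs. The sketch below is a minimal in-memory model, assuming each DSD is reduced to a list of dimensions, each with a concept ID and a code-list ID. The class names, the function `find_dsds` and all IDs in the example are hypothetical; a real catalogue would read this metadata from the SDMX registry rather than from Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    concept: str   # concept ID, e.g. "FREQ" (hypothetical)
    codelist: str  # code-list ID, e.g. "CL_FREQ" (hypothetical)


@dataclass
class DataStructure:
    dsd_id: str
    dimensions: list = field(default_factory=list)


def find_dsds(dsds, concept=None, codelist=None):
    """Return the IDs of DSDs with at least one dimension matching the
    given concept and/or code list (None means "any")."""
    hits = []
    for dsd in dsds:
        for dim in dsd.dimensions:
            concept_ok = concept is None or dim.concept == concept
            codelist_ok = codelist is None or dim.codelist == codelist
            if concept_ok and codelist_ok:
                hits.append(dsd.dsd_id)
                break  # one matching dimension is enough for this DSD
    return hits


# Usage with two hypothetical DSDs
catalogue = [
    DataStructure("DSD_EXR", [Dimension("FREQ", "CL_FREQ"),
                              Dimension("CURRENCY", "CL_CURRENCY")]),
    DataStructure("DSD_BANKING", [Dimension("FREQ", "CL_FREQ"),
                                  Dimension("REF_AREA", "CL_AREA")]),
]
print(find_dsds(catalogue, concept="FREQ"))      # ['DSD_EXR', 'DSD_BANKING']
print(find_dsds(catalogue, codelist="CL_AREA"))  # ['DSD_BANKING']
```

Attributes, which also reference concepts and code lists, are left out to keep the sketch short.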

UDC study: objectives
• Provide centralised access to a variety of internal and external data sources
• Generic search facilities across “registered” data sources
• Retrieve data and metadata directly from all data sources
• Use SDMX technical standards, an SDMX registry and web services
• Broaden SDMX knowledge within the BIS (business areas and IT colleagues)

User stories
• Registrations
• Constraints
• GUI features
• Navigation / search
• Query & retrieval
• Output handling
• Automation
• Security

UDC prototype architecture
• Simplistic approach: to search and retrieve data from a data source, all we need to know are its data structures and its query language
• If a source follows the SDMX information model (SDMX-IM), we also need a (web) service connected to it that can respond to SDMX query messages
• An SDMX-enabled data source can be “native” or “adaptable”
• SDMX-ML file + DSD + “file-query-handler” = simplest SDMX-enabled source
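
The “SDMX-ML file + DSD + file-query-handler” idea can be sketched as a small filter over an SDMX-ML generic data file. This is a minimal sketch, assuming the SDMX 2.0 generic format, in which each `Series` carries a `SeriesKey` of `Value` elements with `concept` and `value` attributes, and each `Obs` holds a `Time` and an `ObsValue`. The function name `query_generic_file`, the selection format, the `Data` root element and the sample figures are hypothetical; the handler actually used in the prototype is not described in these slides.

```python
import xml.etree.ElementTree as ET

# Namespace of SDMX-ML 2.0 generic data (assumed; other versions differ).
GENERIC_NS = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/generic"
NS = {"g": GENERIC_NS}


def query_generic_file(xml_text, selection):
    """Return {series_key: [(period, value), ...]} for matching series.

    `selection` maps a dimension concept ID to a set of accepted codes;
    dimensions that are not mentioned act as wildcards.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for series in root.iter(f"{{{GENERIC_NS}}}Series"):
        # Series key as an ordered {concept: code} mapping
        key = {v.get("concept"): v.get("value")
               for v in series.findall("g:SeriesKey/g:Value", NS)}
        if not all(key.get(dim) in codes for dim, codes in selection.items()):
            continue
        observations = []
        for obs in series.findall("g:Obs", NS):
            value = obs.find("g:ObsValue", NS)
            if value is not None:  # skip observations without a value
                observations.append((obs.findtext("g:Time", namespaces=NS),
                                     float(value.get("value"))))
        result[".".join(key.values())] = observations
    return result


# Made-up sample; a real file would be wrapped in a full message envelope.
SAMPLE = f"""<Data xmlns:g="{GENERIC_NS}">
  <g:Series>
    <g:SeriesKey><g:Value concept="FREQ" value="M"/>
      <g:Value concept="CURRENCY" value="USD"/></g:SeriesKey>
    <g:Obs><g:Time>2009-01</g:Time><g:ObsValue value="1.5"/></g:Obs>
  </g:Series>
  <g:Series>
    <g:SeriesKey><g:Value concept="FREQ" value="M"/>
      <g:Value concept="CURRENCY" value="JPY"/></g:SeriesKey>
    <g:Obs><g:Time>2009-01</g:Time><g:ObsValue value="130.0"/></g:Obs>
  </g:Series>
</Data>"""

print(query_generic_file(SAMPLE, {"CURRENCY": {"USD"}}))
# {'M.USD': [('2009-01', 1.5)]}
```

The DSD is what tells the handler which concepts form the series key, so the same filter works for any file registered against a known structure.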

Plan: schematic architecture
[Diagram: internal or external sources (SDMX files, an SDMX data source web service, a mappable data source with an SDMX query adapter web service); registrations in the SDMX Registry web application; SDMX UDC GUI]

Components of the UDC prototype
• SDMX Registry (“off-the-shelf” SDMX tool)
  • Data structure definitions of all “connected” data sources
  • Registrations for all dataflows of all connected data sources
  • URLs to SDMX files and SDMX query services
  • Updated via SDMX-ML messages or interactively (“KeyMaster”)
• UDC (developed for the study)
  • GUI to navigate the registry information
  • Queries the data sources
  • Retrieves data and presents them to the user
• SDMX query web services (developed for the study)
  • For the different types of data sources
• Data query services (partly existing, partly developed)
  • For each of the connected queryable data sources
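
The registry's role as a directory of registrations can be illustrated with a toy in-memory model: for a given dataflow, the UDC looks up every registered source and decides whether to download an SDMX-ML file or send a query to an SDMX query service. Everything here (`Registration`, `MiniRegistry`, `plan_retrieval`, the `kind` values, the provider IDs and the example.org URLs) is hypothetical; the prototype used an off-the-shelf registry whose interface is not shown in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical provision registration for one dataflow and provider."""
    dataflow: str
    provider: str
    url: str
    kind: str  # "file" (SDMX-ML file) or "query" (SDMX query web service)


class MiniRegistry:
    """Toy stand-in for an SDMX registry, keyed by dataflow ID."""

    def __init__(self):
        self._by_dataflow = {}

    def register(self, reg):
        if reg.kind not in ("file", "query"):
            raise ValueError(f"unknown source kind: {reg.kind}")
        self._by_dataflow.setdefault(reg.dataflow, []).append(reg)

    def sources_for(self, dataflow):
        return list(self._by_dataflow.get(dataflow, []))


def plan_retrieval(registry, dataflow):
    """List the (provider, action, url) steps the UDC would take."""
    actions = {"file": "download and filter file", "query": "send SDMX query"}
    return [(r.provider, actions[r.kind], r.url)
            for r in registry.sources_for(dataflow)]


registry = MiniRegistry()
registry.register(Registration("EXR", "PROVIDER_A",
                               "https://example.org/exr.xml", "file"))
registry.register(Registration("EXR", "PROVIDER_B",
                               "https://example.org/query", "query"))
for step in plan_retrieval(registry, "EXR"):
    print(step)
```

Keeping the source type in the registration lets the GUI stay ignorant of where data physically lives, which is the point of a single catalogue.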

What we did: detailed architecture
[Diagram: servers medts.a (Linux), mstat.a (Windows), mstat.s (Windows) and v.ds03 (Linux); data sources BIS Data Bank, MSTAT cubes, MarkIT SQL database and SDMX-ML data files; DBQL output, SDMX-ML proxy daemon, TS web service and SQL stored procedures; SDMX-ML query web services /databank/query, /mstat/query and /markit/query; SDMX Registry web application; UDC web application; SDMX-ML file browser; read-only (R/O) registry; client PC (Windows, Internet Explorer) running the UDC GUI]

UDC GUI key features
• Browse categories, dataflows and provision registrations
• Browse a selected DSD: dimensions, attributes, code lists
• Build queries based on the DSD (code selection)
• Run a query and view the results (simple table)
• Download results and DSDs in SDMX-ML format
• Search by concept or code list
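
Building a query from a DSD by code selection amounts to picking codes per dimension, checking them against the code lists and serialising the result. The sketch below assumes the dot-separated key syntax of the later SDMX 2.1 RESTful API (alternative codes joined with `+`, an empty position as a wildcard). It is only an illustration: the 2009 prototype issued SDMX 2.0 query messages, and the function name and example code lists are hypothetical.

```python
def build_series_key(dimension_order, codelists, selection):
    """Turn a per-dimension code selection into a key such as 'M.USD+JPY.'.

    dimension_order: dimension IDs in the order defined by the DSD
    codelists:       dimension ID -> set of valid codes
    selection:       dimension ID -> list of chosen codes (absent = wildcard)
    """
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise ValueError(f"dimensions not in DSD: {sorted(unknown)}")
    parts = []
    for dim in dimension_order:
        codes = selection.get(dim, [])
        invalid = [c for c in codes if c not in codelists[dim]]
        if invalid:
            raise ValueError(f"codes {invalid} not in code list of {dim}")
        parts.append("+".join(codes))  # empty string = all codes
    return ".".join(parts)


# Hypothetical DSD: dimension order and code lists
DIMENSIONS = ["FREQ", "CURRENCY", "REF_AREA"]
CODELISTS = {
    "FREQ": {"A", "Q", "M"},
    "CURRENCY": {"USD", "JPY", "EUR"},
    "REF_AREA": {"US", "JP", "DE"},
}
print(build_series_key(DIMENSIONS, CODELISTS,
                       {"FREQ": ["M"], "CURRENCY": ["USD", "JPY"]}))
# M.USD+JPY.
```

Validating against the code lists before sending anything means an invalid selection is caught in the GUI rather than by the data source.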

UDC prototype: some results
• The UDC can provide (unsecured) access to:
  • BIS Data Bank: time series repository, SDMX-EDI IM, Linux, FAME, Sybase, own query language + query adapter
  • MSTAT OLAP: IBFS data in multidimensional cubes, MS Windows, SQL Server, SDMX query to OLAP/MDX adapter
  • MSTAT Sandbox: research data in a relational database, MS Windows, SQL Server, DSD on an unstructured dataset + SDMX/SQL adapter
  • SDMX-ML generic files + generic file adapter
• Practical use of registration, provisioning, constraints processing, …
• The SDMX vision is real… with some practical issues
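
An SDMX/SQL adapter of the kind listed for the MSTAT Sandbox essentially maps each DSD dimension to a table column and turns the code selection into a `WHERE` clause. The sketch below assumes a single flat table and uses SQLite only so that it runs stand-alone; the table, column and function names and the rows are hypothetical, and the prototype's adapter worked against SQL Server. Codes are passed as bound parameters, and only the trusted table and column names from the mapping are written into the SQL text.

```python
import sqlite3


def selection_to_sql(table, column_for, selection):
    """Translate {dimension: [codes]} into a parameterised SELECT.

    column_for maps DSD dimension IDs to (trusted) column names;
    empty or missing selections act as wildcards.
    """
    clauses, params = [], []
    for dim, codes in selection.items():
        if not codes:
            continue  # wildcard: no condition on this dimension
        placeholders = ", ".join("?" * len(codes))
        clauses.append(f"{column_for[dim]} IN ({placeholders})")
        params.extend(codes)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Demo on an in-memory database with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (freq TEXT, currency TEXT, period TEXT, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", [
    ("M", "USD", "2009-01", 1.5),
    ("M", "JPY", "2009-01", 130.0),
    ("Q", "USD", "2009-Q1", 1.4),
])
sql, params = selection_to_sql("obs", {"FREQ": "freq", "CURRENCY": "currency"},
                               {"FREQ": ["M"], "CURRENCY": ["USD", "JPY"]})
print(sql)  # SELECT * FROM obs WHERE freq IN (?) AND currency IN (?, ?)
print(conn.execute(sql, params).fetchall())
```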

Issues found (Aug. 2009, SDMX 2.0)
• Not possible to register compact or utility files in the registry used
• Not possible to register files using message groups and annotations, as these were not supported by the registry used
• Missing functionality in the SDMX Query message
• Some issues with the registry implementation used
• Constraints processing in the registry did not work
• The ECB does not provide DSDs on its website (the files are OK)
• Cross-platform communication with security not solved
• In general: access authorisation to queryable data sources is unresolved

Conclusions
• The SDMX vision is real: the UDC works
• The needed enhancements to the standards are already part of SDMX 2.1
• The registry implementation needs enhancements (e.g. industrial strength)
• Non-SDMX issues (cross-platform connectivity and access authentication) exist and need to be looked into
• Current SDMX offerings from other organisations are rather diverse (message types, features used, version implemented)
• Diverse offerings make the requirements for a UDC more complex

Next steps for the BIS
• The UDC can be a central part of the future BIS environment
• The road to the UDC will take a few years
• Continue the feasibility study next year
• Refine the UDC
• More data sources
• More user facilities for search and navigation
• Work with SDMX standards experts on the issues found
• Work with other SDMX data providers

Thank you!
gabriele.becker@bis.org
massimo.bruschi@bis.org