An integration approach for the Statistical Information System of Istat using SDMX standards
Francesco Rizzo (ISTAT, Italy)
Stefano De Francisci (ISTAT, Italy)
Geneva, 08-10 May 2007, Meeting on the Management of Statistical Information Systems
Summary
• Istat Information System (current situation)
• The Integrated Output Management System
• Planning constraints; strategic plan
• Standardizing new sub-systems through the toolkit
• Integrating existing sub-systems through SDMX
• Conclusion
Istat Information System
Current situation: the statistical production activities of Istat are supported by a distributed architecture. Several production Directorates operate through local subsystems that independently cover the full life cycle of statistical data, from collection to dissemination.
The mission is to improve and standardize the part of the statistical data life cycle that runs from validated data to dissemination, by integrating and managing the data and metadata supplied by the production Directorates.
[Figure: The simplified current ISTAT scenario. Each Production Directorate runs its own chain of Data Collection, Data Editing and Data Aggregation feeding a thematic DB, which is published through its own web navigator; validated microdata and metadata are also shown.]
Istat Information System
• Some numbers:
  • 7 production Directorates
  • 3 horizontal-competence Directorates
  • 18 dissemination databases accessible via the Internet
  • 2 centralized metadata systems
• Software and tools in use:
  • Unix, Windows
  • Tomcat, Apache, IIS
  • VB, Java, SAS, Excel
  • .NET, JSP, PHP, ASP
  • Oracle, Access, PostgreSQL, MySQL
The Integrated Output Management System
The Integrated Output Management System is a project aimed at standardizing and integrating part of the life cycle of statistical data. In particular, it covers all the steps needed to produce useful statistical outputs for end users.
The large number of existing applications and the technological heterogeneity of the systems involved have precluded full integration. Consequently, the Integrated Output Management System has been configured as a multi-level, multi-service integration environment.
The Integrated Output Management System
• The project’s guidelines:
  • identify the right position inside the Istat Information System
  • find the right mediation among the points of view of the different Directorates
  • choose the right compromise among standardization, reengineering and integration
[Figure: Position of the Integrated Output Management System in the Istat scenario. The Integrated Output Management System sits between the production Directorates' chains (Data Collection, Data Editing, Data Aggregation, validated microdata, thematic DB, metadata) and the web navigators.]
The Integrated Output Management System
• Planning constraints:
  • use the expertise acquired in projects developed over the last few years: the Metadata Information System, generalized environments performing OLAP functions, thematic databases;
  • use new technologies such as XML and Web Services as alternatives to “proprietary solutions”;
  • develop a new “integration culture” that informs the planning stages of new sub-systems;
  • minimize the impact on existing sub-systems;
  • minimize costs and risks through a “gradual” development strategy.
The Integrated Output Management System
• Minimizing costs and risks: the two different architectural approaches
  • develop a complete framework that standardizes all the processes from validated data to dissemination, through a toolkit for building new sub-systems or reengineering existing ones
  • build an SDMX architecture, based on a Registry and Web Services, that integrates the existing thematic databases
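The Python sketch below makes the role of the Registry in the second approach more concrete. It is illustrative only and not Istat's implementation. Each existing thematic database registers the dataflows it can serve, together with the web service endpoint to pull them from. A client asks the registry for those endpoints before querying. The class names, the dataflow ID `STS_INDUSTRY` and the example.org URLs are hypothetical. A real SDMX Registry also manages structural metadata (DSDs, code lists) and exposes standard interfaces, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataSource:
    """Hypothetical registration: a provider serving a dataflow at an endpoint."""
    dataflow_id: str
    provider: str
    endpoint: str


@dataclass
class Registry:
    """Toy in-memory registry mapping dataflow IDs to registered data sources."""
    sources: Dict[str, List[DataSource]] = field(default_factory=dict)

    def register(self, source: DataSource) -> None:
        # Several thematic databases may provide data for the same dataflow.
        self.sources.setdefault(source.dataflow_id, []).append(source)

    def discover(self, dataflow_id: str) -> List[DataSource]:
        # A client asks where data for a dataflow can be pulled from;
        # unknown dataflows yield an empty list rather than an error.
        return list(self.sources.get(dataflow_id, []))


# Hypothetical registrations by two existing thematic databases.
registry = Registry()
registry.register(DataSource("STS_INDUSTRY", "ThematicDB_A", "http://example.org/a/sdmx"))
registry.register(DataSource("STS_INDUSTRY", "ThematicDB_B", "http://example.org/b/sdmx"))

for src in registry.discover("STS_INDUSTRY"):
    print(src.provider, src.endpoint)
```

The existing databases stay where they are; the registry only tells clients where to send their requests. This matches the stated goal of minimizing the impact on existing sub-systems.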
The Integrated Output Management System
• Minimizing costs and risks: the two main phases of the project
  • short term:
    • feasibility analysis
    • planning
    • prototyping
    • training on integration technologies and standards
    • building the frameworks
  • medium term:
    • encourage the use of the frameworks as a means to standardize and integrate sub-systems
    • guide the planning of new dissemination systems
    • build the SDMX integration infrastructure
The Integrated Output Management System (1/2)
• Top management strategic plan:
  • set up a new Directorate for “Information needs, Integration and Territory” (DCET), whose main objective is to guide integration processes inside the Institute;
  • two Units of the DCET Directorate are directly involved in the “Integrated Output Management System” project:
    • “Unit B” is tasked with developing a toolkit that enables production Directorates to integrate the new dissemination sub-systems they will plan.
    • This Unit will supply expertise with particular reference to cross-sectional data and OLAP applications.
The Integrated Output Management System (2/2)
    • “Unit A” is tasked with testing technologies that are fairly new at Istat, such as XML and Web Services, and with studying SDMX standards.
    • This Unit will supply the expertise to integrate existing thematic databases, with particular reference to short-term statistics and time series.
  • set up an internal inter-Directorate working group to support the Eurostat SODI task force;
  • set up an internal inter-Directorate working group whose main objective is to analyze and verify the use of SDMX standards in the Istat Information System architecture.
[Figure: Integrated Output Management System, zoomed in. Two areas: “Standardization and reengineering through the toolkit” and “SDMX integration of existing thematic databases”. Components shown: validated microdata, macrodata, metadata, ETL, TS loader, CS loader, DW microdata, aggregator, DW macrodata, thematic database, WS Registry, SDMX web service, and web portals (time series web portal, multidimensional web portal, SDMX web portal). Legend: WS = web service; TS = time series; CS = cross-sectional; DW = data warehouse.]
Standardizing new sub-systems through the toolkit
• The toolkit allows the production Directorates to build statistical Data Marts on their own, as part of the Institute’s overall Corporate Data Warehouse.
• The functions available are:
  • integration, through a specialized layer, with the centralized metadata systems
  • building Statistical Data Marts of validated microdata oriented to a specific subject-matter domain
  • building a primary Data Warehouse of validated microdata
  • building a Web Warehouse of aggregated data
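The step from validated microdata to the aggregated data of a Web Warehouse amounts, at its simplest, to summing a measure over every combination of the chosen dimensions. The Python sketch below is an assumption about what such an aggregation step might look like, not the toolkit's actual code. The records, the `region`/`sector` dimensions and the `employees` measure are invented for illustration.

```python
from collections import defaultdict

# Hypothetical validated microdata: one record per respondent unit.
microdata = [
    {"region": "North", "sector": "Industry", "employees": 120},
    {"region": "North", "sector": "Services", "employees": 45},
    {"region": "South", "sector": "Industry", "employees": 80},
    {"region": "North", "sector": "Industry", "employees": 30},
]


def aggregate(records, dimensions, measure):
    """Sum `measure` over every combination of `dimensions` (a simple cross-sectional cube)."""
    cube = defaultdict(int)
    for rec in records:
        # The cell key is the tuple of the record's values for the chosen dimensions.
        key = tuple(rec[d] for d in dimensions)
        cube[key] += rec[measure]
    return dict(cube)


macrodata = aggregate(microdata, ["region", "sector"], "employees")
print(macrodata)
# {('North', 'Industry'): 150, ('North', 'Services'): 45, ('South', 'Industry'): 80}
```

A production aggregator would typically also handle weights, marginal totals and confidentiality rules; the sketch leaves these out.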
Integrating existing sub-systems through SDMX
To support the SODI task force and to develop best practices on SDMX, we are developing several software modules organized in a framework. The framework can be used as a whole, from reporting to dissemination. Alternatively, its modules can be used separately and integrated into each sub-system.
SDMX Istat Framework (1/2)
• SDMX Istat Framework version 1.0 is composed of the following modules:
  • Check and Loader:
    • collects aggregated data and loads it into the database;
    • publishes an RSS file announcing when new data is loaded or updated;
    • publishes one or more SDMX Query files;
    • publishes one or more SDMX Compact files;
  • SDMX data Web Service:
    • supports the Pull exchange method for requesting data;
    • accepts an SDMX Query;
    • responds with an SDMX Compact data message.
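The Pull exchange handled by the SDMX data Web Service can be sketched in three steps. The service receives the dimension constraints of a query and selects the matching series. It then returns them in a Compact-style layout, where dimension values become attributes of `<Series>` and each observation is an `<Obs>` element. This is a simplified Python sketch under stated assumptions. The in-memory `SERIES` store, the dimension IDs `FREQ`, `REF_AREA`, `INDICATOR` and the indicator codes are hypothetical. The output also omits the message header and the DSD-specific namespaces that a real SDMX Compact message carries.

```python
import xml.etree.ElementTree as ET

# Hypothetical dimension IDs and an in-memory store of series keyed by their codes.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")
SERIES = {
    ("M", "IT", "IND_PROD"): {"2007-01": 101.2, "2007-02": 99.8},
    ("M", "IT", "RETAIL"): {"2007-01": 97.5},
    ("Q", "IT", "IND_PROD"): {"2007-Q1": 100.4},
}


def answer_query(constraints):
    """Return a simplified Compact-style DataSet with the series matching `constraints`.

    `constraints` maps a dimension ID to the set of accepted codes; dimensions
    that are not mentioned are left unconstrained (wildcarded).
    """
    dataset = ET.Element("DataSet")
    for key, observations in SERIES.items():
        key_values = dict(zip(DIMENSIONS, key))
        if all(key_values.get(dim) in codes for dim, codes in constraints.items()):
            # Compact style: the series key is carried as attributes of <Series>.
            series = ET.SubElement(dataset, "Series", key_values)
            for period, value in observations.items():
                ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=str(value))
    return ET.tostring(dataset, encoding="unicode")


# Monthly industrial production for any reference area.
print(answer_query({"FREQ": {"M"}, "INDICATOR": {"IND_PROD"}}))
```

Leaving a dimension out of the constraints acts as a wildcard, so a single query can return several series.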
SDMX Istat Framework (2/2)
  • SDMX Web Navigator:
    • is a web application that acts as a client of the web service;
    • allows users to query the database using the dimensions defined in the DSDs (Data Structure Definitions) as analysis dimensions;
    • allows SDMX Queries to be built through a graphical interface;
    • allows SDMX Queries to be tested;
  • Reference Metadata Manager and Web Navigator:
    • allows production Directorates to produce Reference Metadata in SDDS format without changing their current working practices.
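On the client side, the Web Navigator's graphical query builder must turn the user's selections into a query message. The Python sketch below shows one way to do this. It is modelled loosely on the SDMX-ML 2.0 query structure (`DataWhere`, `And`/`Or`, `Dimension`, `Time`), but namespaces and the message header are omitted. The exact element layout should therefore be checked against the SDMX specification rather than taken as authoritative. The dimension IDs and codes are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_query(selections, start=None, end=None):
    """Build a simplified SDMX-style data query from the user's selections.

    `selections` maps each dimension ID (taken from the DSD) to the codes the
    user picked; a dimension with several codes is wrapped in an <Or> element.
    """
    query = ET.Element("Query")
    conditions = ET.SubElement(ET.SubElement(query, "DataWhere"), "And")
    for dim_id, codes in selections.items():
        if not codes:
            continue  # nothing selected: leave the dimension unconstrained
        parent = conditions if len(codes) == 1 else ET.SubElement(conditions, "Or")
        for code in codes:
            ET.SubElement(parent, "Dimension", id=dim_id).text = code
    if start or end:
        # Optional time range selected in the interface.
        period = ET.SubElement(conditions, "Time")
        if start:
            ET.SubElement(period, "StartTime").text = start
        if end:
            ET.SubElement(period, "EndTime").text = end
    return ET.tostring(query, encoding="unicode")


# Monthly data for Italy or France from January 2007 onwards (hypothetical codes).
print(build_query({"FREQ": ["M"], "REF_AREA": ["IT", "FR"]}, start="2007-01"))
```

Because the dimensions come from the DSD, the interface can list valid codes for each one. The resulting query can then be sent to the data web service or saved for testing.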
Conclusions
• Standardization and integration can now be carried out more easily thanks to new technologies such as XML and Web Services.
• The full success of the project will depend on:
  • the top management strategic plan
  • the right position inside the Istat Information System
  • the right compromise between standardization, reengineering and integration
  • managing the introduction of the resulting systems without traumatic changes to current working practices