Application of Service Oriented Architecture in Statistics New Zealand
UNSC Modernisation of the Statistical Process Seminar, New York, February 24, 2010
Geoff Bascand & Matjaz Jug
Drivers for IT Architecture
• Agility: transformational changes, such as the shift towards greater use of administrative data and more automated data processing.
• Cost & Reuse: standardisation, and reducing the high cost of developing and maintaining statistical production systems.
• Integration: the need to integrate outsourced statistical tools and legacy application assets.
• Configuration: responding to frequent changes in data sources, questionnaires, methodology and classifications.
SOA Definition
• The Open Group describes Service Oriented Architecture (SOA) as a “style of IT architecture that delivers agility and Boundaryless Information Flow™. It is deployed on an increasing scale in enterprises today.”
• SOA is a message-based architecture of independent components, in which:
  • communication between components is managed by a “service (or process) manager” that mediates communication, coordination and cooperation among components through messages. Each message carries both data and process data.
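A minimal Python sketch of the message-based model described above, in which components never call each other directly and every exchange goes through a service manager. All names here (`Message`, `ServiceManager`, `register`, `send`, the `validate` component and its `max_income` rule) are hypothetical illustrations, not part of the Stats NZ systems; routing is deliberately in-process and synchronous, whereas a real deployment would sit on messaging middleware.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Message:
    """Hypothetical envelope: target service, the data, and process data."""
    service: str
    data: Any
    process: Dict[str, Any] = field(default_factory=dict)


class ServiceManager:
    """Mediates all communication; components only know the manager."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Message], Any]] = {}

    def register(self, service: str, handler: Callable[[Message], Any]) -> None:
        self._handlers[service] = handler

    def send(self, message: Message) -> Any:
        # Look up the component by service name and hand it the whole message.
        handler = self._handlers.get(message.service)
        if handler is None:
            raise LookupError(f"no component registered for {message.service!r}")
        return handler(message)


# A hypothetical "validate" component applies a rule that travels
# with the message as process data rather than being hard-coded.
manager = ServiceManager()
manager.register("validate", lambda msg: msg.data["income"] <= msg.process["max_income"])
ok = manager.send(Message("validate", {"income": 52000}, {"max_income": 1_000_000}))
print(ok)  # True
```

Because a component depends only on the manager and the message format, it can be replaced or relocated without changing its callers, which is the loose coupling that SOA's integration benefits rely on.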
SOA Benefits
• Increased agility: organisations should be able to respond more quickly to changes in business processes and the external environment.
• Reduced cost through reuse: new IT systems should be able to leverage the most readily available code and services from across the organisation and externally.
• Better integration through a loosely coupled framework and orchestration.
• Configuration rather than programming.
Situation in Statistical Organisations
• Many lessons have been learnt from early adopters.
• Even now, few statistical organisations are implementing SOA on a large scale.
• We are “behind” some other sectors, such as the airline industry.
• WHY? Are we really so different?
In Some Areas We Are Different!
• Many semantically diverse data structures
• Frequent changes in data structures, sources and questionnaires
• Specific requirements, such as data confidentiality
• Many stove-piped legacy application assets
• Mainly non-transactional processing
• End-user processing environments
Learning from Data Warehousing and Metadata-Driven Projects
• A high degree of organisational change is required, which is usually a slow process.
• It is difficult to establish new governance.
• A new architecture usually requires complete replacement of the legacy application portfolio.
• Software development capability is difficult to upgrade and maintain in-house.
• A common challenge organisations face is managing metadata effectively.
• Lack of standardisation: every new paradigm appears to require more of it.
Additional Lessons from Early SOA Attempts
• Standardisation of services and data structures is vital.
• If the business or service scope is too broad, the costs of generality and development are high.
• If a service or business request is too specific, the benefits of reuse are limited.
• Performance degrades with volume.
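The point about volume can be made concrete with a simple cost model. This is an illustrative assumption, not a measurement from Stats NZ: suppose every message incurs a fixed overhead $c_0$ (envelope, routing, acknowledgement) plus a cost $c_1$ per record it carries. Moving $n$ records one per message, or in batches of $b$ records, then costs

$$
T_{\text{msg}}(n) = n\,(c_0 + c_1), \qquad
T_{\text{batch}}(n, b) = \left\lceil \frac{n}{b} \right\rceil c_0 + n\,c_1 .
$$

With, for example, $c_0 = 10\,c_1$ and $n = 10^6$, the per-message design costs $1.1 \times 10^7\,c_1$, while batches of $b = 10^4$ cost $100 \cdot 10\,c_1 + 10^6\,c_1 = 1.001 \times 10^6\,c_1$, about eleven times less. Under this model the fixed overhead grows with the number of messages, so fine-grained messaging suits individual transactions better than bulk data movement.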
Architecture in Stats NZ Now: Platform Approach and Shared Services (SOA)
[Diagram: Statistical Infrastructure; Collect; Micro-economic, Macro-economic, Social and Census platforms; Dissemination; IT Infrastructure]
• Statistical Infrastructure: Frames and Registers; Classification Management; Metadata Management; Methodologies
• Collection: Administrative Data; Respondents & Collection Management; CAPI; CATI; Imaging; Future (Web)
• Processing (Micro-economic Statistics): Platform for Micro-economic Statistics (BESt); other systems (mostly legacy)
• Processing (Macro-economic Statistics): Platform for National Accounts (DNA); other systems (mostly legacy)
• Processing (Social/Household Statistics): Platform for HH Statistics (POSS); other systems (mostly legacy)
• Processing (Census): Census Platform
• Dissemination: Content Management (www.stats.govt.nz); Data Dissemination Management; Table Builder; Infoshare; Business Toolbox; Future
• IT Infrastructure: Hardware; Server Software (OS, email, SQL DB, OLAP, CRM, CMS); Applications & Tools; Desktop Software (MS Office, Lotus Notes)
SOA in Data Collection
• Description: data collected through CATI, CAPI and Imaging is pushed to production databases over the messaging infrastructure. The grain is the individual questionnaire response. A load service was built to deliver data to Legolution and the POSS Input Data Environment (now the Social Input Store).
• Challenges: we dropped this approach in the Process phase because of the difficulty of moving large amounts of data as messages. The requirement to pass process metadata was overlooked, so an additional metadata transfer had to be used.
• Benefits: this is the infrastructure required for transactional data collection, where every response can be pushed to production systems. Such collection is anticipated as a result of the Standard Business Reporting project.
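The lesson about process metadata can be illustrated with a small Python sketch in which every pushed message wraps a single questionnaire response (the grain described above) together with its process metadata, and the load service rejects messages that lack it. The field names (`collection_id`, `questionnaire_version`, `mode`), the JSON envelope and the in-memory store are assumptions for illustration only; the actual message schema used for Legolution and POSS is not described here.

```python
import json
from typing import Any, Dict, List

# Hypothetical process-metadata fields that every message must carry.
REQUIRED_PROCESS_METADATA = ("collection_id", "questionnaire_version", "mode")
ALLOWED_MODES = {"CATI", "CAPI", "Imaging"}


def make_response_message(response: Dict[str, Any], process: Dict[str, Any]) -> str:
    """Wrap one questionnaire response and its process metadata in a JSON envelope."""
    return json.dumps({"process": process, "data": response})


class LoadService:
    """Receives pushed messages; `store` is an in-memory stand-in for a production database."""

    def __init__(self) -> None:
        self.store: List[Dict[str, Any]] = []

    def receive(self, raw: str) -> None:
        message = json.loads(raw)
        process = message.get("process", {})
        missing = [name for name in REQUIRED_PROCESS_METADATA if name not in process]
        if missing:
            # Fail fast instead of needing a separate metadata transfer later.
            raise ValueError(f"missing process metadata: {missing}")
        if process["mode"] not in ALLOWED_MODES:
            raise ValueError(f"unknown collection mode: {process['mode']!r}")
        self.store.append({"process": process, "data": message["data"]})


loader = LoadService()
loader.receive(make_response_message(
    {"respondent": "R001", "q1": 3},
    {"collection_id": "survey-A", "questionnaire_version": "v2", "mode": "CATI"},
))
print(len(loader.store))  # 1
```

One message per response suits transactional collection; as the Challenges bullet notes, the same pattern becomes costly when large volumes must be moved, which is why the Process phase switched to ETL pulls.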
SOA in Data Processing
• Description: data is now transferred using ETL packages (pull), and a service is used to initiate the ETL packages. The configuration store is the central place where the process is configured (as metadata); it is currently used by two systems, the BESt platform and the SOFIE processing system.
• Challenges: reuse of ETL packages is limited to a single platform (BESt), but some components, such as the configuration store, can be used by other systems as part of the statistical infrastructure.
• Benefits: a highly configurable process workflow that enables what-if scenarios.
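The idea that the process is configured as metadata, so that a what-if scenario is just the same steps run under a different configuration, can be sketched in Python as follows. The step names, parameters and data are hypothetical, and the configuration is a plain list rather than a real configuration store; this is not how BESt or SOFIE are implemented, only an illustration of the pattern.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


# Hypothetical step library: steps are code, but which steps run,
# in what order and with what parameters is configuration.
def impute_missing(records: List[Record], field: str, value: float) -> List[Record]:
    return [{**r, field: r.get(field, value)} for r in records]


def trim_outliers(records: List[Record], field: str, cap: float) -> List[Record]:
    return [{**r, field: min(r[field], cap)} for r in records]


STEPS: Dict[str, Callable[..., List[Record]]] = {
    "impute_missing": impute_missing,
    "trim_outliers": trim_outliers,
}


def run_process(records: List[Record], config: List[dict]) -> List[Record]:
    """Apply configured steps in order; input records are never mutated."""
    for step in config:
        params = {k: v for k, v in step.items() if k != "step"}
        records = STEPS[step["step"]](records, **params)
    return records


def total(records: List[Record], field: str) -> float:
    return sum(r[field] for r in records)


data = [{"sales": 100.0}, {"sales": 5000.0}, {}]
baseline = [
    {"step": "impute_missing", "field": "sales", "value": 50.0},
    {"step": "trim_outliers", "field": "sales", "cap": 1000.0},
]
# What-if: identical steps, only the outlier cap changes.
scenario = [dict(s, cap=2000.0) if s["step"] == "trim_outliers" else s for s in baseline]
print(total(run_process(data, baseline), "sales"))  # 1150.0
print(total(run_process(data, scenario), "sales"))  # 2150.0
```

Adding a new kind of step still needs code, but changing which steps run and with what parameters does not, which is the "configuration rather than programming" benefit in practice.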
SOA in Data Dissemination
• Description: the dissemination tool Business Toolbox uses an SDMX query service to retrieve aggregated data from the dissemination data warehouse (OECD.stat) and present it in a customised, user-friendly way.
• Challenges: integrating the data warehouse with the (legacy) output production systems.
• Benefits: presentation of information does not depend on the physical structure of the data warehouse, and new SDMX-based web components and new data can easily be added.
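Because an SDMX query names a dataflow and dimension codes rather than tables or columns, the client is insulated from the warehouse's physical structure. The Python sketch below builds a data query URL using the SDMX 2.1 RESTful syntax (`/data/{flow}/{key}` with `startPeriod`/`endPeriod` parameters). That REST syntax post-dates this 2010 presentation and may differ from the query interface Business Toolbox actually used; the endpoint, dataflow ID and dimension order are hypothetical.

```python
from typing import Optional, Sequence, Union
from urllib.parse import urlencode

DimensionValue = Union[str, Sequence[str]]


def build_sdmx_data_url(
    base_url: str,
    dataflow: str,
    key: Sequence[DimensionValue],
    start_period: Optional[str] = None,
    end_period: Optional[str] = None,
) -> str:
    """Build an SDMX 2.1 REST data query.

    `key` has one entry per dimension, in the data structure's order:
    a single code, a list of codes (OR-ed with '+'), or '' as a wildcard.
    """
    # Check str first: a str is itself a Sequence.
    parts = [v if isinstance(v, str) else "+".join(v) for v in key]
    url = f"{base_url.rstrip('/')}/data/{dataflow}/{'.'.join(parts)}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    return f"{url}?{urlencode(params)}" if params else url


url = build_sdmx_data_url(
    "https://sdmx.example.org/rest/", "DF_GDP",
    ["NZ", ["GDP", "GNI"], "", "Q"],
    start_period="2009-Q1", end_period="2009-Q4",
)
print(url)  # https://sdmx.example.org/rest/data/DF_GDP/NZ.GDP+GNI..Q?startPeriod=2009-Q1&endPeriod=2009-Q4
```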
SOA in Statistical Infrastructure
• Description: coding is the first piece of statistical infrastructure to be offered through a service interface (internally and externally). The CCS coder will offer an automated coding service based on classification metadata in CARS.
• Challenges: metadata management and standardisation.
• Benefits: statistical infrastructure (metadata management systems, registers) can provide services to internal and external platforms and to individual systems.
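The internals of the CCS coder and CARS are not described here, so the following Python sketch only illustrates the general shape of a coding service driven by classification metadata: descriptors and synonyms from the classification are normalised into a lookup index, and a response either receives a code or is returned uncoded for manual coding. The class names, the exact-match strategy and the codes (`X01`, `X02`) are hypothetical; production coders usually use richer matching.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ClassificationItem:
    """One classification entry, as it might be read from a metadata store."""
    code: str
    descriptor: str
    synonyms: Tuple[str, ...] = ()


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


class Coder:
    """Exact-match coder built from classification metadata."""

    def __init__(self, items: Iterable[ClassificationItem]) -> None:
        self._index: Dict[str, str] = {}
        for item in items:
            for text in (item.descriptor, *item.synonyms):
                self._index[normalise(text)] = item.code

    def code(self, response: str) -> Optional[str]:
        """Return the matching code, or None to route to manual coding."""
        return self._index.get(normalise(response))


coder = Coder([
    ClassificationItem("X01", "Dairy cattle farming", ("dairy farming", "dairy farm")),
    ClassificationItem("X02", "Software publishing"),
])
print(coder.code("  Dairy-Farming "))  # X01
print(coder.code("Fishing"))  # None
```

Because the index is built from the classification itself, a new classification version changes the coder's behaviour without code changes, which is why metadata management and standardisation are the main challenges.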
How to Start? Areas Where SOA Can Deliver Significant Value
• Metadata services: a good candidate for reuse in many stovepipe and corporate applications.
• Statistical tools/components: exposing them through a service interface would make them more interoperable, significantly improving the ability to integrate them into different IT environments and thereby increasing their shared use and collaboration.
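A Python sketch of what exposing a statistical tool through a service interface can mean in practice: the tool stays an ordinary function, and a thin handler accepts a JSON request and returns a JSON response, so any environment that can exchange JSON can call it. The `weighted_mean` tool, the request fields (`tool`, `arguments`) and the response shape are hypothetical, and the transport (for example HTTP) is left out.

```python
import json
from typing import Any, Callable, Dict, List


# A hypothetical statistical tool: a plain function with no knowledge of callers.
def weighted_mean(values: List[float], weights: List[float]) -> float:
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


TOOLS: Dict[str, Callable[..., Any]] = {"weighted_mean": weighted_mean}


def handle_request(body: str) -> str:
    """Service interface: JSON request in, JSON response out, errors in the payload."""
    try:
        request = json.loads(body)  # JSONDecodeError is a ValueError
        tool = TOOLS[request["tool"]]
        result = tool(**request["arguments"])
        return json.dumps({"status": "ok", "result": result})
    except (KeyError, TypeError, ValueError) as exc:
        return json.dumps({"status": "error", "message": str(exc)})


request = json.dumps({"tool": "weighted_mean", "arguments": {"values": [10, 20], "weights": [1, 3]}})
print(handle_request(request))  # {"status": "ok", "result": 17.5}
```

Keeping the tool free of transport code means the same function can be called directly inside one platform and through the service interface from others.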
Summary
• Iterative development (low-hanging fruit first) and proofs of concept
• No emphasis on any particular approach: SOA, DW and metadata-driven architecture are used together in a way that maximises benefits and minimises risk
• Strong focus on the use of standards (SDMX)
• Common IT infrastructure is enabling further consolidation (MS SQL Server & Analysis Services, SAS Server, Blaise, .NET)
Annex: Systems Architecture and SOA Use (Detailed Version)
• The following slide is a detailed picture of our systems architecture
• Box 1 highlights SOA in the collections area
• Box 2 highlights SOA in the processing area
• Box 3 highlights SOA in the dissemination area
• Box 4 highlights SOA in statistical infrastructure