Item 3.3.2 of the Agenda

Presentation Transcript


  1. Item 3.3.2 of the Agenda CVD Antonio Consoli - Eurostat B1

  2. Content • Intro • CVD Architecture • CVD Components • Production systems • Overview 2011

  3. Intro CVD = Statistical Business Process Model (in French, Cycle de Vie des Données, i.e. "data life cycle"). The new CVD project was launched in 2004 with two main objectives: • Rationalisation of existing IT systems • Harmonisation of the IT architecture. Gradual implementation has now started.

  4. Why? • Avoid stovepipe processing, allowing synergies and economies of scale • Simplify the statistical process and increase the level of automation and systems integration • Free resources (money and staff) from IT for the core business • Ease staff mobility and back-up • Achieve higher quality in common components • Simplify data exchange with Member States and improve the quality of the data exchanged

  5. How? • Bring statisticians and IT experts together • Implement gradually, taking opportunities as they arise • Follow the CVD architecture and guidelines • (Re)use generic software (e.g. BBs) • Harmonise and rationalise systems → create economies of scale → simplify maintenance of the organisation's software

  6. Functional view (diagram): statistical application, TDS, MH, CVD Manager, SEP, BBs, Reference, NUI, CVD Services

  7. Statistical Business Process Model view (diagram): COLLECT → VALIDATE → ANALYSE → DISSEMINATE, with eDAMIS, Reception, Storage, BBs, BB Manager, Reference data, Data in production, Metadata Handler, Data files, Production system, Internet Portal, Data Explorer and ASSIST (user support); also shown: METHODOLOGY + QUALITY ASSURANCE + METADATA MANAGEMENT

  8. Statistical Business Process Model (data processing)
     1 Collect: 1.1 Set up collection • 1.2 Run collection • 1.3 Load data • 1.4 Cooperate with providers
     2 Validate: 2.1 Edit • 2.2 Detect & treat outliers • 2.3 Impute • 2.4 Derive new variables • 2.5 Integrate and load data
     3 Analyse: 3.1 Acquire domain intelligence • 3.2 Produce statistics or indicators • 3.3 Check quality • 3.4 Interpret and explain • 3.5 Prepare tables for dissemination
     4 Disseminate: 4.1 Produce products • 4.2 Manage customer queries
     5 Metadata management
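
The phase and sub-process numbering of slide 8 can be stored as a small lookup structure, for example to tag each CVD component with the sub-process it serves. The following is a minimal Python sketch. The names `SBPM`, `BB_SUBPROCESS` and `phase_of` are hypothetical. The BB-to-sub-process mapping is an assumption inferred only from the BB names, because the slide 10 relation table is not reproduced in this transcript.

```python
from typing import Dict, Tuple

# Phase number -> (phase name, {sub-process code: sub-process name}), as listed on slide 8.
SBPM: Dict[int, Tuple[str, Dict[str, str]]] = {
    1: ("Collect", {"1.1": "Set up collection", "1.2": "Run collection",
                    "1.3": "Load data", "1.4": "Cooperate with providers"}),
    2: ("Validate", {"2.1": "Edit", "2.2": "Detect & treat outliers", "2.3": "Impute",
                     "2.4": "Derive new variables", "2.5": "Integrate and load data"}),
    3: ("Analyse", {"3.1": "Acquire domain intelligence",
                    "3.2": "Produce statistics or indicators", "3.3": "Check quality",
                    "3.4": "Interpret and explain",
                    "3.5": "Prepare tables for dissemination"}),
    4: ("Disseminate", {"4.1": "Produce products", "4.2": "Manage customer queries"}),
    5: ("Metadata management", {}),  # no numbered sub-processes on the slide
}

# Hypothetical mapping, inferred only from the BB names (not from the slide 10 table).
BB_SUBPROCESS: Dict[str, str] = {"EBB": "2.1", "ODBB": "2.2", "IBB": "2.3", "DBB": "2.4"}


def phase_of(code: str) -> str:
    """Return the phase name for a sub-process code such as '2.2'."""
    phase_name, subprocesses = SBPM[int(code.split(".")[0])]
    if code not in subprocesses:
        raise KeyError(f"unknown sub-process {code!r}")
    return phase_name
```

For example, `phase_of(BB_SUBPROCESS["ODBB"])` returns `"Validate"`.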

  9. CVD components & Statistical Business Process. Legend: • Components specifically designed for the sub-process • Components designed for another sub-process that could also be used for this sub-process if their functionalities are appropriate • Other uses may be possible in specific cases

  10. Relation table of Statistical Business Process Model & CVD components

  11. Component concept CVD is not a monolithic system to be used in all statistical domains. It is composed of: • A limited set of production systems • A set of generic, specialised components that the production systems can use • Guidelines for implementation, development and data exchange

  12. Components • Production systems: CVD systems (GSAST, NAPS, COMEXT); existing systems (SAM, Eurocube, FAME) • Generic tools: eDAMIS, BBs (Building Blocks), MH (Metadata Handler), EUROBASE (common reference database), Data Explorer

  13. eDAMIS • Supports the transmission of statistical data between Member States and Eurostat • Acknowledges data arrival • Sends automatic reminders for late data • Ensures secure, well-monitored transmission of data through SEP • Delivers data to production environments • User access management • Links to structural metadata • Automatic generation of SDMX-ML messages for online data transmission • Handles standardised messages • Basic validation, in both interactive and batch mode, and format conversion, to converge in future: • On eDAMIS data files: field, some intra-record and limited inter-record checks • In Web Forms: validations applied directly to cells • Share of dataset occurrences received through SEP for the EU-27: 2005: 26% • 2006: 32% • 2007: 38% • 2008 (Q1): 43%
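
Two of the eDAMIS functions above, acknowledging data arrival and sending reminders for late data, come down to comparing arrival dates with transmission deadlines. The following is a minimal Python sketch that assumes a simple table of one deadline per dataset. The function names, dataset identifiers and message wording are hypothetical and do not reflect eDAMIS's actual interfaces.

```python
from datetime import date
from typing import Dict, List


def acknowledge(dataset: str, arrival: date, deadlines: Dict[str, date]) -> str:
    """Build an acknowledgement stating whether the dataset arrived on time."""
    due = deadlines[dataset]
    if arrival <= due:
        status = "on time"
    else:
        status = f"late by {(arrival - due).days} days"
    return f"{dataset} received on {arrival.isoformat()} ({status})"


def datasets_to_remind(deadlines: Dict[str, date], received: Dict[str, date],
                       today: date) -> List[str]:
    """Return datasets whose deadline has passed but which have not arrived yet."""
    return sorted(ds for ds, due in deadlines.items()
                  if due < today and ds not in received)
```

Running the reminder check once a day against the inventory of received datasets produces the list of Member State transmissions to chase.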

  14. eDAMIS - Plan • Status: v. 2.6 in production; 43% of the datasets received by Eurostat arrive through it. • September 2008, v. 2.7: • Improved performance • Validation and conversion of SDMX-ML (as currently for GESMES) • Workflow manager can receive signals from other applications • March 2009, v. 3: • Dataset inventory, validation engine, further integration of the Stadium db • 2010: • Pull approach fully supported • ECAS sign-in for internal and external users, link to CVD-MH

  15. Editing BB (EBB) • Executes editing rules, optionally with reference data (lookup tables) • Intra-cell, intra-record (horizontal) and inter-record (vertical) rules • Reports on rule execution • Allows interactive review of messages • Can be provided as is to MS for editing at source, since it is: • Generic: we estimate that the system would be sufficient for most statistical applications in any statistical office. • Portable • A generic package or a specific application of the software can be shared among interested parties. • It can be downloaded or distributed on CD-ROM. • Written in Java, so it runs on PC, Mac or any Java-compatible environment. • Individual parameters can be customised to suit specific needs: • Update/change of edit rules • Update/change of classifications • Update/change of other execution parameters • Parameter updates/changes can also be downloaded. • A non-modifiable version can be made and distributed for a specific purpose.
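
The three families of EBB rules can be illustrated as predicates over records, together with a report that lists each failure. The following is a minimal Python sketch, not the EBB engine, which is written in Java. The variable names, the lookup table `VALID_COUNTRIES` and the rule identifiers are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical reference data (lookup table): codes accepted in this example.
VALID_COUNTRIES = {"BE", "DE", "FR", "IT"}

# Record-level rules: (rule name, predicate returning True when the record passes).
RECORD_RULES: List[Tuple[str, Callable[[Record], bool]]] = [
    # Intra-cell rule: a single value must lie in an admissible range.
    ("R1 employees >= 0", lambda r: r["employees"] >= 0),
    # Intra-record (horizontal) rule: components must add up to the total.
    ("R2 male + female = employees", lambda r: r["male"] + r["female"] == r["employees"]),
    # Rule using reference data: the code must exist in the lookup table.
    ("R3 country in lookup table", lambda r: r["country"] in VALID_COUNTRIES),
]


def run_edits(records: List[Record]) -> List[Tuple[str, int]]:
    """Apply all rules and report (rule name, record index) for every failure."""
    report = []
    for i, rec in enumerate(records):
        for name, rule in RECORD_RULES:
            if not rule(rec):
                report.append((name, i))
    # Inter-record (vertical) rule: each unit identifier may appear only once.
    seen = set()
    for i, rec in enumerate(records):
        if rec["id"] in seen:
            report.append(("V1 id unique", i))
        seen.add(rec["id"])
    return report
```

Keeping the rules as data, separate from the loop that executes them, is what allows edit rules to be updated or exchanged without changing the program. This matches the customisable-parameters point above.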

  16. EBB - Plan • Status: in production for AES, CVTS, External Trade • 2008: ICT Household survey, CIS • 2009: ESSPROS, Energy, SBS, BOP, Migration and new GSAST domains

  17. Derivation BB (DBB) • Derives new variables, optionally with reference data (lookup tables) • Intra-cell, intra-record (horizontal) and inter-record (vertical) derivations • Reports on execution • Allows interactive review of messages • Uses (a subset of) the same engine as the Editing BB • Status: under development • End 2008: first version ready
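
Derivation follows the same pattern as editing, except that the rules produce new variables instead of pass/fail results. The following hypothetical Python sketch shows one horizontal derivation, one lookup-based derivation and one vertical derivation. `SECTOR_OF` and all variable names are illustrative assumptions.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical lookup table mapping an activity code to an aggregate sector.
SECTOR_OF = {"C10": "Industry", "C20": "Industry", "G47": "Services"}


def derive(records: List[Record]) -> List[Record]:
    """Return copies of the records with three derived variables added."""
    # Inter-record (vertical) aggregate needed by the share derivation below.
    total_turnover = sum(r["turnover"] for r in records)
    derived = []
    for r in records:
        new = dict(r)
        # Intra-record (horizontal): ratio of two variables of the same record.
        new["turnover_per_employee"] = r["turnover"] / r["employees"] if r["employees"] else None
        # With reference data: look the activity code up in the lookup table.
        new["sector"] = SECTOR_OF.get(r["activity"], "Unknown")
        # Inter-record (vertical): share of this record in the column total.
        new["turnover_share"] = r["turnover"] / total_turnover if total_turnover else None
        derived.append(new)
    return derived
```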

  18. Outliers Detection BB (ODBB) • Basic and statistical methods to identify outliers • Methods: • Hidiroglou-Berthelot and σ-gap • Top and bottom values, by number or percentile, and conditions • Reports on execution • Multidimensional distance measures in future • Status: in use for Urban Audit • 2009: first implementations in agriculture and health statistics
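
The Hidiroglou-Berthelot method compares each unit's period-to-period ratio with the median ratio. It then weights the deviation by unit size, so that both large units with moderate changes and small units with extreme changes can be flagged. The following is a minimal Python sketch that assumes two periods of strictly positive values. The parameter defaults (`u=0.5`, `a=0.05`, `c=4.0`) are illustrative choices, not ODBB settings.

```python
import statistics
from typing import List, Sequence


def hidiroglou_berthelot(prev: Sequence[float], curr: Sequence[float],
                         u: float = 0.5, a: float = 0.05, c: float = 4.0) -> List[int]:
    """Return indices of units flagged as outliers by the HB method.

    Units with a non-positive value in either period are not assessed.
    At least two assessable units are needed (statistics.quantiles requirement).
    """
    idx = [i for i, (x0, x1) in enumerate(zip(prev, curr)) if x0 > 0 and x1 > 0]
    ratios = [curr[i] / prev[i] for i in idx]
    r_med = statistics.median(ratios)
    effects = []
    for i, r in zip(idx, ratios):
        # Centre the ratio on the median, symmetrically for decreases and increases.
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Scale by unit size: larger units get more weight for the same change.
        effects.append(s * max(prev[i], curr[i]) ** u)
    q1, e_med, q3 = statistics.quantiles(effects, n=4)
    # The a * |median| term prevents very narrow bounds when quartiles coincide.
    d_low = max(e_med - q1, abs(a * e_med))
    d_high = max(q3 - e_med, abs(a * e_med))
    return [i for i, e in zip(idx, effects)
            if e < e_med - c * d_low or e > e_med + c * d_high]
```

A larger `c` widens the acceptance interval and flags fewer units. `u` controls how strongly unit size influences the result.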

  19. Disclosure Control BB (DCBB) • Performs confidentiality verification of tables • Applies various masking techniques that ensure the confidentiality of published statistics • Based on CBS μ-argus and τ-argus • Status: partially tested for SBS • End 2008: link to GSAST • End 2009: SBS
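
Primary confidentiality checks are often expressed as two rules: a threshold rule (a minimum number of contributors) and an (n, k) dominance rule. The following sketch covers only that primary step, with hypothetical parameters (3 contributors, n = 1, k = 85). It is not τ-argus, which also handles secondary suppression and other masking techniques.

```python
from typing import Dict, List


def is_primary_sensitive(contributions: List[float], min_contributors: int = 3,
                         n: int = 1, k: float = 85.0) -> bool:
    """Flag a cell under a threshold rule or an (n, k) dominance rule.

    Contributions are assumed to be non-negative values of the units in the cell.
    """
    # Threshold rule: with too few contributors, individual values are easy to infer.
    if len(contributions) < min_contributors:
        return True
    # Dominance rule: the n largest contributors must not exceed k% of the cell total.
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k / 100 * sum(contributions)


def primary_suppressions(table: Dict[str, List[float]]) -> List[str]:
    """Return the cells that must be suppressed before publication."""
    return [cell for cell, contrib in table.items() if is_primary_sensitive(contrib)]
```

Suppressing only these cells is usually not enough, because they could be recalculated from row and column totals. That is why tools such as τ-argus add secondary suppressions.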

  20. Economic Indices BB (EIBB) Calculates indices used in economics: • Fisher • Laspeyres (geometric) • Paasche (geometric) • Törnqvist-Theil • Laspeyres (harmonic) • Paasche (harmonic) • Chain index • EKS(-S) • Weighted arithmetic mean • Weighted geometric mean • Weighted harmonic mean • Laspeyres • Paasche • Lowe • Edgeworth • Bowley • Status: ready for implementation, waiting for first requests
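
Several of the listed formulas fit in a few lines. The following Python sketch covers Laspeyres, Paasche, Fisher, geometric Laspeyres, Törnqvist and chaining. It uses prices `p0`, `p1` and quantities `q0`, `q1` for the base and current periods. The function names are hypothetical and are not EIBB's interface.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _shares(p: Sequence[float], q: Sequence[float]) -> list:
    """Expenditure shares p_i * q_i / sum(p * q)."""
    total = _dot(p, q)
    return [pi * qi / total for pi, qi in zip(p, q)]


def laspeyres(p0, p1, q0) -> float:
    return _dot(p1, q0) / _dot(p0, q0)  # base-period quantities


def paasche(p0, p1, q1) -> float:
    return _dot(p1, q1) / _dot(p0, q1)  # current-period quantities


def fisher(p0, p1, q0, q1) -> float:
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


def geometric_laspeyres(p0, p1, q0) -> float:
    # Weighted geometric mean of price relatives with base-period shares.
    return math.prod((b / a) ** w for a, b, w in zip(p0, p1, _shares(p0, q0)))


def tornqvist(p0, p1, q0, q1) -> float:
    # Geometric mean of price relatives weighted by the average of both periods' shares.
    w0, w1 = _shares(p0, q0), _shares(p1, q1)
    return math.prod((b / a) ** ((x + y) / 2) for a, b, x, y in zip(p0, p1, w0, w1))


def chain(links: Sequence[float]) -> float:
    """Chain index: product of successive period-to-period links."""
    return math.prod(links)
```

With `p0=[1, 2]`, `p1=[2, 2]`, `q0=[10, 5]` and `q1=[5, 10]`, Laspeyres gives 1.5 and Paasche gives 1.2. Fisher, their geometric mean, lies between the two. `math.prod` requires Python 3.8 or later.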

  21. Imputation BB (IBB) • To be determined. Note: possibly based on the BANFF software; any system should be very similar to BANFF • Implementation of various mathematical imputation methods • Last BB to be developed • Scope not yet established • Status: the BB survey confirmed the need for it • Plan: start analysis in 2009 • End 2010: alpha version
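
The IBB scope is not yet established, so the following only illustrates one widely used family of methods: nearest-neighbour donor imputation, where a missing value is copied from the most similar complete record. This is not BANFF's implementation. The matching variables, the distance (sum of absolute differences) and the `_imputed` flag are assumptions of this sketch.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def donor_impute(records: List[Record], target: str, matching: List[str]) -> List[Record]:
    """Fill missing `target` values from the nearest complete record (the donor).

    Distance is the sum of absolute differences on the `matching` variables.
    """
    donors = [r for r in records if r.get(target) is not None]
    result = []
    for r in records:
        new = dict(r)
        if r.get(target) is None and donors:
            best = min(donors, key=lambda d: sum(abs(d[m] - r[m]) for m in matching))
            new[target] = best[target]
            new[target + "_imputed"] = True  # keep track of imputed values
        result.append(new)
    return result
```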

  22. Seasonal adjustment BB (SABB) • Methodology draws on Demetra+, which is under development: • based on the X-13 and TRAMO-SEATS core engines • still to be developed: diagnostics, reconciliation, aggregation, etc. • organisations involved: ESTAT, Banque Nationale de Belgique, US Census Bureau, Banco de España • SABB: methodological specifications and IT architecture based on Demetra+ by the end of 2008, followed by development
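
X-13 and TRAMO-SEATS are far more elaborate. The basic idea of multiplicative seasonal adjustment can still be shown with a classical ratio-to-moving-average decomposition in three steps:

- estimate the trend with a centred 2x12 moving average;
- average the ratios of data to trend for each month;
- divide the series by the resulting seasonal factors.

This Python sketch implements that classical method only. It is not Demetra+ or the SABB, and the function names are hypothetical.

```python
from statistics import mean
from typing import List


def seasonal_factors(x: List[float], period: int = 12) -> List[float]:
    """Multiplicative seasonal factors from a centred 2 x period moving average.

    Assumes an even period and a series covering at least two full periods.
    """
    half = period // 2
    ratios: List[List[float]] = [[] for _ in range(period)]
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]  # period + 1 observations
        # Centred moving average: half weight on both end points.
        trend = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        ratios[t % period].append(x[t] / trend)
    raw = [mean(r) for r in ratios]
    scale = mean(raw)
    return [f / scale for f in raw]  # normalise so the factors average to 1


def seasonally_adjust(x: List[float], period: int = 12) -> List[float]:
    factors = seasonal_factors(x, period)
    return [v / factors[t % period] for t, v in enumerate(x)]
```

On a series that is a constant level times a fixed monthly pattern, the factors recover the pattern and the adjusted series is flat. Real series also need trend-cycle estimation at the ends, outlier handling and calendar effects, which these tools provide.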

  23. ASSIST BB • User support tool • Parallel to the e-mail system (with attachments) • Service requests • Request follow-up • Searchable, central public knowledge database • Decentralised help centres/persons • Sub-systems by subject matter, geography or any other classification • Access management (to the appropriate parts of the system, by administrative privileges or subject matter) • Status: implemented for External Trade • 2008: • Implementation in B6 • Implementation in MS

  24. MH - Metadata Handler Integrated environment for managing structural and reference metadata in Eurostat. Covers: • Structural metadata: data and metadata structure definitions, code lists, classifications… • Reference metadata: SDDS and ESMS metadata, quality reports… Provides: • Human user interfaces for viewing, creating and modifying metadata • Interfaces through which other applications can upload and retrieve metadata • Export and import of metadata • Common user access control for all metadata operations Enables: • Coherent, reusable metadata across domains and through the different stages of the data life cycle • Status: v1 under development (extension of the SDMX registry). • End 2008: v1 in production, with two human interfaces plus Web Services for applications; link to GSAST. • End 2009: v2 first release: partial integration of EMIS and/or RAMON and CODED. • June 2010: v2 second release: full integration of horizontal metadata management
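
Structural metadata such as code lists allow other applications to validate data against a shared definition. The following minimal Python sketch checks a dot-separated series key, in the style of SDMX series keys, against the code lists of a data structure. The dimensions `FREQ`, `GEO` and `INDICATOR` and their codes are hypothetical and do not come from the MH.

```python
from typing import List, Set, Tuple

# Hypothetical data structure definition: ordered dimensions with their code lists.
DIMENSIONS: List[Tuple[str, Set[str]]] = [
    ("FREQ", {"A", "Q", "M"}),
    ("GEO", {"BE", "DE", "FR"}),
    ("INDICATOR", {"EMP", "TURN"}),
]


def validate_key(key: str) -> List[str]:
    """Check a key such as 'A.BE.EMP' against the code lists; return error messages."""
    parts = key.split(".")
    if len(parts) != len(DIMENSIONS):
        return [f"expected {len(DIMENSIONS)} dimensions, got {len(parts)}"]
    errors = []
    for (dimension, codes), value in zip(DIMENSIONS, parts):
        if value not in codes:
            errors.append(f"{dimension}: code '{value}' not in code list")
    return errors
```

If the code lists live in one central place, a change to a classification reaches every application that validates against it.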

  25. EMIS (Eurostat metadata information system) • Supports the preparation and administration of reference metadata • Status: v 2.1 in production, manages SDDS files • Mid 2009: v 3 in production, management of ESMS (Euro SDMX Metadata Structure)

  26. MANAGER The process monitoring and/or scheduling tool related to the production system. Its scope can vary depending on the production system: • COMEXT is a tightly integrated system with minimal human intervention that does not require an external process management tool. It automatically launches processing steps and holds all status information, based on a design of the particular production process. • NAPS: there is no predefined process, so a generic process scheduler cannot be applied; a monitoring and reporting tool is foreseen. • GSAST: process management is native to SAS Enterprise Guide. The next version will allow conditional launching of process steps based on the results of their predecessors. • 2009: automatic process scheduling in GSAST (new EG in SAS V9.2) • 2011: monitoring and reporting tool for NAPS
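
The conditional launching planned for GSAST, where a step runs only if its predecessors succeeded, can be sketched with a topological sort over step dependencies. This sketch uses Python's standard `graphlib` module (Python 3.9+). The step names and the `run_process` function are hypothetical and unrelated to SAS Enterprise Guide's actual scheduler.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_process(steps: Dict[str, Callable[[], bool]],
                depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run steps in dependency order; launch a step only if all predecessors succeeded.

    Every predecessor is assumed to be a key of `steps`; cycles raise graphlib.CycleError.
    """
    graph = {name: depends_on.get(name, set()) for name in steps}
    status: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status[p] != "ok" for p in graph[name]):
            status[name] = "skipped"  # a predecessor failed or was itself skipped
        else:
            status[name] = "ok" if steps[name]() else "failed"
    return status
```

The returned status map also serves as a minimal monitoring report of the kind foreseen for NAPS.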

  27. EUROBASE New reference environment, to replace NewCronos. • Status: under development and in parallel running • 2009: • Java version of the user interface • Replacing NewCronos

  28. Data Explorer Provides access to Eurostat's statistical reference databases. • Embargo • Single tool for all data and metadata, based on the Comext API and DB • Based on the principles of graphical tools • Highly interactive operation • Metadata is presented to the user • Shows the relations between different types of metadata • Can be used inside Eurostat • Status: under development and testing. • v2.1.4 'Try This' is available for user testing on the Internet. • September 2008: v2.2 full operational deployment. • 2009: v3, integration of the Table, Graph, Maps interface (TGM)

  29. GSAST • Primary target: treatment of micro-data, and operations on micro- and macro-data, from surveys • Based on SAS Base, BI Server and Enterprise Guide • For unique or unusual processing requirements

  30. GSAST - Plan

  31. COMEXT • Tightly integrated production and dissemination environment with a wide range of generic statistical analysis functions and powerful metadata management • Accommodates very large data volumes • Methodologically coherent approach ensuring maximum security and data integrity • Ensures timely production • Status: in production for external trade statistics and part of food safety; provides the dissemination database, embargo, versioning, extraction and statistical calculation facility behind Data Explorer • December 2009: energy statistics, ESSPROS

  32. NAPS • NAPS = National Accounts Production System • System targeting sub-annual (seasonal time series) data. • Allows easy and direct interaction with atomic data at cell or time-series level: a flexible approach in which users can define their own calculations using the high-level MDT language and Oracle statistical functionality. • Status: MDT in production for BOP in unit C4 • 2009: • Detailed analysis of the domains to migrate • Detailed system design • 2010: • Pilot domain migrated • Start of migration of dir. C applications

  33. Other current production systems: SAM, Eurocube, FAME SAM • The simplest and most straightforward tool. • Low-complexity, self-contained Microsoft Windows based tool (Visual Basic on top of Oracle), designed specifically for applications that are better kept self-sufficient. Eurocube • It has similar functionality to SAM, but targets more complex multidimensional applications that require assistance from IT experts. • It is based on Oracle Express (Oracle OLAP technology). FAME • FAME is a specialised database system with a wide range of functions for time series storage and treatment. • A number of complex, mission-critical statistical production systems have been developed using the FAME development language. Status • Looking into using BBs; work has started on linking SAM with EBB. • Support and maintenance continue while future migrations are studied.

  34. Special applications The current plan covers the majority of present production systems, but special applications can exist outside CVD, such as: • LUCAS (statistics on land cover and land use): uses special software for image processing and spatial analysis; terabytes of data. • Euro Business Register: the first register-based application in Eurostat. • GISCO (Geographical Reference Database for the European Commission): uses special software for geographic information systems.

  35. Practical steps in implementation of CVD • Looking for opportunities - gradual migration to CVD • Proliferation of BBs in existing applications • Communicating the CVD in Eurostat • The CVD beyond Eurostat

  36. Looking for opportunities - gradual migration to CVD • When a production unit needs to migrate or implement a new application, the procedure is as follows: • The production unit expresses the need to its regular contact in unit B1 during the IT Masterplan (Schéma Directeur) exercise; • If needed, the request is discussed in CAB/ITSC; • A solution and plan are proposed to the production unit. • The choice of production system takes into account: • Technical aspects linked to data production; • Ease of implementing a solution; • Time constraints; • Human resources: • Availability of developers; • Tools already used in the unit or directorate; • How it fits with other ongoing and planned projects; • Compatibility with the CVD strategy.

  37. Proliferation of BBs in existing applications For all new requests, whenever possible (i.e. when the desired functionality exists), available BBs are interfaced with the specific statistical production applications and used in data processing.

  38. Communicating the CVD Objectives • At technical level • Take ownership of the CVD approach • Understand the CVD architecture • At managerial level • Gain support for the implementation process Means • At technical level • CVD seminar • Ad hoc lunchtime presentations • At managerial level • ITSC (IT steering committee) • HUM (Head of Unit meeting)

  39. The CVD beyond Eurostat Rationalisation of statistical information systems is an objective of many NSIs • One session on architecture at the UNECE/Eurostat/OECD MSIS meeting • One session on rationalisation at the last ITDG • Implementations in Ireland, New Zealand and Latvia • Developments in many countries Opportunity to share components as a next step • SDMX • Open (community) source tools • MSIS TF on tool sharing • Standardisation initiative in the ESS

  40. Overview 2011 • Single components are used for: • Data exchange: eDAMIS • Management of metadata: MH • Specialised statistical processing components: BBs • Reference: EUROBASE • Dissemination: Comext and Data Explorer • CVD production systems cover at least: • Applications in current dir. F and microdata treatment: GSAST • National accounts applications: NAPS • External trade and energy statistics: COMEXT • Current systems are maintained and evolving: • Linked to BBs: SAM and Eurocube • Based on the results of current migration plans: FAME
