Statistics: Investment in the future 2
Prague, 15 September 2009

Processing and managing statistical data: a National Central Bank experience

Fabio Di Giovanni, Banca d'Italia, fabio.digiovanni@bancaditalia.it
Daniele Piazza, Banca d'Italia, daniele.piazza@bancaditalia.it
Outline:
• Context and requirements
• Foundations of the statistical information system: the Information Model and the IT architecture
• The representation of data transformations
• The calculation services
Context and requirements: the scenario

Sources:
• Reporting agents (MFIs, enterprises, etc.)
• Internal sources (payment system, accounting system, ...)
• Other institutions (IMF, OECD, ECB, Eurostat, ...)

Process (compliant with the METIS generic statistical process): design and build, collect and validate, process, disseminate, over a metadata-driven data warehouse.

Users:
• Institutional statistics
• Bank of Italy internal users (research, supervision, markets)
• External users (reporting agents, researchers, ...)
Context and requirements: main requirements

Data types:
• Quantitative and qualitative data: common statistical data (time series, arrays, ...), questionnaires, scores, ...
• Registers of entities
• Structured data with attached documents: banks' balance sheets, economic research, supervision, payment systems, Financial Intelligence Unit, ...

Process types:
• Hub-and-spoke collection
• Bilateral exchange
• Ad hoc survey processes
• Statistics production lines
• Register handling (e.g. Central Credit Register)
• Publications and reports

Support for the organization:
• Management of independence vs. cooperation patterns between information system segments
• Support for independent information systems ("statistical communities"), e.g. Central Bank and Financial Intelligence Unit
• Handling and tailoring of administration and usage rights
Context and requirements: the INFOSTAT drivers

ICT opportunities:
• Architectures, tools, methodologies
• Open source
• Standards

Evolution of business needs:
• Statistics must be redesigned at an increasing pace
• Co-operation impacts data and processes
• Users need to access data anytime and anywhere

ESCB directions on IT:
• Pooling and consolidation
• Reference Architecture

Statistical standards:
• Exchange formats
• Generic statistical process

All these drivers converge on statistical data management.
Outline:
• Context and requirements
• Foundations of the statistical information system: the Information Model and the IT architecture
• The representation of data transformations
• The calculation services
Foundations: the IM and the IT architecture
Solution foundations: the Information Model (IM) and the IT architecture

Process steps: design, build, collect, process, disseminate, use, over a metadata-driven warehouse.

Information Model: a generic vision of statistical data and their relationships (e.g. logical dependencies, processing rules).

IT solution: a holistic approach to the processes and their data, supporting user requirements through a single platform.
Foundations: the IM and the IT architecture
Information Model

To deal with end-to-end data processing (design, build, collect, process, disseminate, use), a comprehensive IM should include:
• Representation of data structures and concepts (for all process steps)
• Representation of data production (e.g. metadata describing the calculation of new aggregated data)
• "Process metadata" for all process activities (e.g. agreements with providers, remarks management)

All the currently available models lack some of these useful features.
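The three layers of metadata listed above can be sketched as data classes. This is an illustrative model only: every class and field name below is an assumption, not the actual Matrix/Infostat schema.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A statistical concept (dimension or attribute), e.g. FREQ or REF_AREA."""
    name: str
    description: str = ""

@dataclass
class DataStructure:
    """Structural metadata: the concepts that identify and qualify observations."""
    name: str
    dimensions: list
    attributes: list = field(default_factory=list)

@dataclass
class Transformation:
    """Production metadata: how output data are derived from input data."""
    inputs: list
    output: str
    expression: str  # an EXL-like formula, stored as text

@dataclass
class ProcessMetadata:
    """Process metadata: activity-level information, e.g. provider agreements."""
    activity: str
    notes: dict = field(default_factory=dict)

# A hypothetical dictionary entry covering all three layers:
bsi = DataStructure("BSI", dimensions=[Concept("FREQ"), Concept("REF_AREA")])
derive = Transformation(inputs=["BSI"], output="BSI_RATIO",
                        expression="F1 / (F1 + F2) * 100")
agreement = ProcessMetadata("collection", notes={"provider": "MFI sector"})
```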
Foundations: the IM and the IT architecture
SDMX and Matrix at a glance

[Slide shows a side-by-side comparison diagram of the SDMX and Matrix information models.]
Foundations: the IM and the IT architecture
SDMX - Matrix interoperability

The Matrix and SDMX information models are interoperable:
• The content of a Matrix dictionary can be exported into an SDMX structure message, and vice versa
• The content of a Matrix data warehouse can be exported into an SDMX data message, and vice versa

More about Matrix: http://www.bancaditalia.it/statistiche/quadro_norma_metodo/modell_SIS
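The dictionary-to-structure-message direction can be illustrated with a toy exporter. The element names below follow SDMX 2.0 conventions loosely; the Matrix-side names (cube name, dimension list) and the function itself are invented for illustration and are not the real export service.

```python
import xml.etree.ElementTree as ET

def cube_to_sdmx_structure(cube_name, dims):
    """Render a dictionary cube as a minimal SDMX-like structure message."""
    root = ET.Element("Structure")
    key_family = ET.SubElement(root, "KeyFamily", id=cube_name)
    components = ET.SubElement(key_family, "Components")
    for d in dims:
        ET.SubElement(components, "Dimension", conceptRef=d)
    return ET.tostring(root, encoding="unicode")

xml = cube_to_sdmx_structure("BSI", ["FREQ", "REF_AREA", "BS_ITEM"])
```

The inverse direction would parse such a message back into dictionary entries, which is what makes round-tripping between the two models possible.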
Foundations: the IM and the IT architecture
IT solution

The requirements lead to the definition of different processes, each composed of a different subset of the typical activities, possibly with different sequences and iterations, specialized for the specific business case: registers of entities, ad hoc surveys, structured and unstructured information, quantitative and qualitative data, hub-and-spoke collection, bilateral exchange, publications and reports.

Example: producing the information for the ECB on the balance sheet of the monetary and financial institutions sector:
• Collect information on securities on a security-by-security basis; collect securities register updates
• Check the validity of the reported security codes
• Process: integrate the data with information relevant to the collected security codes; aggregate the data; estimate missing observations
• Disseminate to the ECB

How can a unitary IT approach be adopted?
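The security-by-security production line above can be sketched as a chain of small step functions. All names, the register content, and the zero-fill estimation are stand-in assumptions for illustration, not the real Infostat interfaces or estimation method.

```python
VALID_CODES = {"IT0001", "IT0002"}          # stand-in for the securities register
REGISTER = {"IT0001": {"sector": "MFI"}, "IT0002": {"sector": "MFI"}}

def check(reports):
    """Check step: keep only reports whose security code is in the register."""
    return [r for r in reports if r["code"] in VALID_CODES]

def integrate(reports):
    """Process step 1: enrich each report with register information."""
    return [{**r, **REGISTER[r["code"]]} for r in reports]

def aggregate(reports):
    """Process step 2: aggregate amounts by sector."""
    totals = {}
    for r in reports:
        totals[r["sector"]] = totals.get(r["sector"], 0) + r["amount"]
    return totals

def estimate(totals, expected_sectors):
    """Process step 3: fill missing observations (naive placeholder rule)."""
    return {s: totals.get(s, 0) for s in expected_sectors}

collected = [{"code": "IT0001", "amount": 100},
             {"code": "XX9999", "amount": 5},   # invalid code, dropped by check
             {"code": "IT0002", "amount": 50}]
result = estimate(aggregate(integrate(check(collected))), ["MFI"])
```

The unitary approach lies in treating each business case as a different composition of the same typical activities, rather than as a separate application.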
Foundations: the IM and the IT architecture
Some INFOSTAT SOA services in the data supply chain

Actors: information providers (banks, firms, families, commercial providers), the NCB, and information consumers (the public, banks, firms, national and international institutions).

Statistical services for internal and external users, by process step:
• Design and build: data model design; metadata registry and repository; metadata life-cycle handling; metadata versioning; test environment for definitions
• Reporting agents' activities: calendar; upload and download of messages; on-line data entry; inquiry on data and remarks
• Collection: protocol adapters; format converters; data quality assurance; notification of remarks; correction processing
• Data processing for compilation: data editing; environment for data production; data versioning; adapters for statistical packages
• Dissemination: protocol adapters; format converters; publications; reports; notification; export and download
• Usage: A2A data and metadata interface

Common services: data warehouse, search, data and metadata inquiry, calculations, monitoring, collaboration.
Foundations: the IM and the IT architecture
INFOSTAT main features

• Manages a large variety of data and processes with a single IT solution
• Meets new user requirements in a very short time
• Customizes the behaviour of the information system according to the organizational structure and responsibilities
• Supports openness to the statistical world from the user perspective (an advanced user interface to search and access all available information and the related documentation)
• Supports complex data production scenarios that require importing and exporting data from/to the most common statistical packages (SAS, Stata, MATLAB, Excel, etc.)
• Supports interoperability with the most common standards for statistics and business reporting (i.e. SDMX and XBRL)
• Compliant with the ESCB Reference Architecture
• Largely independent from vendors
Foundations: the IM and the IT architecture
INFOSTAT IT architecture

• Model-driven approach (data, processes, functions): the main pillar of the platform design. To meet flexibility requirements, the software must not refer to specific data or processes, since these change quickly and sometimes deeply; a model-driven approach makes the software independent of them.
• International standard support: data formats (SDMX, XBRL) and ICT standards (W3C, WS-I) make interoperability easier and avoid dependency on specific vendors.
• Multi-channel access (U2A, web services, EXDI, e-mail): supports the most common interfaces; other connectors can be added in the future.
• Best-of-breed technologies (open source, SOA, Web 2.0, RIA, portal): leading-edge technologies are used to achieve usability, agility and efficiency.
• ESCB IT architecture compliance (architectural, data and security principles): the platform complies with the principles defined in the ESCB IT Architecture in order to enable sharing of services.
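The model-driven idea can be made concrete with a minimal sketch: generic code reads the structure of a dataset from metadata instead of hard-coding it, so a new statistic is introduced by adding a metadata entry rather than changing the software. The function and survey names are invented for illustration.

```python
def validate(record, structure):
    """Generic validation driven entirely by a metadata-described structure.

    `structure` maps field names to expected Python types; the function
    has no knowledge of any specific survey.
    """
    errors = []
    for name, expected_type in structure.items():
        if name not in record:
            errors.append(f"missing {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    return errors

# A new survey is introduced by adding metadata, not code:
loan_survey = {"DATE": str, "AMOUNT": float}
errs = validate({"DATE": "2009-09", "AMOUNT": "oops"}, loan_survey)
```

The same `validate` function serves every collection defined in the dictionary, which is the flexibility the slide describes.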
Foundations: the IM and the IT architecture
INFOSTAT whole scenario

• Process steps: design, build, collect, process, disseminate, use, other use (A2A, etc.)
• INFOSTAT SOA services
• Software components: open source, custom software, statistical packages
• Logically unique metadata repository (dictionary)
• Logically unique warehouse: end-user computing, work data, ready-to-use data
• Connectors to physical sources
• Physical storage environments: RDBMS, file system, packages (SAS, FAME)
Outline:
• Context and requirements
• Foundations of the statistical information system: the Information Model and the IT architecture
• The representation of data transformations
• The calculation services
The representation of data transformations
Methodological approach to representing data transformations

External view: the transformation is a black box; the information is the relationship between the input data (operands F1, F2) and the output data (result F3).

Internal view: specifies the algorithm that transforms the input into the output data, e.g.

  F1 / (F1 + F2) * 100 - F2

Matrix comprises a proprietary and extensible language, EXL, to define expressions. The EXL syntax is formally defined in Backus-Naur Form notation. EXL expressions look very similar to spreadsheet formulas.
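The internal view above can be made concrete by evaluating the example formula over two operand values. This is a plain-Python stand-in for illustration, not the EXL engine itself.

```python
def transformation(f1, f2):
    """Internal view of the example transformation: F1 / (F1 + F2) * 100 - F2."""
    return f1 / (f1 + f2) * 100 - f2

# With F1 = 25 and F2 = 75: 25/100 * 100 - 75 = -50.0
result = transformation(25.0, 75.0)
```

The external view, by contrast, would record only that F1 and F2 are the operands and F3 the result, without the formula.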
The representation of data transformations
EXL exploits the IM features and adopts a user-oriented lexicon

• The operands and results of EXL operators and expressions are Matrix objects, in particular statistical data (cubes). Other model items can also be used in EXL expressions: elements, sets, hierarchies, events (mergers, splits), etc.
• The vocabulary adopted for the operators is taken from the business of information management.
• Within the same expression, and the same expression chain, different types of data (time series, cross-sectional data, qualitative data defined within registers, and so on) can be processed, provided they are defined in the dictionary and the IT connectors to retrieve them from the warehouse are available.
The representation of data transformations
Expression language and data exchange formats

The expression language can be used to define consistency checks. The results of a consistency check are themselves data, containing information about the quality of other data:

  Step1a = get([DataD1], keep(DATE, AMOUNT), sum(AMOUNT))
  Step1b = get([DataD2], keep(DATE, AMOUNT), sum(AMOUNT))
  DataQ1 = check(Step1a - Step1b)

An expression language is also useful for building an exchange format that can convey all the definitions of semantic constraints on data in a formal, IT-platform-independent way (e.g. the XBRL Formula initiative).
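The consistency check above can be sketched in plain Python: aggregate the AMOUNT of two datasets by DATE and flag the dates where the totals differ. `get` and `check` are EXL operators; the Python function names and sample data here are assumptions for illustration.

```python
def get_sum_by_date(data):
    """Stand-in for get([...], keep(DATE, AMOUNT), sum(AMOUNT))."""
    totals = {}
    for row in data:
        totals[row["DATE"]] = totals.get(row["DATE"], 0) + row["AMOUNT"]
    return totals

def check(a, b, tolerance=0.0):
    """Stand-in for check(Step1a - Step1b): return per-date differences
    that exceed the tolerance. The result is itself a dataset: quality data
    describing other data."""
    dates = set(a) | set(b)
    return {d: a.get(d, 0) - b.get(d, 0)
            for d in dates if abs(a.get(d, 0) - b.get(d, 0)) > tolerance}

data_d1 = [{"DATE": "2009-08", "AMOUNT": 100}, {"DATE": "2009-09", "AMOUNT": 40}]
data_d2 = [{"DATE": "2009-08", "AMOUNT": 100}, {"DATE": "2009-09", "AMOUNT": 45}]
quality = check(get_sum_by_date(data_d1), get_sum_by_date(data_d2))
```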
Outline:
• Context and requirements
• Foundations of the statistical information system: the Information Model and the IT architecture
• The representation of data transformations
• The calculation services
The calculation services

INFOSTAT includes specific services that support both the definition of the algorithms based on EXL (the metadata definition service) and their evaluation (the calculation execution service).

Definition: definitions editing; definition parsing.
Execution: integration of input data; production of new statistical data; estimation; validation of collected and produced data.
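The definition/execution split can be sketched with Python's own expression machinery: a definition service parses and validates an expression once and stores the result; the execution service later evaluates it against data. The function names are illustrative, and Python's parser stands in for the EXL parser.

```python
import ast

def define(expression):
    """Definition service: parse and validate the expression, then store
    the compiled form. Raises SyntaxError if the definition is invalid."""
    tree = ast.parse(expression, mode="eval")
    return compile(tree, "<exl>", "eval")

def execute(compiled, operands):
    """Execution service: evaluate a previously defined expression against
    operand data (builtins disabled to keep evaluation restricted)."""
    return eval(compiled, {"__builtins__": {}}, operands)

# Definition happens once, at design time:
ratio = define("F1 / (F1 + F2) * 100 - F2")
# Execution happens repeatedly, at production time, against fresh data:
value = execute(ratio, {"F1": 25.0, "F2": 75.0})
```

Separating the two phases is what lets definitions be documented, versioned, and reused independently of any particular production run.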
The calculation services
Calculation services architecture

• Metadata repository (dictionary): includes the definitions of data, transformations and EXL expressions (logical data structures, physical locations, EXL expressions)
• INFOSTAT SOA data access services: get operand data and update calculated data
• INFOSTAT SOA calculation execution services: translate EXL expressions into package- or language-dependent expressions (SAS, FAME, SQL, ...), with a strategy for computational efficiency (the engine is designed to perform several operations in parallel)
• Connectors to physical sources: translate logical requests into IT-specific instructions (SQL, web service invocations, ...)
• Warehouse: contains the data (collected, calculated, estimated, ...)

Software components currently available: R (open source) and a custom component specialized in multidimensional aggregations of very large quantities of data.
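The translation step can be illustrated with a toy translator from a logical request into one package-specific instruction, here SQL: keep DATE and sum AMOUNT over dataset DataD1. The EXL-like input form and the function name are assumptions; the real services handle the full EXL grammar and several target packages.

```python
def to_sql(dataset, keep, sum_cols):
    """Translate a logical keep/sum request into a SQL aggregation query."""
    select = ", ".join(keep + [f"SUM({c}) AS {c}" for c in sum_cols])
    group = ", ".join(keep)
    return f"SELECT {select} FROM {dataset} GROUP BY {group}"

sql = to_sql("DataD1", keep=["DATE"], sum_cols=["AMOUNT"])
```

Because the logical request stays package-independent, an equivalent `to_sas` or `to_fame` translator could target another engine without touching the definition itself.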
The calculation services
Calculus capabilities: key points (1)

• The algorithms are defined in a unique, platform-independent language (EXL), regardless of the specific package that actually performs the calculation during the process; INFOSTAT provides translations from EXL into each package-specific format
• Business documentation of the algorithms and of the overall transformation sequences is available in the metadata repository, fostering the transparency of the overall process
• Business users can directly define transformations and algorithms
• The system allows the integration of custom software, open source software and statistical packages, exploiting the capabilities of each package within a common framework of calculation services
The calculation services
Calculus capabilities: key points (2)

• Each calculation can be performed by the best-suited package
• New packages can be added to the architecture incrementally
• The INFOSTAT data access services can be used to perform calculations efficiently over data that are physically distributed across different locations and have different logical structures (RDBMS, SDMX data message files, etc.)
• The INFOSTAT calculation engine is designed to perform several operations in parallel
Statistics: Investment in the future 2
Prague, 15 September 2009

Thank you

Fabio Di Giovanni, Banca d'Italia, fabio.digiovanni@bancaditalia.it
Daniele Piazza, Banca d'Italia, daniele.piazza@bancaditalia.it