INFOSTAT: The solution of the Bank of Italy for statistical data collection and processing
Paola Maurizi, Statistics Collection and Processing Department, Application Development Division
Efficient Ways Of Statistical Data Collection From Enterprises, Luxembourg, 22-23 March 2012
Statistics in BoI (> 1 billion observations / year)
Data sources:
• Reporting units: banks & OFIs (> 4,000), enterprises (> 15,000), individuals (> 150,000)
• Other institutions (IMF, OECD, ECB, BIS, Eurostat, ISTAT, …)
• Market providers (Bloomberg, IBCA, Enterprise Register, …)
• Internal sources (payment system, accounting system, …)
Functions shown around the statistical system: Research & Economic analysis, Supervision, Payment system, C.C.R., F.I.U., Monetary policy, Official Statistics
Uses and outgoing flows:
• Institutional statistics
• BI users (research, supervision, markets; > 2,500 users)
• Public data (> 750,000 inquiries/year)
• Return flows (to > 5,000 reporting agents)
• Other flows (to other institutions)
Requirements: scenarios
• Data types: qualitative (e.g. scores, questionnaires), quantitative, unstructured
• Business view of data: time series, cross-sectional, business registers, …
• Data exchange scenarios: hub & spoke, bilateral, push or pull, …
• Statistical subjects: money and banking, payment systems, …
A different scenario is defined for each combination of characteristics.
The challenge is to meet the requirements while adopting a unitary approach.
Solution foundations: the MATRIX Information Model and the INFOSTAT IT platform
Information system
• Process steps: Design, Build, Collect, Process, Disseminate, Use
• Warehouse and metadata
• INFORMATION MODEL: a generic vision of statistical data and their relationships (logical dependencies, processing rules)
• IT PLATFORM: a holistic approach to the processes and their data, with a view to supporting user requirements using a platform
Foundations of the statistical information system: MATRIX INFORMATION MODEL
The Matrix model foundation
The Matrix model is:
• Designed to foster integration: a single model for all kinds of statistical data, other statistical artifacts, and business rule parameters
• A formal model: based on mathematical and probabilistic theories, and able to give end-to-end support to a statistical process
• A statistical, business-oriented model: it fosters the autonomy of statistical business people in administering and using the statistical information system
The Matrix model is able to describe:
• Multidimensional data (including OLAP cubes, time series, cross-sectional data)
• Business registers (e.g. RIAD, Securities, Institutional Units, Central Credit Register, etc.)
• Questionnaires (e.g. Balance of Payments: Direct Reporting, extemporary surveys, etc.)
Matrix: building blocks
To fully support an end-to-end data processing system, a comprehensive Information Model should include:
• Data identification & description
• Constraints & calculation rules (EXL)
• Dissemination rules
• Data processing management rules
• Rendering rules
• Provisioning agreements
• Data relationships
• Warehouse management rules (lineage, versioning, ...)
Process steps: Design, Build, Collect, Process, Disseminate, Use
The Matrix schema
A data structure is described by its dimensions, measures and attributes, together with combination constraints. Component roles shown on the slide: Id, Measure, Dimension.
[Example cube table. Column headers were printed vertically and are only partly recoverable: Institutional Unit, Orig Currency, Unit of measure, Scale, Time, Asset Item. Row values are listed as extracted; the columns of the trailing codes cannot be recovered.]
• IMFI | thousands | 001 | € | M | DEPOSITS VIS-A-VIS CREDIT INSTITUTIONS | 3 1 1 | E | P
• 001 | IMFI | € | thousands | C1 SECURITIES | M | 3 2 2 | E R | P
• 001 | IMFI | € | thousands | C5 FACTORING LOANS | Q | 3 2 1 | W T | P
Legal values:
• Institutional Unit (codelist): IMFI = Italian MFIs
• Amount: P = positive numbers
• Frequency (codelist): M = Monthly, Q = Quarterly
• Country (codelist): E = European countries (… Italy, Ireland …); W = European & Asian countries (… India, Italy …)
Reusable concepts: Sets, Variables and Elements (single values) do not depend on data. They can appear in many different data structures, so they are REUSABLE and SHARABLE.
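As an illustration of how reusable codelists can constrain a data structure, here is a minimal Python sketch. It is not the Matrix implementation: the names `CODELISTS`, `CubeSchema` and `validate` are hypothetical, and only the legal values quoted on the slide (IMFI, M, Q, positive amounts) come from the source.

```python
from dataclasses import dataclass

# Hypothetical codelist registry; the codes are the legal values from the slide.
# Codelists live outside any cube, so several cubes can share them.
CODELISTS = {
    "INSTITUTIONAL_UNIT": {"IMFI"},  # IMFI: Italian MFIs
    "FREQUENCY": {"M", "Q"},         # M: Monthly, Q: Quarterly
}


@dataclass
class CubeSchema:
    """A cube structure: coded dimensions plus one numeric measure (assumed shape)."""
    dimensions: dict  # dimension name -> name of the codelist of legal values
    measure: str

    def validate(self, row: dict) -> list:
        """Return a list of error strings; an empty list means the row is legal."""
        errors = []
        for dim, codelist in self.dimensions.items():
            if row.get(dim) not in CODELISTS[codelist]:
                errors.append(f"{dim}: {row.get(dim)!r} not in codelist {codelist}")
        value = row.get(self.measure)
        # "P" on the slide: the amount must be a positive number.
        if not isinstance(value, (int, float)) or value <= 0:
            errors.append(f"{self.measure}: must be a positive number")
        return errors


schema = CubeSchema(
    dimensions={"INSTITUTIONAL_UNIT": "INSTITUTIONAL_UNIT", "FREQUENCY": "FREQUENCY"},
    measure="AMOUNT",
)
print(schema.validate({"INSTITUTIONAL_UNIT": "IMFI", "FREQUENCY": "M", "AMOUNT": 10.0}))
```

Because the schema only stores codelist names, a second cube could reference the same `FREQUENCY` codelist without redefining it, which is the sense in which the slide calls sets and elements reusable and sharable.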
Example: sets of legal values and historicity awareness
[Figure: full elements list for “Country”, shown against a multidimensional cube]
The Matrix schema has also been adopted by other central banks (European and non-European) and by the European Banking Authority as documentation for the COREP and FINREP official taxonomies (http://www.eurofiling.info).
MATRIX support for events
At a certain time, two codes in a codelist can become “equivalent” to a new code (merging into a new item).
Example (time axis, event time 1989): West Germany + East Germany -> Germany
Event types:
• One to many
• Many to many
• New codes, or merging into existing ones
• …
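The merge event described here can be sketched in Python as follows. This is only an illustration of the idea, not how Matrix stores events: `MergeEvent`, `code_at` and `aggregate` are hypothetical names, the code spellings are assumptions, and the event year is the one shown on the slide's time axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeEvent:
    """From `year` on, every code in `old_codes` is equivalent to `new_code`."""
    year: int
    old_codes: frozenset
    new_code: str


# Illustrative event taken from the slide; code names are assumptions.
EVENTS = [MergeEvent(1989, frozenset({"WEST_GERMANY", "EAST_GERMANY"}), "GERMANY")]


def code_at(code, year, events=EVENTS):
    """Return the code that represents `code` when viewed at `year`."""
    for ev in sorted(events, key=lambda e: e.year):
        if year >= ev.year and code in ev.old_codes:
            code = ev.new_code
    return code


def aggregate(observations, year, events=EVENTS):
    """Sum (code, value) pairs after mapping each code to its form at `year`."""
    totals = {}
    for code, value in observations:
        key = code_at(code, year, events)
        totals[key] = totals.get(key, 0) + value
    return totals


print(code_at("WEST_GERMANY", 1985))
print(aggregate([("WEST_GERMANY", 3), ("EAST_GERMANY", 2)], 1995))
```

Keeping events as data, rather than rewriting historical codes, lets the same observations be read with either the old or the new classification depending on the reference date.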
EXL transformations: user perspective
Example: Einstein's equation, E = MC²
• Operands: M, C, 2
• Expression: E = M*(C**2)
• Result: E
EXL main characteristics
• Formal: the grammar is defined in Backus-Naur Form notation
• Implementation independent: EXL is not an IT language (such as SQL, XPATH, ...)
• Based on Matrix: operands and results are Matrix cubes
Check example, expressions:
    C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
    C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
    C5 = check ( C3 - C4 <= given_threshold )
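To make the check example concrete, the following Python sketch mimics what the three expressions appear to do: aggregate two cubes onto DATE and ENTITY, then test their difference against a threshold. It is an interpretation, not an EXL engine; the function names, the sample rows, and the choice to compare only keys present in both cubes are all assumptions.

```python
from collections import defaultdict


def get_keep_sum(cube, keep, measure):
    """Keep only the `keep` dimensions and sum `measure` over the dropped ones."""
    totals = defaultdict(float)
    for row in cube:
        key = tuple(row[d] for d in keep)
        totals[key] += row[measure]
    return dict(totals)


def check_difference(c3, c4, threshold):
    """For each key present in both cubes, test c3 - c4 <= threshold."""
    return {k: (c3[k] - c4[k]) <= threshold for k in c3.keys() & c4.keys()}


# Invented sample data: C1 is detailed by item, C2 by country.
C1 = [
    {"DATE": "2012-03", "ENTITY": "001", "ITEM": "C1", "AMOUNT": 60.0},
    {"DATE": "2012-03", "ENTITY": "001", "ITEM": "C5", "AMOUNT": 40.0},
]
C2 = [{"DATE": "2012-03", "ENTITY": "001", "COUNTRY": "IT", "AMOUNT": 99.0}]

C3 = get_keep_sum(C1, keep=("DATE", "ENTITY"), measure="AMOUNT")
C4 = get_keep_sum(C2, keep=("DATE", "ENTITY"), measure="AMOUNT")
C5 = check_difference(C3, C4, threshold=0.5)
print(C5)
```

In the EXL text, AMOUNT is listed inside `keep` because it is part of the resulting cube; in this sketch it is passed separately as the measure, which is purely a design choice of the illustration.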
Foundations of the statistical information system: INFOSTAT PLATFORM
Service Oriented Architecture
• Adaptability to changes
• Support for business processes through the integration & orchestration of software services
• Services loosely coupled so that they can be used by different business processes
• Software cooperation/integration
• Well-defined, standard interfaces
• Standard protocols for inter-service communication (HTTP)
BUSINESS USERS CAN BUILD A NEW SURVEY AND DRIVE THE PLATFORM'S BEHAVIOUR ONLY BY CHANGING METADATA.
BUT FOR MORE COMPLEX REQUIREMENTS, A NEW PROCESS CAN BE ASSEMBLED USING THE EXISTING SERVICES.
INFOSTAT architecture
[Architecture diagram; only the labels are recoverable, not their layout]
• Actors: information provider and information consumer (each through a user interface or A2A); data analyst, metadata administrator, operations administrator
• Areas: Collection, Collaboration, Dissemination
• Components: workflow engine, calculation engine, checks, format validation & conversion, data, documents, warehouse, dictionary, events, data definition, analysis & reporting, monitor
• Functions: messages upload, data entry, notifications, remarks, alerts, inquiry, search, data download, data services, metadata import/export, metadata administration, regular data production, analysis tools, report generation, process monitor
Infostat interoperability: versatility towards other exchange formats
[Diagram labels: Matrix E/R, Dictionary, DATA, MATRIX, CSV, other formats / standards]
Infostat is interoperable with different formats:
• The content of a Matrix Dictionary can be exported to other formats (e.g. an SDMX structure message or an XBRL taxonomy), and vice versa.
• The content of a Matrix data warehouse can be exported to other formats (e.g. an SDMX data message or an XBRL instance document), and vice versa.
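As a small illustration of the "export and vice versa" idea, the sketch below round-trips cube rows through CSV, the simplest of the formats named on the slide, using only Python's standard library. It is not Infostat's converter: `cube_to_csv` and `csv_to_cube` are hypothetical helpers, the column names are invented, and SDMX or XBRL conversion is not attempted.

```python
import csv
import io


def cube_to_csv(rows, columns):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_cube(text):
    """Parse CSV text back into a list of row dicts (all values stay strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = [{"ENTITY": "001", "FREQ": "M", "AMOUNT": "10"}]
text = cube_to_csv(rows, ["ENTITY", "FREQ", "AMOUNT"])
print(text)
print(csv_to_cube(text) == rows)
```

CSV carries no type or codelist information, so a real import would need the dictionary metadata to restore numeric types and to validate codes; the sketch keeps every value as a string for that reason.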
Infostat interoperability: data services for statistical packages
Statistical packages used in the Bank of Italy: Excel, Speakeasy, SAS, FAME, Matlab, R, STATA, EViews
IT services:
• “Get” services (tool plug-ins): data and metadata (registry) extraction from the warehouse
• “Put” services (tool plug-ins): data and metadata (registry) upload to the warehouse
INFOSTAT Data Collection
Services for respondents: overview
A set of services to assist respondents in providing data to the Bank of Italy:
• Messages inquiry
• Data entry
• Diagnostic check
• Data storage facilities
• Upload facilities
Data entry is an IT application completely driven by metadata.
The staging area, hosted in the Bank of Italy systems, is an environment for storing data and messages.
Services for respondents: staging area
Process: Data preparation -> Diagnostic request -> Checks -> Feedback sending -> Check report analysis -> Data correction
Actors: respondents
THE DIAGNOSTIC IS NOT MANDATORY.
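The staging-area loop (checks, feedback, correction, repeat) can be sketched in Python as below. This is a toy model of the process steps only: `run_checks`, `diagnostic_cycle`, the single positivity check and the `max_rounds` limit are assumptions, and nothing here reflects the actual Infostat services or their interfaces.

```python
def run_checks(message):
    """Return feedback strings for a message; an empty list means no errors."""
    feedback = []
    for i, row in enumerate(message):
        if row.get("AMOUNT", 0) <= 0:
            feedback.append(f"row {i}: AMOUNT must be positive")
    return feedback


def diagnostic_cycle(message, correct, max_rounds=3):
    """Check, send feedback, let the respondent correct, and re-check.

    `correct` stands for the respondent's own correction step; it receives
    the message and the feedback and returns a revised message.
    Returns the final message and any feedback still outstanding.
    """
    feedback = run_checks(message)
    rounds = 0
    while feedback and rounds < max_rounds:
        message = correct(message, feedback)
        rounds += 1
        feedback = run_checks(message)
    return message, feedback


def flip_sign(message, feedback):
    """Invented correction: the respondent fixes wrongly signed amounts."""
    return [{**row, "AMOUNT": abs(row["AMOUNT"])} for row in message]


draft = [{"AMOUNT": -5}, {"AMOUNT": 3}]
final, remaining = diagnostic_cycle(draft, flip_sign)
print(final, remaining)
```

Since the diagnostic is optional, a respondent could skip `diagnostic_cycle` entirely and deliver the draft as it is; the loop only models the case where the service is requested.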
Services for respondents: official data delivery
Process: Data preparation -> Diagnostic request -> Checks -> Feedback sending -> Check report analysis -> Data correction
Actors: respondents, Banca d’Italia
Destination: Data Warehouse
Services for respondents: ad-hoc surveys
• A flexible solution for building data entry modules for “ad-hoc surveys”: extemporary surveys for statistical purposes, based on ADOBE technology and fully integrated into the Infostat platform
• The Adobe form workflow supports:
• Data entry
• On-line checks
• Data storage
• Data delivery
Services for respondents: document collection
• Non-structured content can be collected using Infostat services
• The document collection workflow supports:
• Document collection (periodic or extemporary)
• Documents attached to data messages
• Document storage
• Cryptography and compression management
Thank you! Any questions?
Paola Maurizi, Statistics Collection and Processing Department, Application Development Division
Efficient Ways Of Statistical Data Collection From Enterprises, Luxembourg, 22-23 March 2012