Metadata-driven Application for Data Processing: From Local Towards Global Solution

This presentation describes the development of a metadata-driven application for statistical data processing at the Statistical Office of the Republic of Slovenia (SURS) and the transition from local solutions toward a global one. It covers the building blocks, the process metadata that drive them, and the reasons a global solution is needed. It also addresses how the application changes the implementation of data processing and the roles of IT specialists and subject-matter statisticians.

Presentation Transcript


  1. Metadata-driven application for data processing – from local toward global solution. Rudi Seljak, Statistical Office of the Republic of Slovenia

  2. Summary of presentation • Introduction • Current generic application – main characteristics • Development of a global solution • Changes in the statistical process • Conclusions

  3. Introduction • Statistical data processing: • Demanding, time-consuming and very expensive task • Constant pressure for budget cuts • Rationalisation of the statistical process: • Taking advantage of rapid IT development • Moving from domain-oriented to process-oriented production • Replacing stove-pipe IT solutions with general applications • Statistical Office of the Republic of Slovenia (SURS): • SURS began the systematic development of generic solutions six years ago • Prototype solutions were developed for several parts of the process • These solutions have already been used for several large surveys (e.g. the 2010 Agriculture Census and the 2011 Population Census) • The prototype generic solutions are now being upgraded into a more global solution

  4. Generalised solutions – main characteristics • Small, generic solutions for small parts of the statistical process, called building blocks: • The inputs and outputs of the individual components can be linked easily and flexibly into the whole statistical process • They can be plugged into different databases in different environments (e.g. ORACLE, SAS), provided the input database meets a few basic conditions • They are designed as fully metadata-driven (MDD) systems: one program code → the parameters for processing a particular survey are provided through special metadata tables • The process metadata can be held in different environments (SAS, MS Access, ORACLE) → the metadata must strictly follow a prescribed structure (tables and variables)
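The metadata-driven principle (one program code, survey-specific parameters supplied through metadata tables) can be illustrated with a minimal Python sketch of a logical-control building block. The actual SURS building blocks are SAS macros; the rule layout, the survey codes "LFS" and "AGR" and the function name validate are hypothetical, and the metadata table is simulated with an in-memory list instead of SAS, MS Access or ORACLE tables.

```python
import operator
from typing import Any

# Hypothetical process-metadata table for a logical-control building block.
# Each row is one rule: (survey_id, rule_id, variable, operator, threshold).
VALIDATION_METADATA = [
    ("LFS", "R1", "age", ">=", 15),
    ("LFS", "R2", "hours_worked", "<=", 100),
    ("AGR", "R1", "area_ha", ">", 0),
]

# Operators allowed in the metadata, mapped to Python comparison functions.
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def validate(survey_id: str, records: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Generic building block: the code is identical for every survey;
    only the metadata rows selected for survey_id change its behaviour.
    Returns (record_index, rule_id) for every rule a record fails."""
    rules = [row for row in VALIDATION_METADATA if row[0] == survey_id]
    errors = []
    for i, record in enumerate(records):
        for _, rule_id, variable, op, threshold in rules:
            if not OPERATORS[op](record[variable], threshold):
                errors.append((i, rule_id))
    return errors


if __name__ == "__main__":
    data = [{"age": 34, "hours_worked": 40}, {"age": 12, "hours_worked": 120}]
    print(validate("LFS", data))  # [(1, 'R1'), (1, 'R2')]
```

In this sketch, supporting a new survey only means adding rows to the metadata table; the function itself does not change.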

  5. Building blocks – functioning [Diagram: different databases of process metadata; different microdata databases; building block (general SAS program); ad-hoc programs]

  6. Linking building blocks into the process [Diagram: microdata → building block 1 → transformed data → building block 2 → transformed data → … → building block n → transformed data, with ad-hoc programs alongside the building blocks]
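The chaining of building blocks can be sketched as follows, assuming (hypothetically) that building blocks and ad-hoc programs share one interface: a list of records in, a list of transformed records out. The step functions below are illustrative stand-ins, not SURS components.

```python
from typing import Any, Callable

Record = dict[str, Any]
Step = Callable[[list[Record]], list[Record]]


def run_process(microdata: list[Record], steps: list[Step]) -> list[Record]:
    """Run building blocks and ad-hoc programs in order; the transformed
    data produced by one step become the input of the next step."""
    data = microdata
    for step in steps:
        data = step(data)
    return data


def impute_income(data: list[Record]) -> list[Record]:
    # Hypothetical building block: replace a missing income (None) with 0.0.
    return [{**r, "income": 0.0 if r["income"] is None else r["income"]} for r in data]


def add_income_thousands(data: list[Record]) -> list[Record]:
    # Hypothetical ad-hoc program: a survey-specific derived variable.
    return [{**r, "income_k": r["income"] / 1000} for r in data]


def aggregate_income(data: list[Record]) -> list[Record]:
    # Hypothetical building block: total income over all records.
    return [{"total_income": sum(r["income"] for r in data)}]


if __name__ == "__main__":
    microdata = [{"income": 1500.0}, {"income": None}]
    print(run_process(microdata, [impute_income, add_income_thousands, aggregate_income]))
    # [{'total_income': 1500.0}]
```

Keeping one common interface is what lets an ad-hoc program be slotted between any two generic steps without changing either of them.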

  7. Process metadata • The system relies to a very large extent on process metadata: • Processing rules that adapt the general program to different surveys • At the moment, the process metadata are entered directly into an MS Access database • High probability of syntax errors • Users must be thoroughly trained to fill in the metadata correctly
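Because the metadata are typed in by hand, one way to catch syntax errors before a run is to check every metadata row against the prescribed structure. The sketch below assumes a hypothetical rule table with the columns survey_id, rule_id, variable, operator and threshold; it reports missing columns, unknown operators and variables that do not exist in the survey's data.

```python
from typing import Any

# Hypothetical prescribed structure of a validation-rule metadata table.
REQUIRED_COLUMNS = ("survey_id", "rule_id", "variable", "operator", "threshold")
ALLOWED_OPERATORS = {"<", "<=", ">", ">=", "==", "!="}


def check_metadata(rows: list[dict[str, Any]], known_variables: set[str]) -> list[str]:
    """Return a readable message for every structural problem found;
    an empty list means the metadata can be passed to the building block."""
    problems = []
    for n, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {n}: missing columns {missing}")
            continue  # the remaining checks need the missing columns
        if row["operator"] not in ALLOWED_OPERATORS:
            problems.append(f"row {n}: unknown operator {row['operator']!r}")
        if row["variable"] not in known_variables:
            problems.append(f"row {n}: unknown variable {row['variable']!r}")
    return problems


if __name__ == "__main__":
    rows = [
        {"survey_id": "LFS", "rule_id": "R1", "variable": "age", "operator": ">=", "threshold": 15},
        {"survey_id": "LFS", "rule_id": "R2", "variable": "hours", "operator": "=<", "threshold": 100},
        {"survey_id": "LFS", "rule_id": "R3"},
    ]
    for message in check_metadata(rows, {"age", "hours_worked"}):
        print(message)
    # row 2: unknown operator '=<'
    # row 2: unknown variable 'hours'
    # row 3: missing columns ['variable', 'operator', 'threshold']
```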

  8. Building blocks • The basic tools of the whole system are the building blocks, each covering a particular processing phase • They are SAS macros able to operate on the basis of the process metadata • So far, building blocks have been created for the following phases: • Data validation (logical controls) • Deterministic corrections • Data imputation • Standard error estimation • Aggregation • Tabulation • Calculation of quality indicators • Disclosure control (testing phase)
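As an illustration of one phase from the list, a deterministic-correction building block can apply IF-THEN rules read from metadata. This is a minimal Python sketch, not the SAS macro used at SURS; the rule layout (condition variable, operator, condition value, target variable, new value) and the example rules are hypothetical. Rules are applied in metadata order, so a later rule sees the effect of an earlier one.

```python
import operator
from typing import Any

Record = dict[str, Any]

OPERATORS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

# Hypothetical correction metadata, one rule per row:
# IF cond_var <op> cond_value THEN set target_var = new_value
CORRECTION_METADATA = [
    ("C1", "age", "<", 15, "employed", 0),
    ("C2", "employed", "==", 0, "hours_worked", 0),
]


def correct(records: list[Record], rules: list[tuple] = CORRECTION_METADATA) -> tuple[list[Record], list[tuple]]:
    """Apply the rules to each record in metadata order.
    Returns the corrected records and a log of (record, rule, variable, old, new)."""
    corrected, log = [], []
    for i, original in enumerate(records):
        record = dict(original)  # copy, so the input microdata stay unchanged
        for rule_id, cond_var, op, cond_value, target, new_value in rules:
            if OPERATORS[op](record[cond_var], cond_value) and record[target] != new_value:
                log.append((i, rule_id, target, record[target], new_value))
                record[target] = new_value
        corrected.append(record)
    return corrected, log


if __name__ == "__main__":
    data = [{"age": 12, "employed": 1, "hours_worked": 10},
            {"age": 40, "employed": 1, "hours_worked": 38}]
    fixed, log = correct(data)
    print(fixed[0])  # {'age': 12, 'employed': 0, 'hours_worked': 0}
    print(log)       # [(0, 'C1', 'employed', 1, 0), (0, 'C2', 'hours_worked', 10, 0)]
```

The correction log is kept because every automatic change to microdata should remain traceable.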

  9. Building a global solution • The developed system is a very open and flexible tool • However, some re-integration is needed to increase its functionality: • Moving the process metadata into the ORACLE environment • Creating a single database in which the process metadata for all surveys are stored and maintained • Developing graphical interfaces for user-friendly management of process metadata • Linking the system with the metadata repository
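A single database of process metadata for all surveys could key every rule by survey, so that each survey's parameters are selected with one query. The sketch below uses Python's built-in sqlite3 module as a stand-in for ORACLE; the table and column names are assumptions. The CHECK and FOREIGN KEY constraints illustrate how a central database can reject malformed metadata when it is entered rather than when it is executed.

```python
import sqlite3

# Hypothetical schema: one table of surveys, one shared table of rules for all surveys.
SCHEMA = """
CREATE TABLE survey (
    survey_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE validation_rule (
    survey_id TEXT NOT NULL REFERENCES survey(survey_id),
    rule_id   TEXT NOT NULL,
    variable  TEXT NOT NULL,
    operator  TEXT NOT NULL CHECK (operator IN ('<', '<=', '>', '>=', '==', '!=')),
    threshold REAL NOT NULL,
    PRIMARY KEY (survey_id, rule_id)
);
"""


def create_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    con.executescript(SCHEMA)
    return con


def rules_for(con: sqlite3.Connection, survey_id: str) -> list[tuple]:
    """Select the process metadata of one survey from the shared database."""
    return con.execute(
        "SELECT rule_id, variable, operator, threshold FROM validation_rule "
        "WHERE survey_id = ? ORDER BY rule_id",
        (survey_id,),
    ).fetchall()


if __name__ == "__main__":
    con = create_metadata_db()
    con.execute("INSERT INTO survey VALUES ('LFS', 'Hypothetical labour survey')")
    con.executemany(
        "INSERT INTO validation_rule VALUES (?, ?, ?, ?, ?)",
        [("LFS", "R2", "hours_worked", "<=", 100), ("LFS", "R1", "age", ">=", 15)],
    )
    print(rules_for(con, "LFS"))
    # [('R1', 'age', '>=', 15.0), ('R2', 'hours_worked', '<=', 100.0)]
    try:
        con.execute("INSERT INTO validation_rule VALUES ('LFS', 'R3', 'age', '=<', 1)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```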

  10. The new system [Diagram: application for metadata management; database of processing metadata; metadata repository; data on tables and variables; different microdata databases; general SAS program; ad-hoc programs]

  11. Application for metadata management: deterministic corrections

  12. Application for metadata management: execution of a particular process step

  13. New application and the statistical process • The generic MDD application changes the implementation of data processing at the general level: • An essentially different distribution of work between subject-matter statisticians, general methodologists and IT experts • A change in the role of subject-matter statisticians → changed expectations of their skills and capabilities • The work organisation of the IT Department and the General Methodology Department will have to change from domain-oriented to process-oriented • IT and methodology experts will need a different approach: • Experts capable of thinking and operating at a much more general level • A survey is just one realisation of the general statistical process

  14. Conclusions • SURS developments in recent years: flexible, metadata-driven generic solutions for different phases of data processing • The very open system will be replaced with a more integrated and centralised system • Main goal: transition from stove-pipe production to more integrated processing systems • Two main challenges: • To build generic IT solutions that "cover" the wide diversity of statistical surveys • To change the very "domain-oriented state of mind" among employees

  15. Thank you for your attention
