550 likes | 724 Views
Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems. Jan Janke Software Engineer CERN / GS-AIS. October 25 - 29, 2010 JINR/CERN Grid and Management Information Systems. Agenda. Data Warehouses in Administrative Computing
E N D
Data Warehouses and Analytical Data Processingin CERN’s Administrative Decision Making Support Systems Jan JankeSoftware EngineerCERN / GS-AIS October 25 - 29, 2010 JINR/CERN Grid and Management Information Systems
Agenda Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Warehouses in Administrative Computing • Recap: Data Warehouses Theory • Data Warehouses and Information Systems in AIS • Foundation, HR and FI Information Systems • Complex Data Extraction Processes • Pixel-Perfect Reporting • Dashboards • Detailed Data Warehouse Example • Management Data Layer (MDL)
Agenda Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Warehouses in Administrative Computing • Recap: Data Warehouses Theory • Data Warehouses and Information Systems in AIS • Foundation, HR and FI Information Systems • Complex Data Extraction Processes • Pixel-Perfect Reporting • Dashboards • Detailed Data Warehouse Example • Management Data Layer (MDL)
Ca. 16,000 People Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Mankind’s Largest Machine Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Enormous Amount of Data Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Administrative Computing Jan Janke: "Data Warehouses and Analytical Data Processing ..." Provides means to administrate CERN Enables physicists to focus on their work Allows management to make the right moves
Why Data Warehouses? • Support administrative staff • Enforce security and safety on site • Allow management to make decisions Jan Janke: "Data Warehouses and Analytical Data Processing ..." Heterogeneous computing landscape Various specialised OLTP systems Planning needs Legal Requirements
Example: Keep Finances Under Control Specialised small user groups Distinct databases Systems only accessible to authorised specialists High availability and performance, real-time data Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Specialised Systems • Accounting, ERP for CERN stores • External contracts management • Payroll, treasury management, …
Example: Keep Finances Under Control Specialised small user groups Distinct databases Systems only accessible to authorised specialists High availability and performance, real-time data Jan Janke: "Data Warehouses and Analytical Data Processing ..." • General Financial Information System • Single system • Access to data from multiple sources • Different levels of complexity
Example: Keep Finances Under Control Users from all areas of CERN Single data warehouse Security is extremely important! System is accessible CERN wide. High availability and performance, but no necessity for real-time data Jan Janke: "Data Warehouses and Analytical Data Processing ..." • General Financial Information System • Single system • Access to data from multiple sources • Different levels of complexity
AIS’ Financial Data Warehouse • Technologies: • ORACLE RAC database • Java Enterprise web applications • In-house developed frameworks • Third-party BI and reporting tools Jan Janke: "Data Warehouses and Analytical Data Processing ..." Keep data in sync with data providers Master complex data extraction process Ensure high query performance Base for detailed data analysis
Agenda Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Warehouses in Administrative Computing • Recap: Data Warehouses Theory • Data Warehouses and Information Systems in AIS • Foundation, HR and FI Information Systems • Complex Data Extraction Processes • Pixel-Perfect Reporting • Dashboards • Detailed Data Warehouse Example • Management Data Layer (MDL)
Find the Needle in the Hay … Jan Janke: "Data Warehouses and Analytical Data Processing ..."
OLTP vs OLAP Jan Janke: "Data Warehouses and Analytical Data Processing ..."
OLTP vs OLAP That’s theory!Real world is not that easy… Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Normalisation (Codd/Boyce) Not in 3NF, why ? Jan Janke: "Data Warehouses and Analytical Data Processing ..." • 1NF • 1 table = 1 relation, no repeating groups or duplicate rows • 2NF • All non prime attributes depend on all parts (attributes) of a composite key • 3NF • All non prime attributes depend only on the (whole) key
Star Schema item time item_key item_name brand type supplier_type time_key dayday_of_the_week month quarter year Sales Fact Table time_key item_key branch_key location location_key Branch location_key Street city state_or_province country units_sold branch_key branch_name branch_type dollars_sold avg_sales Measures Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010) Jan Janke: "Data Warehouses and Analytical Data Processing ..."
SnowflakeSchema item time supplier item_key item_name brand type supplier_key time_key dayday_of_the_week month quarter year Sales Fact Table supplier_key Supplier_type time_key item_key branch_key location location_key Branch location_key street city_key units_sold branch_key branch_name branch_type dollars_sold city avg_sales city_key city state_or_province country Measures Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010) Jan Janke: "Data Warehouses and Analytical Data Processing ..."
From Operations to Reporting FI ERP HR … Source: http://www.deakin.edu.au/ddw/what-is.php (16/10/2010) Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Analysis Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Mining • Drilldown • Finer detail granularity (e.g. add a group-by column) • Slice & dice • Play with the dimensions • Combine different dimensions • Remove/add a dimension • Analyse fact changes
Agenda Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Warehouses in Administrative Computing • Recap: Data Warehouses Theory • Data Warehouses and Information Systems in AIS • Foundation, HR and FI Information Systems • Complex Data Extraction Processes • Pixel-Perfect Reporting • Dashboards • Detailed Data Warehouse Example • Management Data Layer (MDL)
CERN/AIS Business Map Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Foundation Operative systems HR Information System (HRT) FI Information System (CET) … more domain specific information systems Jan Janke: "Data Warehouses and Analytical Data Processing ..." Common data layer for various AIS services Data interfaces for other CERN services Common applications (e.g. mgmt. of roles)
Various Specialised Systems Jan Janke: "Data Warehouses and Analytical Data Processing ..." ORACLE HR CERN Training Application Safety & access systems EDH (Electronic Document Handling) Accounting Application ERP system for CERN stores Contract follow-up …
Technical Environment Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Source databases: • ORACLE 10g • Microsoft Excel • HR/FI Information Systems: • ORACLE 10g • Java Enterprise web applications • SAP Business Objects tool family
Data Extractions Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Nightly scheduled batch jobs • Extractions organised in SQL scripts • Run by self-developed “batch runner” • Controls • Order of execution (sequential, parallel) • Criticality • Logging • Problem escalation (automatic emails)
Definition of Extraction Process (1) General definitions Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Definition of Extraction Process (2) Batches & commands Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Importance of Monitoring New hardware for DEV databases (gain > 1h) Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Turtle or Leopard ? The difference may be subtle … Jan Janke: "Data Warehouses and Analytical Data Processing ..."
ORACLE Materialised Views Source: ORACLE 10g Documentation / Data Warehousing Guide Jan Janke: "Data Warehouses and Analytical Data Processing ..." Pre-aggregated summaries Benefit from query rewrite
Materialised (Summary) Views Jan Janke: "Data Warehouses and Analytical Data Processing ..." Don’t use remote tables if you need query rewrite Create materialized view log on all source tables
Snapshots Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Use snapshots to efficiently access remote tables • Syntax: CREATE SNAPSHOT … AS [Your Query] • Refresh options: • FAST • COMPLETE • FORCE
Pipelined Functions 1 CREATE OR REPLACE TYPE myTableFormat AS OBJECT( col_a NUMBER, col_b DATE, col_c VARCHAR2(25) ) / CREATE OR REPLACE TYPE myTableType AS TABLE OF myTableFormat / Jan Janke: "Data Warehouses and Analytical Data Processing ..." PL/SQL is data source instead of a table May increase performance in environments with heavy PL/SQL use
Pipelined Functions 2 CREATE OR REPLACE FUNCTION myFunc RETURN myTableType PIPELINED IS BEGIN FOR i in 1 .. 5 LOOP PIPE ROW ( myTableFormat( i, SYSDATE+i, 'Row '||i ) ); END LOOP; RETURN; END; END; / Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Pipelined Functions 3 SELECT * FROM TABLE( myFunc() );col_a col_b col_c --------- ---------- ---------- 1 27/10/2010 Row 1 2 28/10/2010 Row 2 3 29/10/2010 Row 3 4 30/10/2010 Row 4 5 31/10/2010 Row 5 Use a pipelined function if you require a data source other than a table! Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Database Design Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Star schema like • Highly de-normalised incl. duplication of data • Use single-attribute keys wherever possible • Performance matters! • Be careful when extracting over database links • Certain tables from operational systems are copied • Deletion & recreation of indexes • Use partitions • Manual control of statistics collection • Optimizing execution plans very time-consuming
Reporting Application Framework Jan Janke: "Data Warehouses and Analytical Data Processing ..." Column and ordering selection Sub reports Various output formats (e.g. HTML, PDF) Charts Self-service reporting Automated scheduled report execution Row and column based access control
Data access Which data (columns) am I allowed to see? As a supervisor I may not be entitled to see the health insurance category. A safety or medical officer may not see the salary, etc. Which rows are visible to me? Unit leader of B only sees persons from Unit B. Jan Janke: "Data Warehouses and Analytical Data Processing ..."
User Interface Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Pixel Perfect Forms Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Use of Apache FOP library • Examples: • Employment & training attestations • Swiss / French card application forms • Business Objects XI Enterprise • Direct use • Indirect use via Business Objects Java SDK • Examples: • Salary slips • Car stickers • Work orders
Business Objects Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Commercial tool family from SAP • Advantages • Rich reporting possibilities (interactive or via SDK) • Appealing dashboards using Xcelsius • Only a few users need the knowledge to design reports • Drawbacks • Two-way data storage (file system & database) • Sometimes stability problems • Time-intensive administration and maintenance • Expensive
Management Dashboards Designed locally using MS Office and Xcelsius. Data comes from the MDL data warehouse. Published as Flash to the BO Server. Jan Janke: "Data Warehouses and Analytical Data Processing ..."
Agenda Jan Janke: "Data Warehouses and Analytical Data Processing ..." • Data Warehouses in Administrative Computing • Recap: Data Warehouses Theory • Data Warehouses and Information Systems in AIS • Foundation, HR and FI Information Systems • Complex Data Extraction Processes • Pixel-Perfect Reporting • Dashboards • Detailed Data Warehouse Example • Management Data Layer (MDL)
Management Data Layer (MDL) Performance: Currently ca. 170 GB data in two tables Generality: Different forms of data sources, new sources are added and removed all the time. Integration with existing tools and development frameworks (ORACLE, Excel, BO, …) Jan Janke: "Data Warehouses and Analytical Data Processing ..." KPI data warehouse Very extensible Fixed generic schema Feeds management dashboards
MDL Data Model n MDL_HEADERS MDL_DIMENSIONS n MDL_VALUES describes n n MDL_RAW_DATA n n MDL_SUMMARY_DATA MDL_LOOKUP_DATA describes MDL_LOOKUP_INFO Jan Janke: "Data Warehouses and Analytical Data Processing ..."
MDL Data Model n MDL_HEADERS MDL_DIMENSIONS n MDL_VALUES describes n n MDL_RAW_DATA n n MDL_SUMMARY_DATA MDL_LOOKUP_DATA describes MDL_LOOKUP_INFO Jan Janke: "Data Warehouses and Analytical Data Processing ..."
MDL Data Model n MDL_HEADERS MDL_DIMENSIONS n MDL_VALUES describes n n MDL_RAW_DATA n n MDL_SUMMARY_DATA MDL_LOOKUP_DATA describes MDL_LOOKUP_INFO Jan Janke: "Data Warehouses and Analytical Data Processing ..."
MDL Data Model n MDL_HEADERS MDL_DIMENSIONS n MDL_VALUES describes n n MDL_RAW_DATA n n MDL_SUMMARY_DATA MDL_LOOKUP_DATA describes MDL_LOOKUP_INFO Jan Janke: "Data Warehouses and Analytical Data Processing ..."