620 likes | 763 Views
Agenda. ODI Performance ODI Scheduling ODI Deployment /Release. Uli Bethke. Dublin based Blog www.bi-q.ie ODI 2007 Reviewer two ODI books ODI articles OTN Deputy chair OUG BI SIG. Next event 11 th June ODI advanced trainer. ODI performance.
E N D
Agenda • ODI Performance • ODI Scheduling • ODI Deployment/Release
Uli Bethke • Dublin based • Blog www.bi-q.ie • ODI 2007 • Reviewer two ODI books • ODI articles OTN • Deputy chair OUG BI SIG. Next event 11thJune • ODI advanced trainer
ODI performance ODI is a metadata driven (SQL) code generator using code templates (knowledge modules). It uses a Java agent to communicate and send data between source and target systems and the repository over the network.
SQL • > 80%: ODI performance issues = SQL issues => SQL main ODI skill • Perfect your SQL. Advanced SQL. Analytic Functions • Know your database(s) inside out. In particular the target • Understand, write, and modify Knowledge Modules
Agent • Light weight Java based application • Tied to host OS • Generates code based on ODI metadata. • Communicates source, target, repository. • JDBC data transport • XML • Jetty • Interpreters: Jython, JBS, JavaScript, Groovy • HSQLDB in memory database • Scheduler • Sizing
Agent Target • Least amount of roundtrips. Network (JDBC, XML) • One target database server only (DW) Another Server • ODBC drivers • JEE agent on Weblogic • No support for target OS • Resources on target • DBA
interfaces • No!! KM using row by row processing • Use ODI functions rather than DB functions • Don’t overuse CKM (especially for large data volumes) • temp indexes (I$) • Gather statistics (C$, I$, TGT when applicable) • Rule of thumb: Use loader KMs or db link KMs rather than JDBC KMs
Source/target • Schemas on same database server. Physical schema and not data server. • Have sources physically close to target • Minimize impact on source • Chunking
CRITICAL PATH Network Paths: Path Durations: B > E > H 6 + 2 + 11 = 19 B > D > F 6 + 4 + 14 = 24 B > D > G 6 + 4 + 10 = 20A > C > G 9 + 8 + 10 = 27 Critical Path
Micro Tuning • JDBC drivers • JVM • Type 4 or 5 JDBC drivers (Data Direct) • Array fetch size. • DB packet size. • Network packet size.
Performance Monitoring • ODI Log Data Mart • Facts • Dimensions • Metrics • Frontend
Dbms_sqltune_util0 • dbms_sqltune_util0.sqltext_to_sqlid • Link to Data Dictionary Tables
maciEJKOCON • Dublin based • ODI 2005 (Sunopsis) • Reviewer two ODI books • Blog www.bi-q.ie • maciek@bi-q.ie
ORCHESTRATING DWHPROCESSES • Orchestration of Data Process Flow • Standard DWH Process flow orchestration • Packages in Oracle Data Integrator 10g • Load Plans in Oracle Data Integrator 11g • Process Flow use cases - efficiency analysis • Alternative scheduling • benefits
TYPICAL DATA FLOW in DWH 1 step STAGE E-LT DATA EXTRACT loads data from sources
TYPICAL DATA FLOW in DWH 1 2 step step DIMs STAGE E-LT LABEL provides structured labeling information DATA EXTRACT loads data from sources
TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources
TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources data transport & transform units
TYPICAL DATA FLOW in DWH 1 2 3 step step step FACTS DIMs STAGE E-LT FACTS consists of measurements, metrics or facts LABEL provides structured labeling information DATA EXTRACT loads data from sources data transport & transform units ODI 11 Load Plans ODI 10g Packages orchestration
ORCHESTRATION – ODIPACKAGES using object directly PRC_B PKG_ABC PKG_DE INT_A INT_D INT_C INT_E
ORCHESTRATION – ODIPACKAGES using object directly using scenarios – compiled code PRC_B PRC_B SYNCHRONOUS PKG_ABC PKG_DE PKG_ABCDE PKG_DE INT_A INT_C INT_A INT_D INT_C INT_E
ORCHESTRATION – ODIPACKAGES using object directly using scenarios – compiled code PRC_B PRC_B PRC_B SYNCHRONOUS PKG_ABCDE PKG_ABC PKG_DE PKG_ABCDE PKG_DE PKG_DE INT_A INT_A INT_C INT_C INT_A INT_D INT_E INT_C ASYNCHRONOUS
ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_D PRC_G PRC_B PKG_DM A C D F E G B PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_C INT_F INT_A INT_C
ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_G PRC_D PRC_B PKG_DM PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_A INT_C INT_F INT_C ODI 11 Load plans
ODI 10g vs. ODI 11 FACTS DIMs STAGE PRC_B PRC_G PRC_D PKG_DM C A D F B G E PKG_FG PKG_DE PKG_ABC ODI 10g Packages INT_F INT_C INT_A INT_C ODI 11 Load plans same effect!
PROCESS FLOW EFFICIENCY ANALYSIS Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 A B G C F D E sequential 10 30 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 G A B F D E C 10 30 10
PROCESS FLOW EFFICIENCY ANALYSIS Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 A B G C F D E sequential 10 10 10 10 10 30 parallel 10 30 10 10 30 + 30 + 10 = 70 B D E C G A F 10 30 10 • DOWNSIDES: • POSSIBLE INEFFICIENCIES (IDLE RESOURCES)
PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C D F E OPTIMIZATION ATTEMPT 10 10 10 30 10 10
PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 30 + 10 10 + 30 + 10 = 50 parallel 10 30 10 10 G F B C A E D 10 30 10 70 50 = 1.4 times quicker! • UPSIDE: • EFFICIENCY IMPROVED
PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 30 + 10 10 + 30 + 10 = 50 parallel 10 30 10 10 G F B C A E D 10 30 10 70 50 = 1.4 times quicker! • UPSIDE: • EFFICIENCY IMPROVED • DOWNSIDES: • TIMINGS KNOWLEDGE REQUIRED • OVERALL DEPENDECY KNOWLEDGE REQURED
PROCESS FLOW EFFICIENCY ANALYSIS 30 A B G C F D E OPTIMIZATION ATTEMPT sequential 30 10 10 10 10 10 parallel 10 30 10 70 10 30 + 30 + 10 = 70 A B C E F G D 10 30 10 70 • DOWNSIDE: • INEFFICIENCY EXISTS BUT CAN’T BE RESOLVED • CONSUMER WAITING & IMPACT
TraditionalScheduling - limitations • Possible inefficiencies (idle resources) • Timings knowledge required • Overall dependecy knowledge requred • Inefficiency exists but can’t be resolved • Consumer waiting & impact
TraditionalScheduling - limitations • Possible inefficiencies (idle resources) • Timings knowledge required • Overall dependecy knowledge required • Inefficiency exists but can’t be resolved • Consumer waiting & impact SCHEDULER
DEPENDENCY DRIVENScheduling C E A E D C B C E D D A B B A C E E B A C D D A C D E B A C B B D A E
DEPENDENCY DRIVENScheduling C A B E D C B D E E C A B A D C D B A C D E E B A D E B A C C B PACKGAGES&LOAD PLANS D A E
PROCESS FLOW EFFICIENCY ANALYSIS sequential 30 30 A A B B G G C C D F F D E E 30 10 10 10 30 10 10 10 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 10 30 10 70 D C B A F G E 70 10 10 10 10 10 30 30
PROCESS FLOW EFFICIENCY ANALYSIS sequential 30 30 A A B B G G C C F D D F E E 30 10 10 30 10 10 10 10 10 10 10 10 parallel 10 30 10 10 30 + 30 + 10 = 70 10 30 10 30 70 G D C E F A B 70 70 10 10 10 10 10 30 30 70 30 = 2.3 times faster!
Dependency Driven Scheduling • Simplifies orchestrating the flow • only immediate upstream definition required • execution timings not relevant • self-adapts in the most effective way • Improves overall E-LT performance • Less idle resources – better utilization • Independency • unveilsits full potential in complex Enterprise class DWHs (Inmon)
Dependency Driven Scheduling • Notifications • errors (+auto-restartability) • finish summary • logging • Multiple/overlapping E-LT streams • load with different frequencies • Parameterization • improved system stress control • process prioritization
FIRST RUN 10 processes
FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes
FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN
FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN 12h43m TIME LOAD PLANS
FIRST RUN TODAY 10 584 1389 DEPENDENCIES processes processes 132 231 SCENARIOS RUN 2.9 12h43m 4h21m TIME TIMES FASTER LOAD PLANS SCHEDULER