230 likes | 326 Views
My experience building a custom ETL system. Problems, solutions and Oracle quirks. or How scary Oracle can look for a Java developer. Agenda. WHY do we need an ETL? HOW it works Experience: task existing solutions? problem or Oracle quirk my solution. WHY do we need an ETL?.
E N D
My experience building a custom ETL system Problems, solutions and Oracle quirks • or • How scary Oracle can look for a Java developer
Agenda • WHY do we need an ETL? • HOW it works • Experience: • task • existing solutions? • problem or Oracle quirk • my solution
WHY do we need an ETL? • OLTP - in most cases calculation is non-trivial: • SQLs grow in size & complexity • increased maintenance effort • poor SQL performance • business values calculation is implemented in most of reports independently - no code reuse: • maintenance effort is multiplied by number of reports • copy-paste-driven development
WHY do we need an ETL? OLTP-like RC after RADIO 13lines 3 tables 1 level Balance / UPL calculation: • 46 lines • 7 joins • 3 levels
RADIO: Data flow Routines: • FLAG – extract rows/entities for each event in transaction, sort • MAIN – given event rows, run actions • SNAP – when we have all TX from OLTP snap – calculate appropriate DN snap
Radio: Data flow CLDao CLOG tables FLAG job TX info Dao DNDao DN tables Action MAIN job
PLSQL code generator >600kB of pl/sql code: • TX element row create type as object • resulting DN row create type as object • action for each event type create type as object Code maintenance is pain use higher level language !! JPLSQL = java+pl/sql: jsp-like parser for producing pl/sql. • XML-based DB structure • XML-based flag/action mapping • power of Java
Streams, Triggers & CLOGs • after trigger • my equals • duplicate scn
After trigger • we keep TX apply state in package variables • before trigger is invoked • SUDDENLY!, transaction is rolled back – package variables stay altered! Use only after triggers
My equals We need to filter changes, that happened in columns we don’t collect. But what we do with Oracle’s null ? nvl(:new.val = :old.val, :new.val isnulland :old.val isnull) Simple inline in JPLSQL.
Streams duplicate SCN Ingredients: • several sessions • several tables • Streams replication Apply process can produce message with same SCN. Oracle BUG ID: ???
Processing • Job control • FLAG MAIN communication • MAIN
Processing: JOB control • identification – how do we know a job is running ? • communication – how do we communicate a job ? • dbms_alert has implicit commits • dbms_pipe is not compatible with RAC • sleep – conditioned wait
Processing: FLAG • 250 lines of SQL • 300 lines of explain plan • 1 kTX p/second
Processing: FLAG FLAG computes: • table of • table of • table of • number • event types • event occasions • table index for this event • TX row id FLAG job TX info MAIN job • 3-dimensional table problems: • ordering (no order by in collect statement in 10g) • storing – nested table doesn’t preserve ordering
Processing: MAIN What is different from Java • object has default constructors – very useful for bulk creation • encapsulation is bad - package method access is slower, than variable access • reading from package variable is much, much faster, than reading from tables cache everything
Processing: MAIN What is very different from Java • object/record assignment works by value, not by reference
Processing: MAIN Java-like toString: • get all object fields using user_source view • execute immediate … • very useful for debugging
Processing: MAIN Tom Kyte’s “when others” rule exception: • we really want to catch all kind of errors: • infrastructure logic • business logic constraints • Oracle internal errors • we really want to stop after any error
Post-processing: Deployer Relieves system engineers from deployment paint • Read installation bundle • Read DB objects • Compute difference • Build patch Each object type has it’s own: • create / change statement syntax • system view structure
Post-processing: Deployer Oracle has object dependencies: • pl/sql depends on tables • tables depend on user types • user types depend on their parent types
Misc • SNAP – Trade/RC scn mapping • datatest – xmlforest, emails • very slow dbms_output retrieval