1 / 23

My experience building a custom ETL system

My experience building a custom ETL system. Problems, solutions and Oracle quirks. or How scary Oracle can look for a Java developer. Agenda. WHY do we need an ETL? HOW it works Experience: task existing solutions? problem or Oracle quirk my solution. WHY do we need an ETL?.

Download Presentation

My experience building a custom ETL system

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. My experience building a custom ETL system Problems, solutions and Oracle quirks • or • How scary Oracle can look for a Java developer

  2. Agenda • WHY do we need an ETL? • HOW it works • Experience: • task • existing solutions? • problem or Oracle quirk • my solution

  3. WHY do we need an ETL? • OLTP - in most cases calculation is non-trivial: • SQLs grow in size & complexity • increased maintenance effort • poor SQL performance • business values calculation is implemented in most of reports independently - no code reuse: • maintenance effort is multiplied by number of reports • copy-paste-driven development

  4. WHY do we need an ETL? OLTP-like RC after RADIO 13lines 3 tables 1 level Balance / UPL calculation: • 46 lines • 7 joins • 3 levels

  5. RADIO: Data flow Routines: • FLAG – extract rows/entities for each event in transaction, sort • MAIN – given event rows, run actions • SNAP – when we have all TX from OLTP snap – calculate appropriate DN snap

  6. Radio: Data flow CLDao CLOG tables FLAG job TX info Dao DNDao DN tables Action MAIN job

  7. PLSQL code generator >600kB of pl/sql code: • TX element row  create type as object • resulting DN row  create type as object • action for each event type  create type as object Code maintenance is pain  use higher level language !! JPLSQL = java+pl/sql: jsp-like parser for producing pl/sql. • XML-based DB structure • XML-based flag/action mapping • power of Java

  8. Streams, Triggers & CLOGs • after trigger • my equals • duplicate scn

  9. After trigger • we keep TX apply state in package variables • before trigger is invoked • SUDDENLY!, transaction is rolled back – package variables stay altered! Use only after triggers

  10. My equals We need to filter changes, that happened in columns we don’t collect. But what we do with Oracle’s null ? nvl(:new.val = :old.val, :new.val isnulland :old.val isnull) Simple inline in JPLSQL.

  11. Streams duplicate SCN Ingredients: • several sessions • several tables • Streams replication Apply process can produce message with same SCN. Oracle BUG ID: ???

  12. Processing • Job control • FLAG  MAIN communication • MAIN

  13. Processing: JOB control • identification – how do we know a job is running ? • communication – how do we communicate a job ? • dbms_alert has implicit commits • dbms_pipe is not compatible with RAC • sleep – conditioned wait

  14. Processing: FLAG • 250 lines of SQL • 300 lines of explain plan • 1 kTX p/second

  15. Processing: FLAG FLAG computes: • table of • table of • table of • number • event types • event occasions • table index for this event • TX row id FLAG job TX info MAIN job • 3-dimensional table problems: • ordering (no order by in collect statement in 10g) • storing – nested table doesn’t preserve ordering

  16. Processing: MAIN What is different from Java • object has default constructors – very useful for bulk creation • encapsulation is bad - package method access is slower, than variable access • reading from package variable is much, much faster, than reading from tables  cache everything

  17. Processing: MAIN What is very different from Java • object/record assignment works by value, not by reference

  18. Processing: MAIN Java-like toString: • get all object fields using user_source view • execute immediate … • very useful for debugging

  19. Processing: MAIN Tom Kyte’s “when others” rule exception: • we really want to catch all kind of errors: • infrastructure logic • business logic constraints • Oracle internal errors • we really want to stop after any error

  20. Post-processing: Deployer Relieves system engineers from deployment paint • Read installation bundle • Read DB objects • Compute difference • Build patch Each object type has it’s own: • create / change statement syntax • system view structure

  21. Post-processing: Deployer Oracle has object dependencies: • pl/sql depends on tables • tables depend on user types • user types depend on their parent types

  22. Misc • SNAP – Trade/RC scn mapping • datatest – xmlforest, emails • very slow dbms_output retrieval

  23. Questions ?

More Related