1 / 31

Do You Need an ETL Tool?

Explore ETL (Extract, Transform, Load) tools vs. handcrafted code techniques for data quality, with insights, examples, advantages, and PL/SQL tips.

jonese
Download Presentation

Do You Need an ETL Tool?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Do You Need an ETL Tool? Ben Bor NZ Ministry of Health

  2. Ben Bor • Over 20 years in IT, most of it in Information Management • Oracle specialist since version 5 • Involved in Business Intelligence for over 10 years • Consulted the world’s largest corporations • Presents regularly on Information Management • Was annual Guest Lecturer at Sussex University

  3. Contents • What is ETL • ETL tools vs. ‘handcraft’ code • PL/SQL techniques

  4. What is ETL ETL = Extract, Transform and Load: • Any source, target ; • Built-in complex transformations • Point-to-point vs. hub-and-spoke

  5. Traditional ETL

  6. Our Own ETL Requirements Data Quality Flat Files SQL Loader PL/SQL PL/SQL

  7. Travel Company Example

  8. Tools or Handcraft? ETL Advantages: Graphic User Interface Automatic documentation Off-the-shelf set of ready-to-use transformations Built-in scheduler Database Agnostic Handcrafting Advantages: No limitation reuse existing code & non ETL No specific methodology No license cost No impact on infrastructure Transportable Release & Code-Management by script

  9. Oracle ETL Facilities • External Tables • Merge • SQL Loader • PL/SQL • Database links

  10. Why Use PL/SQL • Integrated environment (no installation required) • Available resources • Reuse code ‘snippets’ • Good performance • Integration with and control of the database

  11. PL/SQL Tips and Techniques Quality Techniques Tricks

  12. Quality

  13. What is Quality? [1] “Totality of characteristics of an entity that bears on its ability to satisfy stated and implied needs.“ [The ISO 8204 definition for quality]

  14. Quality 2 [2] Quality is a collection of “ilities”: • Reliability - operate error free • Modifiability - have enhancement changes made easily • Understandability - understand the software readily • Efficiency - the speed of the software • Usability - use the software easily • Testability - construct and execute test cases easily • Portability - transport the software easily

  15. Quality 3 [3] “All the things you do today in your software development, in order to bear fruit in the future.”

  16. Standards & Conventions Use meaningful names V_Number_Of_Items_In_Array vs. i or no_itms Distinguish between types: V_ Variable a_ Parameter C_ Constant G_ Global constant

  17. Using Packages • Central package with utilities and all output • All error messages and numbers • All common constants (date format etc’) • Global variables • Statistics data • Other packages encapsulate related logic • Within package: • Procedures & functions have: • Meaningful name • A99_ prefix. A is the level (A highest). 99 unique ID

  18. Example: procedure and variable naming XXX_Write_Flat_File.U03_Write_Record_To_CSV( a_File_Handle, C_Field_Delim, C_Field_Separ, C_Record_Separ, RM_REFERENCE_rec.REFTYPE, RM_REFERENCE_rec.CODE, RM_REFERENCE_rec.DESCRIPTION, To_Char(RM_REFERENCE_rec.ISDEFAULT ,'9') );

  19. Techniques Error logging Autonomous Transaction Run statistics Release mechanism Overloading

  20. Error Logging Technique Global variables keep key information: • Record ID • Run ID • Location in code Local error trapping decides severity and error code. All error trapping passed up.

  21. Error Logging Structure TABLE ERROR_LOG( ERR_TIME DATE, ERR_NUM INTEGER, SOURCE_URN VARCHAR2(20), SOURCE_SYSTEM_ID VARCHAR2(5), PLACE_IN_CODE VARCHAR2(64), ERR_LOCATION VARCHAR2(255), ERR_DESCRIPTION VARCHAR2(512), SEVERITY NUMBER(6) ) ERR_TIME 18-OCT-02 10:04:52 ERR_NUM 1001 SOURCE_URN 223010913 SOURCE_SYSTEM CRS PLACE_IN_CODE In FLIP_PKG B06 ; 6(utils A08) ERR_LOCATION A08_Lookup_Type ERR_DESCRIPTION No match found for [Plan_Code] value [C3] SEVERITY 10

  22. Autonomous Transaction -- =================== PROCEDURE E00_write_error_log( -- =================== a_err_num INinteger, a_Severity INInteger, a_err_location INVarChar, a_err_description INVarChar) IS PRAGMA AUTONOMOUS_TRANSACTION; V_Place_In_Code DW_Process.Error_Log.Place_In_Code%Type; BEGIN V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')'; INSERTINTO DW_Process.Error_Log ( err_time, err_num, Severity, BOROUGH_ID, SOURCE_URN, SOURCE_SYSTEM_ID, Place_In_Code, err_location, err_description ) VALUES ( sysdate, a_err_num, a_Severity, G_BOROUGH_ID, G_SOURCE_URN, G_SOURCE_SYSTEM_ID, V_Place_In_Code, a_err_location, a_err_description ); COMMIT;-- commit the autonomous transaction, outside transaction is unaffected. G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS +1; -- =================== END E00_Write_Error_Log ; -- ===================

  23. Run Statistics • G_Stats_Rec is a record with all the statistics fields • Defined in the central package (therefore resident in memory) • It is updated by the writing procedures (all central) • It is written out at the end of the run

  24. Release Mechanism • Table of ‘release notes’ • Each package has C_Version constant updated each release • ‘Show_Version’ scripts display versions and notes • Results shipped with each release

  25. Remove Spaces -- =================== FUNCTION A04_Remove_Spaces( -- =================== a_Instring INVarchar) ReturnVarchar IS /* ** Removes all the spaces from a string, leaving the rest of the printable characters */ BEGIN G_place_in_UTILS_code :='A04'; -- For use by the error trapping routine RETURN TRANSLATE( a_Instring, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*() ', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ || '\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*()'); -- =================== END A04_Remove_Spaces ; -- ===================

  26. Strip Leading non-numerics -- ============================ FUNCTION F09_Strip_Leading_non_digits( -- ============================ a_String INVARCHAR2) RETURNVARCHAR2 IS /* ** Remove leading non-digits from the input. ** Example: Input string: 'abcde12345edcba' ** Output string: '12345edcba' */ v_string Varchar2(4000); v_first_digit_pos Integer; BEGIN -- Replace all digits by 0 v_string := Translate(a_String,'1234567890','0000000000'); v_first_digit_pos := instr(v_string,'0'); RETURN F01_Right(a_String, v_first_digit_pos ); -- ============================ END F09_Strip_Leading_non_digits; -- ============================

  27. Overloading -- ======================= PROCEDURE U03_Write_Record_To_CSV( -- ======================= a_File_Handle INutl_file.file_type, a_Field_Delim INVarChar,-- the quotes, for CSV a_Field_Separ INVarChar,-- the comma , for CSV a_Record_Separ INVarChar,-- the Carriage Return + Line feed , for CSV a_String1 INVarChar:= G_default_Value , a_String2 INVarChar:= G_default_Value , a_String3 INVarChar:= G_default_Value , . . . ) IS BEGIN IF a_String1 = G_default_Value THENGOTO End_Of_Record ;ENDIF; U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim); IF a_String2 = G_default_Value THENGOTO End_Of_Record ;ENDIF; U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ); IF a_String3 = G_default_Value THENGOTO End_Of_Record ;ENDIF; U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ); . . . <<End_Of_Record>> U01_Write_Line(a_File_Handle, a_Record_Separ); -- ======================= END U03_Write_Record_To_CSV ;--------------------------------------------------------------------------------------------------------------------------------------------------------------- -- =======================

  28. Summary ETL or PL/SQL? Your choice. • Consider: • Overall cost • ‘Politics’ • Convenience • Portability • Speed of development • Reusability IF PL/SQL : ensure Quality

  29. Thank you !

  30. Thank you ! I can be contacted at ben_bor@moh.govt.nz

More Related