
Optimizing ETL for Oracle




Presentation Transcript


  1. Optimizing ETL for Oracle Maria Colgan Data Warehouse Product Management

  2. Agenda • The Three Layers of ETL • Think about Hardware • Think about your Access Methods • Hardware Trends and ETL • Parallel Loading and Partitioning • Q&A

  3. Your ETL Tool of Choice The Data Warehouse Database Hardware and Storage The Three Layers of ETL The ETL Ecosystem

  4. Balanced Configuration – “The weakest link” defines the throughput [Diagram: CPU quantity and speed dictate the number of HBAs and the capacity of the interconnect; HBA quantity and speed dictate the number of disk controllers and the speed and quantity of FC switches; controller quantity and speed dictate the number of disks and the speed and quantity of switches; disk quantity and speed complete the chain across the disk arrays.]

  5. Database configuration best practices • Minimum of 4GB of memory per core (8GB if using compression) • Interconnect bandwidth should equal I/O bandwidth • Use ASM or RAID 10 (Stripe And Mirror Everything) • Use AutoAllocate tablespaces (NO uniform extent sizes) • Set INITIAL and NEXT to 8MB in the storage clause of large partitioned tables • DB_BLOCK_SIZE = 8KB / 16KB
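The tablespace and storage-clause advice above can be sketched as a script. This is a minimal illustration, not from the slides: the tablespace name, datafile size, and the sales table's columns and partitioning are all assumptions made up for the example.

```shell
# Sketch of the slide's recommendations: a locally managed AutoAllocate
# tablespace (no UNIFORM SIZE clause) and 8MB INITIAL/NEXT on a large
# partitioned table. Names and sizes are illustrative.
sqlplus / as sysdba <<'EOF'
-- AutoAllocate tablespace: let Oracle choose extent sizes
CREATE TABLESPACE dw_data DATAFILE SIZE 50G
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;

-- Large partitioned table with 8MB INITIAL/NEXT per the slide
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
TABLESPACE dw_data
STORAGE (INITIAL 8M NEXT 8M)
PARTITION BY RANGE (sale_date) (
  PARTITION p2010 VALUES LESS THAN (DATE '2011-01-01')
);
EOF
```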

  6. Most Bang for the Buck • Transporting data around can be expensive • The most performance is gained when loading into the database • Focus should therefore be on: • Where the data is now • How to get it into a format that allows the fastest load methods • Ensuring the database and data sets are set up for this load • Spend your time wisely and try to achieve the biggest improvements that can be made • Minimize staging of data (writes are expensive)

  7. Access Methods [Diagram: a spectrum from heterogeneous access to ideal bulk performance – heterogeneous access via Web Services; common methods via XML files and JDBC, ODBC, Gateways and DB links; ideal bulk access methods are flat files, TTS (transportable tablespaces) and Data Pump.]

  8. Bulk Access Strategies • Own your access strategy • Dictate how you want data • Set SLAs for your data access provider • Do not join over database links • No parallel access • Limited functionality when using ODBC • Keep it Simple

  9. SQL*Loader or External Tables • And the winner is => External Tables • Why: • Full usage of SQL capabilities directly on the data • Automatic use of parallel capabilities (just like a table) • No need to stage the data once more • Better allocation of space when storing data • Interesting capabilities like • The usage of Data Pump • The usage of pre-processing

  10. Tips for External Tables • File locations and size • Stage data across as many physical disks as possible • When using multiple files, the file sizes should be similar • If the sizes are not similar, list them largest to smallest in the LOCATION clause • File formats • Use a format that allows position-able and seek-able scans • Delimit fields clearly and use a well-known record terminator to allow automatic granulation • Consider compressing data files and uncompressing them during loading

  11. Parallel with Compressed Files • When using compression Oracle will not parallelize the load (not even when you use an external table) • Use multiple compressed files per external table to get parallel loading • To create these multiple files: • Either create the archives in equal chunks when the data is generated and compressed • Or use the pre-processing step to “dynamically” break the master archive down into smaller files
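The first option above – creating the archives in equal chunks up front – can be sketched in shell. A minimal sketch, assuming a delimited source file sales.dat, a target of four similar-sized chunks, and GNU coreutils (the file names are illustrative):

```shell
# Split the source file into 4 similar-sized pieces on line boundaries,
# then compress each piece individually so every chunk can feed its own
# parallel load server via the external table's LOCATION list.
split -n l/4 -d sales.dat sales_chunk_
for f in sales_chunk_0*; do
  gzip "$f"     # produces sales_chunk_00.gz ... sales_chunk_03.gz
done
ls sales_chunk_*.gz
```

Each resulting .gz file would then be listed separately in the external table's LOCATION clause, with gunzip as the PREPROCESSOR, so each file gets its own granule.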

  12. Tips for External Tables (2) • New PREPROCESSOR functionality in 11.1.0.7 and 10.2.0.5 • For example • Allows compression of large files • Speeds up transport of files to the load location • Simplifies the process by encapsulating “unzip”

     CREATE TABLE sales_external (…)
     ORGANIZATION EXTERNAL
     ( TYPE ORACLE_LOADER
       DEFAULT DIRECTORY data_dir1
       ACCESS PARAMETERS
       ( RECORDS DELIMITED BY NEWLINE
         PREPROCESSOR exec_dir:'gunzip'
         FIELDS TERMINATED BY '|' )
       LOCATION (…) )

  13. Tips for External Tables (3) • The ORACLE_DATAPUMP driver has been there since 10.1.x • For example • Allows utilization of Data Pump as an ETL extraction method • Great for unloading from Oracle to Oracle

     CREATE TABLE inventories_xt
     ORGANIZATION EXTERNAL
     ( TYPE ORACLE_DATAPUMP
       DEFAULT DIRECTORY def_dir1
       LOCATION ('inv_xt.dmp') )
     AS SELECT * FROM inventories WHERE <anything goes>;

  14. Bulk Data Loading – Example [Diagram: data is unloaded from SQL Server via BCP or from an Oracle source via a Data Pump unload, compressed, transferred by FTP, uncompressed, and loaded into the Oracle target (10.2 / 11.1) through external tables.]

  15. Hardware Trends • Commodity hardware platforms • Intel Chips • 64 bit Linux OS • Clustered environments • Increasing CPU counts • Increasing memory sizes available • Larger systems • A lot more data • Compute power you didn’t think you could have

  16. Hardware Trends and ETL • Fact: External table is the best way to load the DW • Fact: If you run Linux, you probably run Oracle RAC • Oops: You cannot run a cluster-parallelized SQL statement on an external table unless the files are on shared storage • Oops: A simple single NFS filer will bottleneck your solution in ETL (not enough throughput)

  17. Introducing DBFS (Database Machine) • DataBase File System • DBFS is a file system interface for storing “unstructured data” in Oracle Database • Built on SecureFiles • It is cluster-coherent and scalable with Real Application Clusters • FUSE • Filesystem in Userspace (http://fuse.sourceforge.net/) • Combining DBFS with FUSE offers mountable filesystems for Linux x64 (e.g. Database Machine!)

  18. Configuring DBFS as a staging area: Steps on the OS • Check the FUSE rpm is loaded • Add fusermount to your path • Add your Oracle OS user to the OS group “fuse” • Edit the file /etc/fuse.conf and add the line user_allow_other • Create a mount point for the file system owned by the oracle OS user, e.g. /data
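The OS-side steps above can be collected into a short root script. This is a sketch, not an exact recipe: the user name oracle, the group oinstall, and the /data mount point follow the slide's example and your names may differ.

```shell
# OS preparation for DBFS/FUSE (run as root); a sketch of the slide's steps.
# 1. Check the FUSE rpm and kernel module are present
rpm -q fuse
lsmod | grep fuse

# 2. Confirm fusermount is on the oracle user's PATH
which fusermount     # typically /bin/fusermount or /usr/bin/fusermount

# 3. Add the oracle OS user to the fuse group
usermod -a -G fuse oracle

# 4. Allow non-owners (e.g. load jobs) to access the mounted filesystem
echo "user_allow_other" >> /etc/fuse.conf

# 5. Create the mount point, owned by the oracle OS user
mkdir -p /data
chown oracle:oinstall /data
```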

  19. Configuring DBFS as a staging area: Steps in the database • DBFS should be housed in a separate database • Use DBCA with the OLTP template to create it • Redo logs should be large – 8GB • Create a bigfile tablespace large enough to hold the filesystem • Create a user DBFS • Grant create session, create table, create procedure, and the DBFS_ROLE role • Grant quota unlimited on the tablespace to the user • Create the actual filesystem using the script $ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql • Mount the file system using $ORACLE_HOME/bin/dbfs_client
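The database-side steps can likewise be sketched as a script. The tablespace name, password, filesystem name, and the argument list passed to the advanced script are illustrative assumptions; check the header of dbfs_create_filesystem_advanced.sql for the exact parameters on your release.

```shell
# Create the DBFS owner and its filesystem in the dedicated DBFS database.
sqlplus / as sysdba <<'EOF'
CREATE BIGFILE TABLESPACE dbfs_ts DATAFILE SIZE 100G;
CREATE USER dbfs IDENTIFIED BY dbfs_pwd
  DEFAULT TABLESPACE dbfs_ts QUOTA UNLIMITED ON dbfs_ts;
GRANT CREATE SESSION, CREATE TABLE, CREATE PROCEDURE, DBFS_ROLE TO dbfs;
EOF

# Create the filesystem itself (tablespace, filesystem name, then storage
# options -- argument order assumed from common usage of the script)
sqlplus dbfs/dbfs_pwd <<'EOF'
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql dbfs_ts staging nocompress nodeduplicate noencrypt non-partition
EOF

# Mount it at /data; dbfs_client reads the password from stdin
echo dbfs_pwd | $ORACLE_HOME/bin/dbfs_client dbfs@DBFS -o allow_other /data &
```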

  20. Access Strategies – Example [Diagram: data is unloaded from SQL Server via BCP or from an Oracle source via a Data Pump unload, transferred by FTP or SCP, and loaded into the Oracle target (11.2) through external tables; DBFS, in its own instance, holds the files in SecureFiles.]

  21. Direct Path Load • Data is written directly to the database storage, using multiple blocks per I/O request and asynchronous writes • A CTAS command always uses direct path, but an IAS (insert as select) needs an APPEND hint

     INSERT /*+ APPEND */ INTO sales PARTITION (p2)
     SELECT * FROM ext_tab_for_sales_data;

• Ensure you do direct path loads in parallel • Specify the parallel degree either with a hint or on both tables • Enable parallel DML by issuing an alter session command

     ALTER SESSION ENABLE PARALLEL DML;

  22. Database Settings for Data Loading • Use parallel loading • Create the table with PARALLEL on it • Hint with PARALLEL • Don’t forget ALTER SESSION ENABLE PARALLEL DML • Use these settings: • PARALLEL_EXECUTION_MESSAGE_SIZE = 16KB • PARALLEL_MIN_SERVERS = 2 * Default_DOP • PARALLEL_MAX_SERVERS = Max_conc_queries * Default_DOP • DB_BLOCK_SIZE = 8KB / 16KB
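The two sizing formulas above can be sanity-checked with shell arithmetic. A minimal sketch, assuming a 16-core single-instance box, PARALLEL_THREADS_PER_CPU = 2 (the usual inputs to the default DOP), and 4 expected concurrent parallel queries; the numbers are illustrative:

```shell
CPU_COUNT=16
THREADS_PER_CPU=2
MAX_CONC_QUERIES=4

# Default DOP on a single instance is CPU_COUNT * PARALLEL_THREADS_PER_CPU
DEFAULT_DOP=$(( CPU_COUNT * THREADS_PER_CPU ))

# The slide's formulas
PARALLEL_MIN_SERVERS=$(( 2 * DEFAULT_DOP ))
PARALLEL_MAX_SERVERS=$(( MAX_CONC_QUERIES * DEFAULT_DOP ))

echo "default DOP          = $DEFAULT_DOP"            # 32
echo "parallel_min_servers = $PARALLEL_MIN_SERVERS"   # 64
echo "parallel_max_servers = $PARALLEL_MAX_SERVERS"   # 128
```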

  23. Q & A

  24. COLLABORATE 10 – IOUG Forum, April 18-22, 2010, Mandalay Bay Convention Center in Las Vegas, NV • 5,000 attendees; hands-on labs; boot camps; deep dives; hospitality events and over 200 exhibits • IOUG is offering an entire conference within a conference called “Get Analytical with BIWA Training Days”, powered by the IOUG BIWA SIG. Sample sessions include: • Integrating Essbase & OBIEE+ With Your Data Warehouse Strategy • OBIEE Security & BI APPS Integration Case Study: Atheros Communications • Data Visualization Best Practices: Know How to Design and Improve Your BI & EPM Reports, Dashboards and Queries • De-Mystifying Oracle Business Intelligence Applications • An ETL Framework using Oracle Warehouse Builder • Driving Unbelievable ROI through Operational Intelligence with OBIEE • Oracle Data Mining: Hands-on Lab

  25. Register by March 18 to Save on COLLABORATE 10 – IOUG Forum, April 18-22, 2010 ∙ Las Vegas, NV • Register via offer code BIWA2010 (step one of the registration process) by March 18 to save $740 off the onsite rate • Registration gets your name entered into a drawing for an iPod Touch! • collaborate10.ioug.org
