370 likes | 549 Views
Enhancing ETL Performance. Objectives. After completing this lesson, you should be able to: Describe the performance enhancement methods that you can apply in ETL mapping design Describe configuration of performance parameters in schema design
E N D
Objectives • After completing this lesson, you should be able to: • Describe the performance enhancement methods that you can apply in ETL mapping design • Describe configuration of performance parameters in schema design • Describe the process of gathering statistics in Warehouse Builder
Lesson Agenda • Performance parameters and strategies • Performance parameters in ETL design: • Operating modes • Commit control • Partition exchange loading • Performance parameters in Schema design: • Indexes and partitions • Enabling parallel access • Error Logging and gathering statistics
Devise a Performance Strategy Early • Involve a data architect (if you have one) prior to Warehouse Builder implementation to determine a performance strategy. • A performance strategy influences: • Technology choices • Refresh frequency • Whether to use staging tables, partition exchange loading (PEL), or other technologies • How and when to collect data • How to organize source data Have data architect think about performance first!
Performance Tuning at Various Levels • You can tune performance at various levels of data warehouse implementation: • ETL mapping design • Schema design • Possible performance bottlenecks in: • Hardware • Operating system • Network • Database • Application
ETL Design: Mappings • Extract, transform, and load (ETL) involves the movement and transformation of data from your sources to your targets. • OWB mappings specify which source data objects provide data to which target data objects. • Well designed ETL mapping can make all the difference in the performance of your data warehouse. • Consider the cost of using function calls. • Minimize context switching.
Performance-Related Parameters in ETL Design • When creating ETL logic, you need to keep in mind the run-time performance expectations. • Assess your mappings on the following parameters: • Operating modes: “Set based” or “Row based” • Commit Control • Enable Parallel DML • Enable Partition Exchange Loading for targets
Set based Processes rows as one transaction Fastest way Offers DML error logging for rows with problems Row based Processes data row by row Continues processing despite an error for a row Logs information for each row Pros and Cons of Operating Modes Target Target Source Source Set-based Processing Row-based Processing
Configuring Mappings for Operating Modes In the default “Set based fail over to row based” mode, Warehouse Builder uses the “Set based” operating mode, but switches to the slower “Row based” mode if data errors are encountered.
DML Error Logging To enable DML error logging, specify the name of the error table by using the DML Error table name property. The Error table name property is used for “logical errors,” which include the orphan management errors and the data rule violation errors. The DML error table format and the logical error table format are not the same, so they cannot actually share the same table.
Commit Control Default is set to Automatic.With this setting, OWB commits automatically when it senses the need, based on algorithms in the code. Specifies how often data is committed. Default is set to 1000 rows.
Setting Default Audit Level Use “Default audit level” to indicate the audit level used when executing the package. You can set it to NONE, ERROR DETAILS, STATISTICS, or COMPLETE.
Additional Run-Time Parameters for Mappings B A Warehouse Builder uses the “Bulk size” parameter only when the “Bulk processing code” option is selected and the operating mode is set to “Row based.” Use “Maximum number of errors” to indicate the maximum number of errors allowed when executing the package.
Enable Partition Exchange Loading (PEL) for Targets • Partition Exchange Loading loads new data by exchanging it into a target table as a partition. To enable PEL for a mapping, set the PEL Enabled property for the target to true.
Best Practices Tips: ETL Design Factors That Impact Performance • Custom transformation impact • Inefficient context switch to the PL/SQL engine, which must interpret and execute the function for every row • Loading type impact • Insert/Update and Update/Insert should be used when appropriate. • External tables versus SQL Loader • Can be queried as if they were read-only tables • Enabling parallel access for the external tables can activate parallelism and enhance the performance.
Quiz • In Warehouse Builder, you can set the “Default audit level” to indicate how detailed the audit information generation will be. • True • False
Quiz • From the following, select other performance parameters you should consider for better ETL performance: • Commit Control • Use External tables versus SQL Loader • Minimize context switches • Use Partition Exchange loading
Performance-Related Parameters in Schema Design • Warehouse Builder provides schema design capabilities for: • Index management • Partitions • Enabling parallel access • Configuration of physical implementation properties on objects • Tablespace • Constraints management
Benefits of Using Schema Design Capabilities in Warehouse Builder Design Center • When designing database objects such as tables in Design Center, you need not switch to the SQL*Plus environment. • You need not write SQL commands to define partitions, indexes, constraints, and so on. • While generating code, Warehouse Builder automatically incorporates the schema design settings. • Schema design is integrated with logical design, exposing full editing capabilities within the editors. • Warehouse Builder facilitates better schema design by allowing detailed configuration of physical storage and sizing properties of the database objects per configuration.
Indexing • To create an index, you use the Indexes tab in the Table Editor. You can specify the type, key columns, partitions, values, and local or global scope.
Configuring Properties of Indexes • To configure an index, open the configuration editor for the associated table. The tool-tip text shows each property's description.
Index Performance Considerations: Drop Indexes Before the Load Process • Drop the indexes of the target object before the loading process and re-create the indexes after the load is completed. • This significantly improves performance because indexes will not have to be maintained during the load. • To achieve this, in load mappings: • Use a Pre-mapping process operator that will invoke a procedure to drop the indexes before the loading starts. • Use a Post-mapping process operator that will invoke a procedure to re-create the indexes after the loading is complete.
Index Performance Considerations: Drop Indexes Before the Load Process (Example Mapping)
Constraints Management • Constraints checking may slow load performance. • Enable Constraints: Slower data load because constraints are checked for each row • Exceptions Table Name property: Row IDs of records that do not conform will be logged into this table.
Configuring Constraints Individually • You can also configure constraints individually in the Configuration Properties window of the owning object. Configuring foreign key constraint
Defining Partitions in Warehouse Builder • You can define partitions in the Partitions tab of the Table Editor. Different types of partitions are supported.
Defining Partitions in Warehouse Builder Set the partition tablespace parameters in the configuration properties of the partitioned table.
Parallelism • You can enable parallel access for tables in their configuration properties. • You can set the Parallel Access mode to PARALLEL or NOPARALLEL. The default is PARALLEL. • You can specify the Parallel Degree, which is the number of parallel threads used in the parallel operation.
Enable Parallel DML If you select this option, Warehouse Builder enables Parallel DML at run time. The objects involved in the mapping should be enabled for parallelism to take advantage of this option.
User Setting Tablespace Properties • You can set the tablespace properties at: • User level • Module level • Object level Object Module
Minimum Error Logging • Set the LOGGING MODE property to NOLOGGING when logging is not needed. Set the default audit level property for mappings appropriately.
Gathering Statistics • If you want the modification statistics to be collected on a table, set the Statistics Collection property to MONITORING in the Configuration Properties window. Configuration properties of a table.
Analyze Table Statements Property • Set the “Analyze table statements” property for a mapping. • By default, the “Analyze table statements” property is not enabled.
Gathering Schema Statistics In Design Center, select Tools > Preferences. In the Preferences dialog box, expand OWB, select Environment, and then check “Allow Optimize Repository Warning on Startup” to update the schema statistics every time you log in to Warehouse Builder.
Quiz • When you enable parallel DML for a mapping, you must also enable the source and target objects for parallel access. • True • False
Summary • In this lesson, you should have learned how to: • Describe the performance enhancement methods that you can apply in ETL mapping design • Describe configuration of performance parameters in schema design • Describe the process of gathering statistics in Warehouse Builder
Practice 11-1 Overview: Configuring Performance Parameters for Mappings and Tables • In this practice, you examine the following: • The performance, in terms of elapsed time, of a mapping when executed in two different operating modes (Row based and Set based) • Configuration properties of mappings • Indexes, constraints, and partitions on a table (you also configure these)