620 likes | 633 Views
Learn about the basics of Oracle Optimizer and the difference between Rule Based Optimizer (RBO) and Cost Based Optimizer (CBO). Explore query execution overview and best practices.
E N D
Understanding Oracle Optimizer:RBO versus CBO Tantra Invedy UBSW Energy
List of discussion • Query execution overview • Optimizer basics • Understanding RBO • Understanding CBO • Understanding Hints • Best practices
Query Execution Overview 1. Create a cursor 2. Parse a statement <<< sql optimization 3. Describe query results 4. Define query output 5. Bind variables <<< after sql optimization 6. Parallelize the statement 7. Execute the statement 8. Fetch rows of a query 9. Close the cursor
Step 1: Create Cursor • A cursor is a handle or name for a private SQL area. Unique handle and particular cursor can only be opened by 1 process at a time • It contains information for statement processing • Programs must have an open cursor to process a SQL statement.
Step 2: Parse the Statement Parse statement from user process. It involves: • Translation and verification • Table and column check • Parse lock to prevent structural changing. • Check privileges on referenced objects • Determine optimal execution path • Load statement into shared SQL area • Route distributed statement correctly
Step 2: Parse the statement cont... • Parsing is only necessary if there is no identical SQL in shared area • A new shared SQL area is allocated and statement is parsed • if identical SQL exist then this can be reused.
SGA Buffer Cache PGA Shared Pool Log buffer • Non shareable is stored in PGA • user run time information • Shared pool maintains shareable parts of the cursor: • query text • Execution plan • Bind variable data types and lengths • Shareable cursors saves resources from unnecessary parsing.
Steps 3 and 4: Describe and Define • Describe provides information about the select list items; it is relevant when entering dynamic queries through an OCI application • The define step defines location, size, and data type information required to store fetched values in variables.
Steps 5 and 6: Bind and Parallize • Bind any bind values: • Memory address to store data values • Values do not have to be in place yet • Allows shared SQL even though bind values may change • Parallize the statement • Parse stage has determined if the SQL can be parallized, and parallel plan already built.
Steps 7 through 9 • Execute: • Puts values into all bind variables • Drives the SQL statement to produce the desired results • Fetch rows (for select statement only) • Into defined output variables. • Array fetch mechanism • Close the cursor
To remember …. • Query optimization is in parsing stage, which is before binding variables. • Cost of Non-sharable SQL, out of scope • Next what is query optimization ...
Optimizer Basic • It is an Oracle ‘engine’ that would choose the most efficient method on retrieving data on a given query. • The steps chosen is called execution plan. • Two kind of Optimizer on Oracle: • RBO or Rule Based Optimizer since v6 • CBO or Cost Based Optimizer since v7
QUERY PROCESSING Query OPTIMIZER Query Optimazation RBO / CBO Query rewrite Parse Query Execution QEP Generation Result
Parse phase simple transformation • Convert query to an equivalent but more efficient expression • lastname like (‘JONES’) >>> lastname=‘JONES’ • lastname in (‘AL’,’WATT’) >>> lastname=‘AL’ OR lastname=‘WATT’ • salary between 10 and 100 >>> salary >= 10 AND salary>=100 • NOT(salary<100 and JOB IS NULL) >> salary>=100 and JOB IS NOT NULL • Transform OR into UNION ALL if the derived query is cheaper.
Query Rewrite within Optimizer • View Merging • Subquery Merging • Transitivity (CBO only) • Materialized views
View Merging • View merging rewrites queries containing views so that only base tables remain. • It is only done when a correct results is guaranteed. • Either view is pushed out to query, or query is pushed into the view. • If it cannot be merged, then view must be executed separately. (VIEW and FILTER operator on explain plan)
Non-merge-able Views • Group by clause • All aggregate functions • Rownum reference • Start with / connect by clause • All set operation (union, minus, …) • Distinct (unless query also has distinct) • Has join with tables in original query
Sub-query Merging • Merging sub-query to open up new access path and new join order • Single row subqueries. (Oracle will evaluate subquery and store the result) • ‘IN’ or subqueries might be converted into EXISTS or NOT EXISTS or in line view
Sub-query Merging Example Select … from emp e where e.emp_id = select … from … Select … from emp e where e.emp_id = <evaluated_value>; select … from courses where dev_id in (select emp_id from emp) This would be flattened or converted into a join
Transitivity • CBO performs transitivity as the first step in optimization. • It generates additional predicates based on existing predicates. This would open more access paths. • Transitivity is NOT done for join predicates.
Transitivity Example • A=5, A=B then it is concluded that B=5 • A=B , B=C then oracle would NOT conclude that A=C
Understanding RBO • Released with Oracle 6. • Using an ordered list of access methods and join methods on relative cost or each operation. • Has a very limited input in determining access paths. • Will be removed from the Oracle database Server • Normally, it chooses the path from right to left in the from clause. • If hint (except RULE hint) is supplied, then it will run under CBO. • On some complex queries, it outperforms CBO
RBO ranking 1. Single row by ROWID 2. Single row by cluster join. (cluster) 3. Single row by hash cluster key with unique key. (cluster) 4. Single row by unique index. 5. Cluster join. (cluster) 6. Hash Cluster key. (cluster) 7. Indexed cluster key. (cluster) 8. Composite key. 9. Single-column non-unique index. 10. Bounded range search on indexed columns 11. Unbounded range search on indexed columns 12. Sort-merge join 13. Max or Min of indexed columns 14. Order by on indexed columns 15. Full table-scan.
RBO: Determining Access path • First it would evaluate join predicates and filtering predicates. • Assign scores to each of the predicates, starting from the last to the first. • Pick the lowest score, and evaluate the next. If there were ties, then it would choose arbitrarily based on : • Order based on the FROM clause • Age of an index
Not so good things about RBO... • RBO has a small number of possible access method. (it does not recognize IOT, bitmap index, hash join, …) • It will process the tables based on how they are ordered on the query. (can be good and most of the time is not so good) • Always ranks execution plan based on relative cost in the list, regardless of the data stored in the table. Index scan will always better than table scan, which is not true. • Coding for the RBO is halted. All new features require implementation of CBO. • RBO uses poor information to break frequently occurring ties.
Understanding CBO • It uses all available information. Dictionary , statistics, histogram, supplied parameter setting. • CBO uses Oracle ‘intelligent’ formula to calculate the COST of a SQL statement. Constantly improving from version to version. • It examines all possible access methods (defaulted to 80,000 permutations) • Evaluate the costs for each access plan, then simple choose the lowest one
Available information to CBO • Database statistics (partial list …) • DBA_TABLES (num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len, last_analyzed, sample_size, avg_space_freelist_blocks) • DBA_TAB_COLUMNS or DBA_TAB_COL_STATISTICS (num_distinct, low_value, high_value, density, num_nulls, num_buckets) • DBA_INDEXES (blevel, leaf_blocks, distinct_keys, clustering_factor, num_rows, avg_leaf_block_per_key)
What if there is no statistics ... • If there is no statistics, or bind variables are used, then CBO may use default statistics • Default statistics for Oracle 7.3.3 an above • selectivity for relations on indexed columns .009 • selectivity for = on indexed columns .004 • multiblock read factor 8 • remote table average row length 100 • # of blocks 100 • Scan cost 13 • Index levels 1 • number leaf blocks/key 1
Available information to CBO • Initialization parameters that influence CBO cost computation (partial list …) • db_file_multiblock_read_count • hash_multiblock_io_count • hash_area_size • sort_area_size • bitmatp_merge_area_size
Available information to CBO • Parameters affecting Cost computation • sort_multiblock_read_count • optimizer_index_caching • optimizer_index_cost_adj • optimizer_percent_parallel • optimizer_mode (choose defaulted to all_rows) • First_row applies a heuristic bias to the cost model to promote the use of indexes and nested loop)
CBO: Cost of a SQL statement • Cost is the estimated number of I/O, CPU, Network operations that a statement requires. • CPU cost (parse) has little impact on a SQL tuning on most cases. While I/O is the most. • Difference between logical and physical I/O • It is affected by some parameters: • db_file_multiblock_read_count, sort_area_size, hash_area_size, hash_multiblock_io_count, bitmap_merge_area_size
Cost of Accessing Data • Table scan cost • Number of blocks below HWM • Multiblock read factor (default 8) • Number of extents is also taken into consederation • Example • Cost: 103 / 8 = 12.875 => 13
Cost of Accessing Data • Index Scan Costs are based on: • Index access • number or levels in the B*-tree • number of leaf blocks to examine • Consequent table lookup (optional) • number of blocks accessed • Cost Fast full scan depends on db_file_multiblock_read_count
Cost of Sorting • Data may need to be sorted for: • Order by • aggregation • Join operation • Sorts are typically CPU intensive, and can be I/O bound if the sort can not fit in memory. • Cost depends on sort_area_size and # rows.
Cost of a Join (SM) • Rows from row source 1 are sorted • Rows from row source 2 are then sorted by the same sort key • Sorted rows from both sides are then merged • Cost: sorting, reading tables, I/O for temporary segments MERGE Sort Sort Row source 1 Row source 2
Cost of a Join (NL) Outer Loop • Row source 1 is scanned (outer /driving table) • Each row returned drives a lookup in row source 2 (inner) • Joining rows are then returned • Cost: Read driving table and access on inner table. • Performance is very dependent on index on inner table Inner Loop Check for a match Nested Loop Access A (Full) Access B (ROWID) Index Access
Cost of a Join (HJ) Row source 1 (build input) Row source 2 (probe) • In theory, it is the most efficient joint method. • The smaller row source is used to build a hash table and a bitmap • The second row source is hashed and checked against the hash table • The bitmap is used as a quick lookup to check if rows are in the hash table. • It requires single pass for each row source , and more efficient than sorting and merging Hash table and bitmap filter in MEMORY Output rows DISK
Join Order Evaluation • The initial permutation is generated by sorting the join order in ascending order of their computed cardinality, as listed in where clause predicates. • For each permutation, compute the cost and keep the one with the lowest cost. • At any time, keep the current and the best permutation so far. • Defaulted to 80,000 permutation • Most of times the order will not matter, but sometimes it does matter. (running out of permutation limits ?)
Common CBO problem • The skewness problem 30% • Analyzing with wrong data 25% • Mixing optimizer in joins 20% • Choosing inferior index 20% • Joining too many tables < 5% • Incorrect INIT.ORA settings < 5%
Why use hints ... • Sometimes the optimizer may give an optimal plan. Hint is supplied as a directive and may get ignored. • Developers know more about the data. • Hints can be used to influence: • The optimization approach (RULE, ALL_ROWS, FIRST_ROWS) • The access path for a table accessed (FULL , INDEX,HASH, ...) • The join order and method (ordered, leading, star, …use_nl, use_merge, use_hash ...)
Hint: Optimization approach • CHOOSE defaulted to ALL_ROWS • “BEST” plan may mean: • Use minimal resource to process all rows affected by the statement ===> ALL_ROWS. (prefer SMJ) • Returns the first row of a statement as quickly as possible ===> FIRST_ROWS. (prefer NLJ) • Optimizer has limited information available to determine the “BEST” plan.
Choosing which optimization • System level, session level , statement level. • First_rows returns the first rows as quick as possible. • All_rows is for optimal throughput, batch oriented application. • Rule, some cases outperform CBO on complex queries • First_rows(xxx) or First_rows _nnn , new on Oracle 9i
FULL ROWID CLUSTER INDEX INDEX_ASC INDEX_DESC INDEX_JOIN INDEX_FFS NO_INDEX AND_EQUAL Hint: Access Paths
USE_CONCAT NO_EXPAND REWRITE MERGE NO_MERGE START_TRANSFORMATION FACT NO_FACT Hint: Query Transformation
ORDERED STAR USE_NL USE_MERGE USE_HASH DRIVING_SITE LEADING HASH_AJ, MERGE_AJ, NL_AJ HASH_SJ, MERGE_SJ, NL_SJ HINT: Join Orders and Join Operation