Managing Statistics for Optimal Query Performance

Managing Statistics for Optimal Query Performance Karen Morton karen.morton@agilex.com OOW 2009 2009 October 13 1:00pm-2:00pm Moscone South Room 305

Your speaker… • Karen Morton • Sr. Principal Database Engineer • Educator, DBA, developer, consultant, researcher, author, speaker, … • Come see me… • karenmorton.blogspot.com • An Oracle user group near you

“I accept no responsibility for statistics, which are a form of magic beyond my comprehension.” — Robertson Davies

Math or Magic ?

Pick any black card Move UP or DOWN to the nearest red card Move LEFT or RIGHT to the nearest black card Move DIAGONALLY to the nearest red card Move UP or DOWN to the nearest black card

Your card is…

Math or Magic ?

SQL>desc deck Name Null? Type ------------- -------- ------------- SUIT NOT NULL VARCHAR2(10) CARD VARCHAR2(10) COLOR VARCHAR2(5) FACEVAL NOT NULL NUMBER(2)

Table: DECK Statistic Current value --------------- ------------------- # rows 52 Blocks 5 Avg Row Len 20 Degree 1 Sample Size 52 Column Name NDV Nulls # Nulls Density Length Low Value High Value ----------- --- ----- ------- ------- ------ ---------- ----------- SUIT 4 N 0 .250000 8 Clubs Spades CARD 13 Y 0 .076923 5 Ace Two COLOR 2 Y 0 .500000 5 Black Red FACEVAL 13 N 0 .076923 3 1 13 Index Name Col# Column Name Unique? Height Leaf Blks Distinct Keys -------------- ----- ------------ ------- ------ ---------- ------------- DECK_PK 1 SUIT Y 1 1 52 2 FACEVAL DECK_CARD_IDX 1 CARD N 1 1 13 DECK_COLOR_IDX 1 COLOR N 1 1 2

Cardinality The estimated number of rows a query is expected to return. number of rows in table x predicate selectivity

select * from deck order by suit, faceval ; Cardinality 52 x 1 = 52

SQL>select * from deck order by suit, faceval ; 52 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3142028678 ---------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 52 | 1040 | 2| | 1 | TABLE ACCESS BY INDEX ROWID| DECK | 52 | 1040 | 2| | 2 | INDEX FULL SCAN | DECK_PK | 52 | | 1| ---------------------------------------------------------------------- *

select * from deck where color = 'Black' ; Cardinality 52 x 1/2 = 26 *

SQL>select * from deck where color = 'Black' ; 26 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1366616955 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 26 | 520 | 2| | 1 | TABLE ACCESS BY INDEX ROWID| DECK | 26 | 520 | 2| |* 2 | INDEX RANGE SCAN | DECK_COLOR_ID | 26 | | 1| ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("COLOR"='Black') *

select * from deck where card = 'Ace' and suit = 'Spades' ; Cardinality 52 x 1/13 x 1/4 = 1 *

SQL>select * 2 from deck 3 where card = 'Ace' 4 and suit = 'Spades' ; 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2030372774 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 20 | 2| |* 1 | TABLE ACCESS BY INDEX ROWID| DECK | 1 | 20 | 2| |* 2 | INDEX RANGE SCAN | DECK_CARD_IDX | 4 | | 1| ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("SUIT"='Spades') 2 - access("CARD"='Ace') *

select * from deck where faceval > 10 ; (13 – 10) (13 – 1) High Value - Predicate Value High Value - Low Value Cardinality 52 x = 13 *

SQL>select * 2 from deck 3 where faceval > 10 ; 12 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1303963799 --------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | --------------------------------------------------------- | 0 | SELECT STATEMENT | | 13 | 260 | 3| |* 1 | TABLE ACCESS FULL| DECK | 13 | 260 | 3| --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("FACEVAL">10) *

select * from deck where card = 'Ace' ; Cardinality 52 x 1/13 = 4 *

SQL>select * from deck where card = :b1 ; 4 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2030372774 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 80 | 2| | 1 | TABLE ACCESS BY INDEX ROWID| DECK | 4 | 80 | 2| |* 2 | INDEX RANGE SCAN | DECK_CARD_IDX | 4 | | 1| ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CARD"=:B1) *

Math or Magic ? Maybe it's a little bit of both!

What's the best method for collecting statistics?

It depends.

Statistics that don't reasonably describe your data

…lead to poor cardinality estimates

…which leads to poor access path selection

…which leads to poor join method selection

…which leads to poor join order selection

…which leads to poor SQL execution times.

Statistics matter!

Automatic vs Manual

Automatic Collections

Objects must change by at least 10%

Collection scheduled during nightly maintenance window

dbms_stats gather_database_stats_job_proc

Prioritizes collection in order by objects which most need updating

Most functional when data changes at a slow to moderate rate

Volatile tables and large bulk loads are good candidates for manual collection

Automatic doesn't mean accurate (for your data)

Automatic Collection Defaults

SQL>exec dbms_stats.gather_table_stats (ownname=>?, tabname=>?) ; partname NULL cascade DBMS_STATS.AUTO_CASCADE estimate_percent DBMS_STATS.AUTO_SAMPLE_SIZE stattab NULL block_sample FALSE statid NULL method_opt FOR ALL COLUMNS SIZE AUTO statown NULL degree 1 or value based on number of CPUs and initialization parameters force FALSE granularity AUTO (value is based on partitioning type) no_invalidate DBMS_STATS.AUTO_INVALIDATE

cascade=> AUTO_CASCADE Allow Oracle to determine whether or not to gather index statistics

estimate_percent=> AUTO_SAMPLE_SIZE Allow Oracle to determine sample size

method_opt=> FOR ALL COLUMNS SIZE AUTO Allow Oracle to determine when to gather histogram statistics SYS.COL_USAGE$

no_invalidate=> AUTO_INVALIDATE Allow Oracle to determine when to invalidate dependent cursors

Goal Collect statistics that are "good enough" to meet most needs most of the time.

Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you would be perfectly comfortable. – Bobby Bragan

Manual Collections

dbms_stats gather_*_stats * = database, schema, table, index, etc.

Is it common for your users to get slammed with performance problems shortly after statistics are updated?

Managing Statistics for Optimal Query Performance