330 likes | 496 Views
Are Your Statistics Bad Enough?. Verify the effectiveness of gathering optimizer statistics Jaromir D.B. Nemec UKOUG 2005. 27.10.2005. Gathering Statistics. The way from ANALYZE TABLE to DBMS_STATS Increasing number of parameters and options changing defaults CBO more intelligent
E N D
Are Your Statistics Bad Enough? Verify the effectiveness of gathering optimizer statistics Jaromir D.B. Nemec UKOUG 2005 27.10.2005 Are Your Statistics Bad Enough?
Gathering Statistics • The way from ANALYZE TABLE to DBMS_STATS • Increasing number of parameters and options • changing defaults • CBO more intelligent • a “small change” can have a big effect • . . . and all that stuff only to get • some redundant information Are Your Statistics Bad Enough?
Complexity Trap In the chain of dependent objects the quantity is increasing Are Your Statistics Bad Enough?
Are My Statistics Adequate? Way 1 - observe (they are not if you encounter problems) This is reactive approach Way 2 - test it (define test cases and check them) define invariant the invariant must respect the dynamic of the database! Are Your Statistics Bad Enough?
Basis Scenario • Motto: fix the problem in its origin • Limit the scope to the basic table access only • (full scan, index access) • This reduces the complexity significantly and • can expose the real origin of problems in the execution plan Are Your Statistics Bad Enough?
Simple Example select * from ta, tb where ta.rand = tb.rand and ta.norm = 141 and ta.rand <= 770 and (tb.norm = 1 or tb.norm = -1) --------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------- | 0 | SELECT STATEMENT | | 65073 | |* 1 | HASH JOIN | | 65073 | |* 2 | TABLE ACCESS BY INDEX ROWID| TA | 2771 | |* 3 | INDEX RANGE SCAN | TA_INDEX | 3598 | |* 4 | TABLE ACCESS FULL | TB | 23580 | --------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("TA"."RAND"="TB"."RAND") 2 - filter("TA"."RAND"<=770) 3 - access("TA"."NORM"=141) 4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770) Are Your Statistics Bad Enough?
Verify it! Re-estimate • For each table access • Get owner, object name and alias • Option FULL – get filter predicate • Option (b-tree) INDEX – get access and filter predicate on table and index line • enclose in brackets and connect with AND • form a re-estimation query • execute the query Are Your Statistics Bad Enough?
Re-estimation – how does it work? Input from execution plan |* 2 | TABLE ACCESS BY INDEX ROWID| TA | 2771 | |* 3 | INDEX RANGE SCAN | TA_INDEX | 3598 | |* 4 | TABLE ACCESS FULL | TB | 23580 | --------------------------------------------------------- 2 - filter("TA"."RAND"<=770) 3 - access("TA"."NORM"=141) 4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770) Generated re-estimation query select count(*) from "TB" where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770) ; -- gives 60317 rows select count(*) from "TA" where ("TA"."NORM"=141) and ("TA"."RAND"<=770) ; -- gives 1 row Run the re-estimation query to get the real cardinality, compare it with the original estimation Are Your Statistics Bad Enough?
Re-estimation – Summary Run the re-estimation query to get the real cardinality, compare it with the original estimation Note, that the TA access cardinality is overestimated The TB access cardinality is underestimated Are Your Statistics Bad Enough?
Re-estimation Basic Idea • The simplification is based on the assumption that all problems in the execution plan are caused directly or indirectly by the cardinality assumption in the base table access. • Examples • Cardinality underestimation can lead to NL join with big inner table • Cardinality overestimation can switch a NL join to a SORT MERGE join • improper assumption of cardinality 1 can lead to CARTESIAN join Are Your Statistics Bad Enough?
Access and Filter Predicates Access Predicate – controls the index access Filter Predicate – additional filter condition of index of table Availability of predicates Are Your Statistics Bad Enough?
Hypothesis • By selecting a limited set of objects and corresponding predicates we can try to find an answer to some fundamental questions: • are my statistics precise enough? • are my statistics up-to-date? • does a histogram help? • is the sample size OK? • is dynamic sampling relevant? • is the partition design suited for the application? • ready for migration? Are Your Statistics Bad Enough?
Are My Statistics Precise Enough? Comparing the estimated and re-estimated cardinality we obtain the relative and absolute difference Predicate - where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770) ACCE EST_CARD RE_EST_CARD ABS_DIFF REL_DIFF---- ---------- ----------- ---------- ----------same 23580 60317 36737 61Predicate - where ("TA"."NORM"=141) and ("TA"."RAND"<=770) ACCE EST_CARD RE_EST_CARD ABS_DIFF REL_DIFF---- ---------- ----------- ---------- ----------same 2771 1 -2770 -277000 Are Your Statistics Bad Enough?
Are My Statistics Up-to-date? By observing the historical view we obtain the dynamic aspect of predicate selectivity Are Your Statistics Bad Enough?
Processing Two basic steps are extracting and processing of the predicate information for execution plans example operative task administrative task Are Your Statistics Bad Enough?
Simple Model Are Your Statistics Bad Enough?
Does a Histogram Help? • Histogram candidates: • column is used in predicate • bad estimation • access with different values • real cardinality is in wide range Test the influence of histogram size on the precision of selectivity estimation using re-estimation queries. Are Your Statistics Bad Enough?
Histogram / 2 Target precision Without a histogram (SIZE=1) the cardinality is underestimated (average). If the histogram size is higher than the number of distinct values the estimation is very precise. For columns with a high number of distinct values the estimation remains imprecise. Are Your Statistics Bad Enough?
Sample Size • Re-estimate a predicate on statistics gathered with different estimate_percent parameter • observe the influence of the sample size • Possible results: • relatively small percentage returns good results • when increasing estimate percent the error is reduced • there is marginal effect of the sample size on the precision • Q: is there one optimal single sample size for an object? Are Your Statistics Bad Enough?
Re-estimation and Dynamic Sampling • add a dynamic sampling hint to the re-estimation query • trace with 10053 event • select /*+ dynamic_sampling(tc,2) */ * • from tc where norm300 = 0; • Generated DS query • SELECT /* some hints */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM • (SELECT /*+ FULL("TC") some other hints */ 1 AS C1, • CASE WHEN "TC"."NORM300"=0 THEN 1 ELSE 0 END AS C2 • FROM "TC" SAMPLE BLOCK (1.076923 , 1) SEED (1) "TC" • ) SAMPLESUB Total rows in sample Sample size controlled by level rows satisfying the predicate Are Your Statistics Bad Enough?
Re-estimation and Dynamic Sampling / 2 Results of DS query level : 2 sample pct. : 50.806452 actual sample size : 54147 filtered sample card. : 6018 orig. card. : 111358 single table dynamic sel. est. : 0.11114189 DS blocks / table blocks Total rows in sample rows satisfying the predicate Computed selectivity filtered / actual rows (= satisfying / total rows) Are Your Statistics Bad Enough?
Re-estimation and Dynamic Sampling / 3 • Dynamic sampling has a neat positive side effect • Level 10 performs exactly the re-estimation • no need to execute the re-estimation query, parsing with DS level 10 returns exact result • Lower levels of DS are relatively cheap and can give some hints Are Your Statistics Bad Enough?
Dynamic Sampling – Levels Dynamic sampling levels are absolute (except for level 10) Block size 8KB Even for a moderate table the estimate percent of DS could be too low DS uses block sampling only Are Your Statistics Bad Enough?
Re-estimation – Final Approach • For a selected predicate • build re-estimation query • parse it with different dynamic sampling hints (e.g. 0, 2, 5 and 10) • peek in the PLAN_TABLE and get the cardinality and access option Predicate where part_key = 3 and GROW_SHIF > 0 and shif = 0 ACCE EST_CARD EST_DS2_CARD EST_DS5_CARD RE_EST_CARD DIFF DIFF_PREC ---- ---------- ------------ ------------ ----------- ---------- ---------- same 7982 94 116 89 -7893 -8869 Predicate where part_key = 2 and GROW = 100 ACCE EST_CARD EST_DS2_CARD EST_DS5_CARD RE_EST_CARD DIFF DIFF_PREC ---- ---------- ------------ ------------ ----------- ---------- ---------- same 4281 13 2 0 -4281 -428100 Are Your Statistics Bad Enough?
Estimation of Lower Levels of Dynamic Sampling • Two extreme cases can be observed while re-estimation on lower levels of DS. • Dynamic sampling is extremely precise even on lower levels • Dynamic sampling doesn’t “work” until a high level Real Cardinality EstimatedCardinality DS Level 0 5 10 Are Your Statistics Bad Enough?
Evolution of Dynamic Sampling Consider that dynamic sampling itself is subject of evolution dynamic sampling on a predicate with index: 9i rel 2 – FULL TABLE scan with SAMPLE clause 10g rel 1 – INDEX ACCESS with ROWNUM <= 2500 10g rel 2 – FULL TABLE scan with SAMPLE clause Are Your Statistics Bad Enough?
Re-estimation and Partitioning • Constant pstart and pstop values • Add predicates for range and list partitions / subpartitions • --------------------------------- ----------------- • | Operation | Name | | Pstart| Pstop | • --------------------------------- ----------------- • | SELECT STATEMENT | | | | | • | PARTITION RANGE SINGLE| | | 2 | 2 | • | PARTITION LIST SINGLE| | | 3 | 3 | • | TABLE ACCESS FULL | T2S | | 6 | 6 | • ---------------------------------- ---------------- • Verify granularity parameter • Verify partition pruning Are Your Statistics Bad Enough?
Predicates in Parallel Plans For parallel execution plans no additional processing by generating re-estimation queries is required | Id | Operation | Name | Rows | ---------------------------------------------------- | 0 | SELECT STATEMENT | | 82647 | | 1 | PX COORDINATOR | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 82647 | |* 3 | HASH JOIN | | 82647 | | 4 | PX BLOCK ITERATOR | | 3519 | |* 5 | TABLE ACCESS FULL | TA | 3519 | | 6 | BUFFER SORT | | | | 7 | PX RECEIVE | | 23580 | | 8 | PX SEND BROADCAST | :TQ10000 | 23580 | |* 9 | TABLE ACCESS FULL| TB | 23580| Are Your Statistics Bad Enough?
Bitmap Index Access Generating of re-estimation queries for bitmap index access must respect the access logic 1 | TABLE ACCESS BY INDEX ROWID | T1 | 13 | 2 | BITMAP CONVERSION TO ROWIDS| | | 3 | BITMAP AND | | | 4 | BITMAP INDEX SINGLE VALUE| IND2_T1 | | 5 | BITMAP INDEX SINGLE VALUE| IND1_T1 | | Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."GROUP2_ID"=28) 5 - access("T1"."STATUS"='X') Generated predicate ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X') The bitmap index cardinality is missing in the execution plan Are Your Statistics Bad Enough?
Ready for Upgrade or Crash? How can the behavior of an application in migrated environment be predicted? A sophisticated but expensive method is to compare execution plans Re-estimation method can be used as a lightweight timely warning system Are Your Statistics Bad Enough?
Some Limitations Limited usage on predicates with bind variables and parse time partition pruning Predicates stored in ACCESS_PREDICATES and FILTER_PREDICATES are not always complete A single predicate is limited to the length of 4000 bytes Further Work Weighted function for the result of re-estimation Relevance to SQL profile Are Your Statistics Bad Enough?
Summary • Method to verify the “status” of database statistics • Introduction of “predicate mining” • A green bar for database statistics • Dynamic sampling may be used as a verification method Are Your Statistics Bad Enough?
References Christian Antognini, CBO: A Configuration Roadmap, Hotsos Symposium 2005 Wolfgang Breitling, Using DBMS_STATS in Access Path Optimization, UKOUG 2004 Jonathan Lewis, Strategies for Statistics, UKOUG 2004 metalink 72539.1 Interpreting Histogram Information 114671.1 Gathering Statistics for the Cost Based Optimizer 236935.1 Global statistics - An Explanation Are Your Statistics Bad Enough?