600 likes | 610 Views
Guiding Practices for Gathering Database Statistics. Martin Widlake Database Architecture, Performance & Training Ora600 Limited. http://mwidlake.wordpress.com www.ora600.org.uk. Guiding practices for Database Statistics gathering
E N D
Guiding Practices for Gathering Database Statistics Martin Widlake Database Architecture, Performance & Training Ora600 Limited http://mwidlake.wordpress.com www.ora600.org.uk ORA00 Ltd
Guiding practices for Database Statistics gathering The Cost Based Optimizer continues to improve and stats gathering is now more efficient than ever - but it still seems to be that most Oracle Sites struggle with performance issues due to poor stats. It's like the annoying, embarrassing rash that simple won't go away. I will cover the options available and general principles for sorting out the stats issue, which should lead to more stable and good performance ie a more comfortable life . This should calm the annoying rash and give you some potential treatments should it flare up again. Abstract ORA600 Ltd
20+ years of Oracle experience, mostly as a developer, development DBA, Architect, Performance guy. Who am I and why am I doing thisTalk? • Tested the CBO under V7.3 and became a cautious advocate of it in V8. Been fighting the issues since! • I keep getting pulled into designing “better” methods of gathering stats for clients and, frankly, I’d rather do other things {thus the presentations and blog posting telling everyone what I know}. • I am of the opinion that, over all the CBO now gets 99% of SQL execution plans good enough, if the stats are good. • Stats gathering can be quite interesting. Honest! ORA600 Ltd
These slides will be on theUKOUG web siteI am going to talk around some slides (theones with pictures on and key points) and skip over some - as we all get tired of reading powerpoint slides in presentations.The others are there to fill in the chat.Ask questions, Email memwidlake@btinternet.commwidlake.wordpress.com ORA600 Ltd
What is the most common version of Oracle you currently use? (8, 9, 10.1, 10.2, 11.1, 11.2) Quick Quiz • What is the latest version you use in production? • Who relies on the Automated Stats Collection job on their database? (If “Yes”, have you altered the schedule?) • Who has intermittent performance issues when code goes bad either “over night” or after stats are collected? • Who has a site-crafted stats gathering regime? • (If your site wrote it’s own, did it take 2X, 4X, 8X or more the effort to get right than you expected?). ORA600 Ltd
Poor or missing object statistics are probably the single most common and easily fixed cause of poor database performance. Possibly the Single Most Common Cause of Poor Database Performance In my opinion, the introduction of the automated stats gathering process with Oracle 10g was probably the single greatest performance enhancement by Oracle Corp In the last 15 years. Most issue with individual SQL statements performing poorly are fixed by gathering accurate statistics on the tables involved. The worst of all situations is to have statistics on a few tables. The Cost Based Optimiser is invoked and has to use very poor defaults for everything else. And I don’t really like it Ora 600 Ltd
Automatic Stats Collection Auto Stats Job Preparation FLUSH_DATABASE_ MONITORING_INFO Global and Table Prefs (stale_pct,est_pct, met_op. Degree..) DBA_TAB_MODIFICATIONS (10% BY DEFAULT) SYS.COL_USAGE$ Data Dictionary Information Existing Statistics OBJ_FILTER_LIST (STALE AND EMPTY) Runs gathers In the scheduled window ORA600 Ltd
From the 10g Tuning guide:- The GATHER_DATABASE_STATS_JOB_PROC procedure collects statistics on database objects when the object has no previously gathered statistics or the existing statistics are stale because the underlying object has been modified significantly (more than 10% of the rows).The DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC is an internal procedure, but its operates in a very similar fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The primary difference is that the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure prioritizes the database objects that require statistics, so that those objects which most need updated statistics are processed first. This ensures that the most-needed statistics are gathered before the maintenance window closes. GATHER_DATABASE_STATS_JOB_PROC That last sentence is the only major change in the 11g documentation. Ora 600 Ltd
If it works for you, then fine, leave it be and work on something else. If it almost works for you, fix the exceptions, leave the main job alone and work on something else. Automated DBMS_STATS Job • If you have a VLDB (or you downloaded this as you had an issue with stats gathering) It is almost certainly not good enough for you. • It is an attempt at a single solution to work for every situation and it does not. Even Oracle Corp have admitted, it just simply does not work for VLDBs, it chokes on large objects • Turn it off (maybe leave it running for DICTIONARY stats) and write the replacement. You will write something that does a lot of what this job does. Your replacement will almost certainly be more complex than you initially plan. Sorry. • There is no one single solution to stats gathering that is right for any large, complex system. {Sorry again} ORA600 Ltd
All inserts, updates, deletes and truncate operations on monitored tables are flushed to this table. So V10 upwards, that is everything. DBA _TAB_MODIFICATIONS • Under V9 flushed every 3 hours, under V10.1 every 15 minutes, under V10.2/V11 it is not automatically flushed. • It is flushed to by calls to schema/db dbms_stats GATHER calls or by calling DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO • It does not capture direct inserts, appends, things that avoid the SQL layer. It seems generally accurate but I have witnessed it missing the odd “insert-into-select” statements. And it does not see direct-insert activity. • Increments, including over database restarts. • Row is deleted when stats are gathered on the Table OR PARTITION (and only at the correct level). Ora 600 Ltd
show_tab_mods -- show_tab_mods -- Martin Widlake 11-nov-07 -- quick check on recent table changes -- NB flush if want up-to-date info (15 min interval) set lines 100 pause on col obj_name form a50 col no_ins form 9999,999 col no_upd form 9999,999 col no_del form 999,999 select table_owner||'.'||table_name||'-'||partition_name obj_name ,inserts no_ins ,updates no_upd ,deletes no_del ,substr(truncated,1,1) T ,to_char(timestamp,'YYMMDD hh:mm:ss') last_flush from dba_tab_modifications where timestamp > sysdate -31 and table_owner not like 'SYS%' order by timestamp desc / clear colu OBJ_NAME NO_INS NO_UPD NO_DEL T LAST_FLUSH --------------------------------------- --------- --------- ------ - --------------- MIDDLEOFFICE.POSITIONLIQUIDATIONAUDIT- 9,588 0 0 N 080602 10:06:03 MIDDLEOFFICE.POSITIONKEEPINGGROUPS- 2 1 0 N 080602 10:06:03 GATEWAY.TREE_RELATIONS- 44 0 43 N 080602 10:06:03 COMMHOME.COMM_TRAD_PCTTRAD_BAND- 4 0 4 N 080602 10:06:03 GATEWAY.INSTRUMENT_INFO- 2 1 0 N 080602 10:06:03 GATEWAY.TRADINGPERIODPROFILEDATA- 1 0 1 N 080602 10:06:03 BOBJ_LOGIN.ORDERS-ORDERS_12345678 1,066 9,381 0 N 080531 06:05:03 DATAFIX.TRADEORD_VOLT- 2 0 0 N 080531 06:05:03 BOBJ_LOGIN.ORDERS_HISTORY-OH_54545454 9,379 0 0 N 080531 06:05:03 DATAFIX.MAXORDERS- 2 0 0 N 080531 06:05:03 BOBJ_LOGIN.ORDERAUDIT-ORDERAUDIT23 1375610 0 0 N 080530 10:05:02 COMMHOME.COMM_ON_DEPOSIT_H- 4 0 0 N 080530 10:05:02 GATEWAY.BIN$TnRrf2V98FrgQwrckArwWg==$0- 95,449 0 0 N 080530 10:05:02 GATEWAY.BIN$TnRrf2V38FrgQwrckArwWg==$0- 882 0 0 N 080530 10:05:02 GATEWAY.DEALINGRECNMSAI- 290 5,052 0 N 080530 10:05:02 GATEWAY.BIN$TnRkBv02UezgQwrckApR7A==$0- 16,913 0 0 N 080530 10:05:02 Ora 600 Ltd
-- mdw 11/05/03 -- mdw 17/01/08 Modified to look at dba_tab_modifications set pause on pages 24 lines 110 pause 'Any Key>' colu anlyzd_rows form 99999,999,999 colu tot_rows form 99999,999,999 colu tab_name form a30 colu chngs form 99,999,999 colu pct_c form 999.999 select dbta.owner||'.'||dbta.table_name tab_name ,dbta.num_rows anlyzd_rows ,to_char(dbta.last_analyzed,'yymmdd hh24:mi:ss') last_anlzd ,nvl(dbta.num_rows,0)+nvl(dtm.inserts,0) -nvl(dtm.deletes,0) tot_rows ,nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0) chngs ,(nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0)) /greatest(nvl(dbta.num_rows,0),1) pct_c from dba_tables dbta left outer join dba_tab_modifications dtm on dbta.owner = dtm.table_owner and dbta.table_name = dtm.table_name and dtm.partition_name is null where dbta.table_name like upper(nvl('&Tab_name','WHOOPS')) / clear colu TAB_NAME ANLYZD_ROWS LAST_ANLZD TOT_ROWS CHNGS PCT_C -------------------------- ----------- --------------- ----------- -------- ------ GATEWAY.ACCESSGROUPS 4 080212 16:03:12 4 0 .000 MIDDLEOFFICE.ACCOUNTAUDIT 4,725,464 080512 22:37:16 4,898,302 173,738 .037 GATEWAY.ACCOUNTBEHAVIOURS 14 080212 16:03:18 14 0 .000 GATEWAY.ENBZHEERHWV 149,136 080522 22:06:39 150,922 6,156 .041 DATAFIX.VRHERHHHEHH_08_RBK 17,650 080425 22:00:04 17,650 0 .000 DATAFIX.AFEOUNSFEWS_190505 42,757 071105 22:01:00 42,757 0 .000 DATAFIX.ACSFFWEGGEE_310108 182 080131 22:00:03 182 0 .000 MM_AUDIT.EFEFEFGHOME_AUDIT 513,230 080509 22:07:34 526,192 12,962 .025 DATAFIX.AEGFWSCEFOE_TEMP 10,000 071105 22:00:52 10,000 0 .000 GATEWAY.AEFSSTEFFFE 2,083 080212 16:03:36 2,08 0 .000 MM_AUDIT.ACCOUNSGSEGEGGEIT 2,083 071105 22:00:21 2,083 0 .000 MIDDLEOFFICE.ASGESGEPOTALLOCAT 123,944 080602 22:00:40 123,944 0 .000 GATEWAY.ENBZHEERHWV 39,136 080522 22:06:39 44,922 6,156 .131 DATAFIX.RWERNGNONNGNVZBYNO 17,650 080425 22:00:04 17,650 0 .000 DATAFIX.WEGEBTRUUUXTDWTTHH 32,707 071105 22:01:00 32,707 0 .000 DATAFIX.ACGERTIIJJYKYNJNIY 189 080131 22:00:03 189 0 .000 MM_AUDIT.KUKLYUTDJJ4YUNYJJ 313,230 080509 22:07:34 426,192 112,962 .251 DATAFIX.AERGERHGGRE_TEMP 10,000 071105 22:00:52 10,000 0 .000 GATEWAY.AEFSSTEFFFE 2,083 080212 16:03:36 15,231 0 .000 Ora 600 Ltd
Every time a SQL statement is parsed, information about columns referenced in table joins and where predicates is stored in the internal table SYS.COL_USAGE% SYS.COL_USAGE$ • This is what DBMS_STATS uses to help decide which of the indexed columns to gather stats on when method_opt “for all indexed columns” is used. • It might also play a part in deciding which columns to gather histograms on, as I have tested adding very skewed columns to a table and the automatic stats collection does not gather histograms and neither does a specific call to gather_stats with method_opt=“auto”. • It can also be useful to use, to help identify if an index is missing or even is likely to be used Ora 600 Ltd
OWNER TAB_NAME COLUMN_NAME --------------- ------------------------- --------------------------- EQUAL_PREDS EQI_JOINS NONEGI_JNS RANGE_PRDS LIKE_PRDS NULL_PRDS TS ----------- --------- ---------- ---------- --------- --------- ------------- COMMHOME COMM_TRAD_PCTTRAD_H TREENODEID 2 0 0 0 0 0 02 JUN 2011 12:44:22 COMMHOME COMM_TRAD_PIPREFUND CC 0 3 0 0 0 0 02 JUN 2011 10:59:13 COMMHOME COMM_TRAD_PIPREFUND HOMEID 4,581 10 0 0 0 0 15 JUN 2011 20:38:44 COMMHOME COMM_TRAD_PIPREFUND ISLEAF 163 0 0 0 0 0 15 JUN 2011 20:38:44 COMMHOME COMM_TRAD_PIPREFUND TREENODEID 177 169 0 0 0 0 15 JUN 2011 20:38:44 SYS TS$ FLAGS 3,102 0 0 0 0 0 14 JUN 2011 23:33:37 SYS TS$ NAME 2,708 1,954 5 0 392 0 15 JUN 2011 12:22:23 SYS TS$ ONLINE$ 3,553 0 0 0 0 0 15 JUN 2011 06:38:16 SYS TS$ TS# 1,132 8,555 0 86 0 0 15 JUN 2011 18:08:32 GATEWAY TRADINGPERIODPROFILEDATA TRADINGCLOSE 0 0 0 1 0 0 29 MAR 2011 05:23:45 GATEWAY TRADINGPERIODPROFILES INSTGROUPID 46 237 0 0 0 0 12 JUN 2011 11:09:17 GATEWAY TRADINGPERIODPROFILES PROFILENAME 0 67 0 0 1 0 12 JUN 2011 00:53:36 GATEWAY TRADINGPERIODPROFILES SOURCEID 44 234 0 0 0 0 12 JUN 2011 11:09:17 -- chk_col_usage-- this is a rip-off of Tim Gormans' script to look at the column usage info that,in-- 9i,10g and 11 beta at least, is not revealed in a DB view. Gitscol owner form a22 wrapcol tab_name form a30 wrapcol column_name form a30 wrapcol equal_preds form 9999,999col eqi_joins form 9999,999col noneqi_jns form 9999,999col range_prds form 9999,999col like_prds form 9999,999col null_prds form 9999,999select oo.name owner, o.name tab_name, c.name column_name, u.equality_preds equal_preds, u.equijoin_preds eqi_joins, u.nonequijoin_preds nonegi_jns, u.range_preds range_prds, u.like_preds like_prds, u.null_preds null_prds, u.timestamp tsfrom sys.col_usage$ u, sys.obj$ o, sys.user$ oo, sys.col$ cwhere o.obj# = u.obj#and oo.user# = o.owner#and c.obj# = u.obj#and c.col# = u.intcol#and o.name like upper(nvl('&tab_name','%'))||'%'and oo.name like upper(nvl('&tab_own','%'))||'%'order by 1,2,3/clear colu Ora 600 Ltd
Statistics Hierarchy There is more than one type of “stats” that Oracle can gather and which have different impacts and are best gathered in different ways. SYSTEM STATISTICS Gather “once” – how fast the hardware is. Multi-Block read : Single-Block +Speed of your COU Increasing Impact FIXED OBJECT STATISTICS Gather “once” – How big your memory objects are Areas of memory, number of users, size of caches X$ sys-only “objects” Gather regularly, probably via auto stats job Do not enhance or do one-offs Essentially “normal” stats for sys.obj$-type things DICTIONARY STATISTICS Auto Gather Gather regularly, via auto job and enhancements Tables, Indexes, Columns What DBAs/Developers mean by “Stats” OBJECT STATISTICS
In effect, these stats are just the CPU speed and the relative speeds of Single-Block Reads (SBR) and Multi-Block Reads (MBR). System Statistics • The actual speed of single- and multi- block reads are recorded, in milliseconds, but it is the ratio between them that counts. • If Multi-Block reads are found to be the same or faster than Single-Block reads, Oracle 10 ignores the data collected, does not store it. • The CBO converts all IO and CPU cost into units of single-block reads. That is what the COST is in explain plan. It is also what you see in AWR. • Gathering System Statistics may: • push oracle towards or away from high-CPU actions like sorts. • Alter the likelihood of full table scans and fast full index scans as oracle better understands the cost of the multi-block actions. ORA600 Ltd
V10 and 11 come with a default set of system statistics. You can GATHER_SYSTEM_STATS with a fake workload or based on activity on your system over a period of time. I advise the latter – but ensure your system has a representative workload. System Statistics • You only need to gather the System Statistics “once” (but ensure you do so with an “average load”). • Re-gathering is only required if your storage changes (eg add more spindles), if there is a major system change or the server(s) you use change significantly in CPU utilisation • Gathering at day and night and storing/swapping system stats is often suggested – but seems to be a bit of an urban myth. • You may wish to gather system stats 4 or 5 times and DBMS_STATS.SET_SYSTEM_STATS to the average. • NB Not RAC aware – system stats gathered on one node apply to all ORA600 Ltd
These are statistics on the in-memory “dynamic performance” objects, the x$ and similar tables (what V$ views sit on). Fixed Object Statistics • Need to gather “once” and only re-gather if something significant changes such as allocating much more memory to the instance or the number of user sessions greatly increasing. • Gathering Fixed Object stats will aid internal SQL, checking session details, looking at memory constructs. I have seen a small improvement in parse speed. Certain dictionary queries run faster. • Re-gather after upgrade etc. • If they have never been gathered the impact can be significant, I have yet to personally see a major change as a result of re-gathering (I just do so once a year on “just in case” principles). ORA600 Ltd
Statistics on the internal tables owned by SYS and other internal Oracle users. SYS.OBJ$, SYS.TAB%, SYS.TS$, those tables. Dictionary Statistics • Are gathered as part of the default statistics gathering job. • In effect just like gathering schema statistics on the SYS, SYSTEM, OUTLN and other users. Support (and recommended {*} ) from V10. V9, keep with the RBO. • Can take several hours to gather on a database with tens of thousands of objects or more. • Can significantly aid parsing and other internal SQL, as well as DBA scripts running on the DBA views and also the underlying tables. ORA600 Ltd
Gather them regularly. Weekly to monthly. Dictionary Statistics • If you use the default automatic statistics gathering job, it is collecting dictionary statistics for you and is fine. • If you disable the automatic statistics gathering for your schema stats either: • Leaving it running for ONLY Dictionary statistics: • DBMS_PARAM.SET_PARAM(‘AUTOSTATS_TARGET’,’ORACLE’) • Organise regular dictionary stats gathering by your own methods. • Not gathering Dictionary Stats, especially on very large/complex database could lead to very poor dictionary/parse performance. • To be honest, with 10.1/10.2 at least, even gathering Dictionary Stats for systems with massive numbers of segments can fail to resolve some slow Dictionary performance ORA600 Ltd
That was the pre-amble. Getting the System, Fixed Object and Dictionary stats gathered gives you a solid base to tackle Object Statistics ORA600 Ltd
Used by the Cost Based Optimiser (CBO). Why are Stats Key to Performance? • CBO examines the SQL statement and works out the various ways in which it could satisfy the query (up to about 2,000 plans under 10). • For each step the CBO works out the cost, which is expected IO plus CPU (if turned on) and the cardinality, the number of records that step will return. • The cardinality is passed back to the next step and can be a multiplier of that step’s cost. • The costs are added up and Oracle then picks the plan with the lowest overall cost(*) and runs that plan (*) This is a slight lie, but the principal is true Ora 600 Ltd
The CBO is very logical, it uses just the figures it is presented with and simple calculations to make it’s decision. No magic involved. Why are Stats Key to Performance? • If those figures are wrong, ie the statistics are not representative then the costs calculated will be incorrect. • A small error can cascade up the code and cause a large difference to other steps and cause the plan to change. • With edge cases, a small difference often result in a different plan being chosen. • That different plan is often sub-optimal, sometimes seriously so, occasionally hundreds or thousands of times slower. • Cost/Cardinality being too low , often 1, is the most common cause of poor performance, due to prompting nested loop plans and incorrect driving tables Ora 600 Ltd
If the automated job is working for you, leave it alone. Automated Stats Gathering • If you turned off the automated job and “wrote your own” under V10 or before and now are on 11 – consider going back to the automated job. It is faster and more accurate. • If you turned off the automated job and did not do anything about your Dictionary stats, that is bad. • Under V11 you can control %stale and defaults like method_opt at table level. Consider doing that. • If you decided to “roll your own” I would advise you leave the automated job running but just for Oracle’s objects: • dbms_stats.set_global_prefs(autostats_target,’ORACLE’) ORA600 Ltd
Version 9, swap to DBMS_STATS and write your own Stats Gathering • Version 10, use the automated job (at least for dictionary stats) write your own version/exceptions for your tables. • V11.1 – Test, do not trust me, but I would still say go auto. • V11.2 – Use the automated job and if you must intervene, use default sample size so you get one pass NDV. • V11 – look at rolling up stats for partitions, subpartitions – but read up on it extensively. You have to ensure you gather all partitions or sub-partitions. • With all versions of Oracle you will have exceptions, usually overnight batch or partitioned tables. If you are the DBA, this is part of your job. ORA600 Ltd
One of the most demanding parts of generating object statistics is gathering Number of Distinct Values New Oracle 11 NDV • Oracle 11 introduced the single-pass NDV function. It scans the data once, uses much less memory, is faster and more accurate. • You have to use: • ESTIMATE_PERCENT=DBMS_STATS.AUTO_SAMPLE_SIZE • You cannot use BLOCK sampling. Amitpoddar via JL http://jonathanlewis.files.wordpress.com/2011/12/one-pass-distinct-sampling-presentation.pdf http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/ https://blogs.oracle.com/optimizer/entry/improvement_of_auto_sampling_statistics_gathering_feature_in_oracle_database_11g ORA600 Ltd
The Automated Job may Choke ORA600 Ltd
select substr(operation,1,30) operation ,to_char(start_time,'DD-MON-YYYY HH24:MI:SS.FF') START_tme ,to_char(END_TIME,'DD-MON-YYYY HH24:MI:SS.FF') end_tme from sys.WRI$_OPTSTAT_OPR order by start_time desc OPERATION START_TME END_TME ------------------------------ ------------------------------ --------------------- gather_database_stats(auto) 02-JUN-2011 22:00:02.979206 02-JUN-2011 23:04:37 gather_database_stats(auto) 31-MAY-2011 06:00:02.811976 31-MAY-2011 06:56:21 gather_database_stats(auto) 30-MAY-2011 22:00:01.976379 30-MAY-2011 22:19:31 gather_database_stats(auto) 29-MAY-2011 22:00:01.416256 29-MAY-2011 23:36:14 gather_database_stats(auto) 28-MAY-2011 22:00:02.243542 29-MAY-2011 00:12:18 gather_database_stats(auto) 27-MAY-2011 22:00:03.588237 27-MAY-2011 23:14:24 gather_database_stats(auto) 26-MAY-2011 22:00:01.602425 26-MAY-2011 23:14:05 gather_dictionary_stats 24-MAY-2011 11:42:31.771667 24-MAY-2011 11:42:35 gather_dictionary_stats 24-MAY-2011 11:42:11.396340 24-MAY-2011 11:42:15 gather_database_stats(auto) 24-MAY-2011 06:00:02.905945 24-MAY-2011 06:20:38 gather_database_stats(auto) 23-MAY-2011 22:00:01.732964 23-MAY-2011 22:58:35 gather_database_stats(auto) 22-MAY-2011 22:00:01.421518 23-MAY-2011 06:00:05 gather_database_stats(auto) 21-MAY-2011 22:00:01.942455 22-MAY-2011 06:00:01 gather_database_stats(auto) 20-MAY-2011 22:00:03.066981 21-MAY-2011 06:00:01 gather_database_stats(auto) 19-MAY-2011 22:00:02.571718 19-MAY-2011 23:04:27 gather_database_stats(auto) 17-MAY-2011 06:00:01.462810 17-MAY-2011 08:02:34 gather_database_stats(auto) 16-MAY-2011 22:00:01.096761 16-MAY-2011 23:14:56 Ora 600 Ltd
Once the automated job chokes, it will continue to choke. Every night. This is because it tries the same thing each night. Fixing Choked Stats • The longer weekend run should sort things out – until it chokes. • Identify the table (Look for the gather statement interactively during the window, pull off the list of tables to gather options=list_stale, check for large objects with 10% difference...) • Do a manual gather with something like: block_sample=>true, estimate_percent=>0.1, degree=>8 method_opt=>for all columns size 1,cascade=> false, noinvalidate=>false, granualrity=>global • Once that is run you can afford to do a larger sample size and do the indexes. Do the PK first. • You probably need to lock the stats on the table and treat it as an exception. ORA600 Ltd
The stats say a segment is empty and it is not. Your WHERE predicates are out of range for the known column values. The stats say a segment holds than 10* less data than it does. The more orders of magnitude out, the worse. Histograms. (there and not need or need and not there. Ouch) The correlation between columns is not understood by the optimiser e.g. That values in tab1.x “line up” with those in tab2.y Edge cases that experts get excited about but 99% of us never see The Biggest “Wrong Stats” Issues The below are the worst “stats” causes of performance issues, in order, in my opinion based on experience Ora 600 Ltd
I can only go one what I have seen and, to a less turstworthy level (*ironic given the sources), what I have heard... Stats Issues are a VLDB thing? • OLTP systems with a need for the fastest absolute response time to small data queries are NOT troubled by object stats. • Edge cases that balance on correlation or swapping to nested loop from hash or using Cartesian join are specific to OLTP and a set of requirements where stats gathering are, well, redundant. • Where pain occurs is when a plan that hashes several segments together {often including partition exclusion} swaps to either a nested loop or Cartesian merge join that is not suitable • When it comes down to it, the plan for large volumes of data is right for all volumes of data. If it does it in 30 seconds inefficiently, doing it in 33 second efficiently is spot-on good enough. ORA600 Ltd
Single Values Expected Outside of range 100 10,000 values Oracle 10 with no Histograms Expect No Of values 0 100 200 300 400 100 10,000 values Oracle 10 with Histograms Expect No Of values 0 100 200 300 400 100 10,000 values Oracle 11 with no Histograms Expect No Of values 0 100 200 300 400 100 10,000 values Oracle 11 with Histograms Expect No Of values 0 100 200 300 400 Or 100 10,000 values Expect No Of values 0 100 200 300 400 ORA600 Ltd
Range Values Expected Outside of range 100 10,000 values Oracle 10 with no Histograms Expect No Of values 0 200 300 100 10,000 values Oracle 10 with Histograms Expect No Of values 0 200 300 100 10,000 values Oracle 11 with no Histograms Expect No Of values 0 200 300 100 10,000 values Oracle 11 with Histograms Expect No Of values 0 200 300 ORA600 Ltd
Oracle 10 deals with column values being “out of range” by decreasing cardinality over that range Histograms and Data Ranges • Eg if the low_value is 200 and the high_value is 300 and there are 1000 rows, that is 10 rows for 225. • For 350, it is 50% outside the range, so 10 rows is reduced by 50% to 5 rows. At value 400 it drops to 1 row. This is fine if you gather at 10%. • Histograms massively alter this “out of range” half-life. I have seen massive issues with dates. A table covering 5 years of data, with histograms on the date, can reduce a value only a week out of range to less then 1% of the average value. • This is a big issue on large tables that have stats gathered only occasionally (as not changed by a big percentage). ORA600 Ltd
This Out of Range issue is thrown into sharp relief with Partitions. Histograms and Partitions • If the partitions are daily or weekly and have histograms on them, SQL statement selecting data for the latest hour can become “out of range” sooner than you can believe. • Spot this by the cardinality being 1 rather than several hundred or thousand. • Can happen even without histograms, especially with daily partitions • Solutions? Use dynamic sampling on latest daily or weekly partitions (your in-house code either does not collect or deletes such stats), insert half-day stats or gather very aggresively. ORA600 Ltd
Replacing the Automatic Stats JobThe CBO is Complex.Stats are in effect a constantly evolving part of your code base.A simple approach will only work on a simple system.(And I really am Sorry!) ORA600 Ltd
Don’t Replacing the AUTO stats job • Tweak the current job – make it run at different times, alter the %stale and defaults at table levels. • Locks the stats on the tables that give you issues and treat these as your exceptions. • If you MUST replace the Auto job it will hurt: • Initial estimate for “simple” replacement will be a week or two. • If you plan and estimate it, that will come out as four weeks. • It will take you 2 months. • You will emulate most of what the AUTO job does. • Your solution will probably need to look at segment size, DBA_TAB_MODIFICATIONS and have several control table that allow you to specify METHOD_OPT, SAMPLE SIZE and GRANULARITY at table level... ORA600 Ltd
Auto Stats Replacement STATS_CTL Your Stats Job DBA_TAB_MODS DBA_SEGMENTS LIST_STALE/EMPTY OBJ_CTL OWNER TAB_NAME PART_NAME DFLT_STALE DFLT_SAMPLE DFLT_METHOD_OPT IDX_PCT COPY RULES SAVE_STATS BLOCK SAMPLE TAB THEN INDEX ORDER BY SIZE ASC ALL COLUMNS SIZE 1 PARALLEL STATS_LOG ORA600 Ltd
When you stats an ESTIMATE_PERCENT for a table gather statement, you can also state if it is block or row sample. It defaults to Row. Block or Row Sample • Row sample selects the percentage of rows scattered across all blocks. If you have eg 16k block size and over 100 rows per block, a 1%sample size will visit every block. Breaking news! There appears to be some issue with block sampling on version 10.2 You can use the SAMPLE command in normal SQL select statements and that is what Oracle does to gather stats. However, the BLOCK SAMPLE seems to vary the actual number of blocks it checks for a given sample size. I will investigate when I have time. • Block sample selects whole blocks, which greatly reduces the physical IO • Block sample size gives low column cardinalities and is susceptible to getting high-low values that are not close to the true edges. • The under-sampling of columns varies depending on spread, but my tests show that 5% block sample size is about as good as 0.8% row sample size, but still 10 times faster. It is fine for 99% of cases. • Oracle V11 up – the new NDV and speed of stats pushes me back towards AUTO SAMPLE SIZE ORA600 Ltd
You need to gather stats on each segment that needs stats. I strongly suggest any in-house code works segment-by-segment such that all tables, partitions and index segments are processed “as one” Write Something to Gather Stats on all Segments • You could use GATHER_SCHEMA_STATS and LIST the objects into an array and process. This gets object s Oracle would deem in need of stats: • STALE list those objects that have stats but have changed by 10% • EMPTY list those objects with no current stats • AUTO is supposed to list both but is buggy in 10.2.0.3 • Alternatively, run through all segments in the schemas you are interested in and used DBA_TAB_MODIFICATIONS directly. Slower (*) but much more control • I used the above on 10.1 but on current 10.2 system, the data dictionary is too slow. We decided to revert to GATHER_SCHEMA_STATS ORA600 Ltd
Oracle keeps all stats changed for 31 day, by default. Do not write code to roll back stats. • You can alter this with DBMS_STATS.ALTER_STATS_HISTORY_RETENTION, check it with DBMS_STATS.GET_STATS_HISTORY_RETENTION (usualy 31). • If the stats history is causing you issues use PURGE_STATS to get rid of them but, be warned, once gone they are gone. But then, how often do you RESTORE stats? • Unless you REALLY want to be able to identify sets of stats, just use Oracle’s in-built feature where it stores previous stats for 31 days and you can recover them. • This hidden feature can create a LOT of data, especially if you have lots of partitions with lots of column stats. It goes into SYSTEM TS. http://mwidlake.wordpress.com/2009/08/03/why-is-my-system-tablespace-so-big/ ORA600 Ltd
Most people are aware of the potential to have a user_statistics_table. This is a table you create that you can put stats into using EXPORT_XXXX_STATS and retrieve them with IMPORT_XXXX_STATS Rolling Back Statistics • You use the CREATE_STAT_TABLE table to create the table and give a STATID to sets of stats you wish to export and import. • One little Gotcha. The documentation is not clear, if you state a user statistics table in GATHER_XXXX_STATISTICS then they are NOT placed into the stats table, they are put in the dictionary and the OLD values are put in the stats table. • Rolling back stats works for system, fixed_object, database and object stats. I’ve tested it, it works – at least under 10.1 and 10.2. • Oracle 11 allows you to create Pending Stats which is very nice. It beats the manually version I developed for V10 in...ohh... 2005? ORA600 Ltd
Gathering System ,Fixed Object and Dictionary statistics are “system wide” changes. Do you have a suitable system to test this on? Rolling Back • All have equivalent DBMS_STATS.RESTORE_XXXX_STATISTICS commands that, when I tested on 10.2.0.3, worked correctly (including blanking stats that were previously null). • You could save the stats being replaced in your own statistics table and recover from there. The only advantage is it is easier to interrogate stats you saved into your own stats table. • Deleting System Stats reinstigates the default values seen after install. • Deleting Fixed Object stats deletes the stats. • Deleting Dictionary stats is not a very good idea as it works. • I said you could restore Dictionary stats but, even though I tested it, I am not going to promise that all stats will be the same after you restore... ORA600 Ltd
If you gather statistics on a Table or Table Partition, the default is to CASCADE to relevant index segments. Tables Cascade or Not? • The default will gather between 1000 and 2000 index blocks. I find this to be overkill. • By NOT cascading to index partitions but doing the indexes specifically and setting a sample size derived from the index size, you can use smaller samples sizes. • However, this is more code to write and test. It may be more pragmatic to just cascade, though each index segment may end up taking as long as the table segment to be gathered. • Consider global indexes and partitioned global indexes. ORA600 Ltd
Many of us still refer to “analyze the table” but of course we all now mean “gather statistics with DBMS_STATS”. Don’t we . DBMS_STATS Package • Don’t use ANALYZE any more, especially not on production systems. • DBMS_STATS gathers better statistics, including histograms, and cascade down to partitions and sub-partitions as needed. • Can be run for specific segments (table, index, partition thereof etc), for a schema or for the database and can be set to only gather “stale” objects. • Can use Parallel to make up for the slower running, can set sample size by row and block (see more later) • In these slides, any procedure or function is in DBMS_STATS unless otherwise specified. ORA600 Ltd
To get the best performance, you want the stats to be as up-to-date and accurate as possible. Stability and Performance Dichotomy Please note, the actual result of the data processing does not change, the functionality is preserved, but the time taken can change, the idea being it changes for the better • When the stats change, it is like having a miniture code release. The processing of the application can change – after all, that is the idea. • How many people here would allow a code release on their business critical systems at any time? • Most people are not actually concerned with good performance, they are concerned with poor performance. • Code can suddenly start running slowly “over night” when stats are gathered. Just one very bad statement can actually make the system unusable. Ora 600 Ltd
As an example, stats showing a table or partition is empty will cause the CBO to think it is a cost of 1 to scan and only one record will be found. Nested loops to access it and Cartesian Merge plans will occur. Why are Bad Stats worse than No Stats? • More later, but stats saying there is no data later than a month ago will cause the CBO to again assume there is as little as one record to find. • If there are no stats, Oracle will use dynamic sampling or use defaults, which are usually better than bad stats • Spotting Bad Stats is not as easy as spotting missing Stats, especially as Oracle stores high/low values and histograms in internal formats. • Old stats (a specific type of bad stats) can cause a plan to flip to another plan without notice, with NO CHANGE. ORA600 Ltd
Stated low percentage stats gathering for large objects – use BLOCK SAMPLING and do tables and indexes specifically. Intervention methods for stats • Set statistics manually or COPY_TABLE_STATS. The tricky part is the column stats and any histograms you have to have. • Delete and lock stats and allow Dynamic Sampling to occur. • Gather stats at lowest level (sub-partition or Partition) and allow them to be calculated at higher levels via Incremental stats. Made possible via synopses (but needs care to implement) • The overall aim is always the same. You do not need accurate stats, you need stats that are good enough to give execution plans that will work for large data volumes ORA600 Ltd
Most sites have stats gathering over night, even if they replace the auto stats gathering job. This may well not be the best time, When to Gather • You may be better off collecting stats in the late evening or early morning. You can use PARALLEL to get the job done more quickly. • You are very unlikely to benefit from gathering all stats in one window. • If you are processing date into a table, gather the stats at the right time(s). This is probably after the load, or it could be several times in the process. • So long as only specific gathering is done, there is great benefit to be had gathering stats as your batch process proceeds. ORA600 Ltd
I’m soooooo tired of this conversation and I have had it so many times... Batch processing means you need to think about stats When to Gather-Batch Processing • You have a table that is empty. You stuff it full of data. You use it to load data into your live database. Then you may or may not truncate it. Ask yourself, when is the volume of data significant and when do you gather stats. • If it is a global temporary table you may need to set some stats. • If I was to be given a pound for each time I had seen that the batch processing had no concept of stats gathering I would have £17.50. • So long as only specific gathering is done, there is great benefit to be had gathering stats as your batch process proceeds. ORA600 Ltd