1 / 33

ASE126 Opening Your Maintenance Window

ASE126 Opening Your Maintenance Window. Kevin Sherlock – Staff Systems Analyst, Dex Media Inc. kevin.sherlock@teamsybase.com Eric Miner - Senior Engineer, Sybase Inc. eric.miner@sybase.com August 2003. Making Update Statistics Run Faster.

keely
Download Presentation

ASE126 Opening Your Maintenance Window

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ASE126 Opening Your Maintenance Window Kevin Sherlock – Staff Systems Analyst, Dex Media Inc. kevin.sherlock@teamsybase.com Eric Miner - Senior Engineer, Sybase Inc. eric.miner@sybase.com August 2003

  2. Making Update Statistics Run Faster Ways to make it more efficient when you have to run it • From tuning the server to using a new feature • We’ll talk about ASE 12.5.0.3’s new feature ‘Sampling for Update Statistics” • We’ll show you some simple things you can do now to help speed up update statistics • All of these are designed to help make your statistics maintenance easier and more efficient

  3. Shrinking Maintenance Windows My maintenance window isn’t big enough to run update statistics! • As datasets have grown so has the time it takes to run update statistics • In some cases the maintenance window simply isn’t big enough to run update statistics whether it’s needed or not • Your statistics maintenance needs to be as efficient as possible to make the best use of your window • You want to keep your statistics up to date but it’s next to impossible to do in some cases • How can you speed up update statistics?

  4. What are the ‘Statistics’? And why do we need them? • The optimizer is cost based – it needs to know about tables. Indexes and the data • The optimizer makes its decisions based on the estimated cost of a given access method • The statistics are used to estimate various selectivity values • The selectivity values are used to estimate the cost of a given access method • The statistics are the optimizer’s only view of your objects and data • Without statistics the optimizer can only make guesses about the objects and data

  5. There Are Two Types of Statistics Both are used by the optimizer • Table/Index level - describes a table and its index(es) • Data &Leaf page count, Data &Leaf row count, Empty data & leaf page count, Forwarded row count, Deleted row count, OAM & allocation page, First extent leaf pages , Data/Leaf row size, Index height, Data/Index page cluster ratio, Data Row cluster ratio • Some values are updated dynamically as DML occurs • Page & row counts, deleted rows, forwarded rows, cluster ratios • The other values are static and updated by update statistics • Space Utilization, Large I/O Efficiency are NOT used by optimizer. They’re For your information only – helps track fragmentation • Cannot be written directly or with optdiag – quickly overwritten • Stored in systabstats Column level - describes the data to the optimizer • Histogram (distribution), density values, default selectivity values • Static, need to be updated or written directly • Stored in sysstatistics • Updated via update statistics, inputting an optdiag file or writing directly • Stored in sysstatistics

  6. Why Maintain the Statistics? Do I have To Keep the Column Level Statistics Up to Date? • In most cases up to date statistics are needed for the optimizer to produce the most efficient query plans • In most cases update statistics if – • The distribution of data has changed, usually due to a significant amount of DML occurring • There have been significant changes to row and page counts • It’s a waste of time to update statistics on static data • Myth !! :Update statistics guarantees good performance • You may be able to ‘open’ your maintenance window by not running update statistics as often as you have been • Some environments never need update statistics to be run, some need it every 5 minutes. • Find out where you fall between these extremes • Test to see if you’re running update statistics too often • Frequency of update statistics runs can become habit or custom • Question whether or not you really need to run update statistics as often as you do • Test changing your frequency of running update statistics

  7. Sampling for Update Statistics Dramatically decreases the time for update statistics to run • What is Sampling for Update Statistics? • A new 12.5.0.3 option of update statistics that reads random pages for statistics gathering • Sampling decreases the I/O required to gather the statistics by not reading all pages • Percentage of pages to sample can be specified • Why Sampling for Update Statistics? • The speed of update statistics has been an issue for a very long time • Making it run faster has been a top Customer feature request for years • Opening the statistics maintenance window lowers the cost of using ASE • Let’s be honest….update statistics can be a real pain to run

  8. Sampling for Update Statistics cont. Sampling Time Test - Dramatically decreases the time for update statistics to run Integer Column10,556,400 Rows 105,564 Pages

  9. Sampling for Update Statistics cont. Dramatically decreases the time for update statistics to run • How does it work? • The percentage of pages to read is specified in the sampling extension of the update statistics command – update statistics tab_name (col_name) with sampling = 10 percent • The specified percentage of pages will be randomly read • Sampling rates are specified in percentages from 1 to 100 • Reading random pages is cheaper in I/O than reading random rows, but it’s not a ‘pure’ sampling • Column values from sampled pages are still read into a worktable, the worktable is sorted and the statistics gathered from there • Since the number of pages is lower with sampling the size of the worktable will be smaller and less tempdb space will be used • The column’s histogram will be based on the values that are sampled

  10. Sampling for Update Statistics cont. How Does It Work cont. – The Effects of Sampling with Update Statistics Syntax update statistics table_name [index_name] with sampling = N percent • Will create/update statistics on the leading column(s) of all indexes on the table or the specified index • Sampling will not be used, the specified sampling rate will be ignored • Index pages are not currently read by sampling – this is at the top of the enhancements list • Index has 78780 leaf pages Physical Reads Logical Reads Pages Written -------------- ------------- -------------- 14 79021 24 • Because no sort is needed there is a major savings when scanning the index for the leading column

  11. Sampling for Update Statistics cont. How Does It Work cont. – The Effects of Sampling with Update Statistics Syntax cont. update statistics table_name (column_name) with sampling = N percent • Will create/update statistics on the specified column using the specified sampling rate update statistics big_id(b) • Full scan update statistics – SQL Server elapsed time: 939683 ms. Physical Reads Logical Reads Pages Written --------------- ------------- ------------- 281243 11017457 119964 • Using a 10% sampling rate – SQL Server elapsed time: 71483 ms. Physical Reads Logical Reads Pages Written -------------- ------------- -------------- 23065 1099327 10562

  12. Sampling for Update Statistics cont. How Does It Work cont. – The Effects of Sampling with Update Statistics Syntax cont. update index statistics table_name[index_name] with sampling=N percent • Sampling will not be used on the leading column(s) of index(es) but will be used on all innercolumns of index(es) • update index statistics big_id abc_nc • Full scan update statistics • SQL Server elapsed time: 1922336 ms. • Physical Reads Logical Reads Pages Written • ------------- ------------ ------------ • 661316 22134151 187666 • Using a 10% sampling rate – • SQL Server elapsed time: 248860 ms. • Physical Reads Logical Reads Pages Written • -------------- ------------- ------------- • 149096 2297970 27660

  13. Sampling for Update Statistics cont. If you must use ‘update all statistics’ sampling will make it less painful update all statistics table_name with sampling = N percent • Will use sampling to create/update statistics on all columns in the table that are not leading column of an index • If you find that you need statistics on all columns of a table sampling will make update all statistics much faster • Full scan update all statistics SQL Server elapsed time: 1390596 ms.(that’s 23.2 minutes) Physical Reads Logical Reads Pages Written -------------- ------------- ------------- 630335 11373699 127653 • Using a 10% sampling rate – SQL Server elapsed time: 531390 ms. (that’s 8.86 minutes) Physical Reads Logical Reads Pages Written ------------- ------------- ------------- 373876 1457379 13812

  14. Sampling and Tempdb Usage Smaller worktables needed for sorts in tempdb • Using sampling with update statistics will decrease the space needed in tempdb • Applies to columns that are not leading columns of an index or when update statistics is run on a specified column • Values from columns are read into a worktable, sorted and read again to gather the statistics • With a full scan update statistics the worktable used for the sort can become very large and cause tempdb to run out of space • Full scan - Physical Reads Logical Reads Pages Written --------------- ------------- ------------- 281243 11017457 119964 • Using a 10% sampling rate – Physical Reads Logical Reads Pages Written -------------- ------------- ------------- 23065 1099327 10562

  15. What To Watch For When Using Sampling As with everything there are some trade offs • Sampling, by it’s nature does not read all values in the set • Since not all values are read there will differences between a sampled histogram and one obtain with a full scan • It’s a balance of time saved running update statistics vs. accuracy of the statistics • The boundary values and cell weights will vary from those in a full scan histogram • Unique column – Full scan 4 0.05263158 <= 166680 5 0.05263158 <= 222240 6 0.05263158 <= 277800 7 0.05263158 <= 333360 10% Sampling 4 0.05261801 <= 165378 5 0.05261801 <= 223117 6 0.05261801 <= 277841 7 0.05261801 <= 332565 • The lower the sampling rate the greater the variations from the full scan

  16. What To Watch For When Using Sampling cont. Min and Max values not in the histogram • Column’s min and max values may not be in the sampled set Full scan Step Weight Value 1 0.00000000 <= 0 2 0.05263158 <= 55560 19 0.05263158 <= 1000080 20 0.05263158 <= 1055640 10% Sampling Step Weight Value 1 0.00000000 <= 5390 2 0.05263158 <= 55286 19 0.05263158 <= 997766 20 0.05263158 <= 1053206 • If SARG values fall outside of the histogram the optimizer makes assumptions about selectivity • If you query near either end of the histogram check it from time to time to monitor and effects from sampling

  17. What To Watch For When Using Sampling cont. Other things to watch for • Density values are not affected by sampling • If there are existing statistics on the column the density values will not be overwritten, if there are no statistics a default value of 0.10 will be used • Most of your columns have statistics, if you put statistics on a column that did previously have them use a full scan first • This will be changed soon • Different cell types may appear • As the sampling rate decreases it is possible that Frequency count and/or Range cells may appear where they didn’t exist prior to sampling • Highly duplicated values are more likely to be seen in the histogram • The same pages will be re-sampled if the dataset is static and the same sampling rate used • Don’t be surprised

  18. What Sampling Rate Should You Use? A rule of thumb based on tests • “Trial and error” testing • You’ll need to test to determine what sampling rate is most efficient for your columns • Check that there are no changes to your query performance • It’s a trade off between faster update statistics and perfectly accurate statistics • Try starting with a sampling rate of between 20% and 10% • Use optdiag to monitor changes to the histograms • Use showplan, traces 302/310 to check for changes to query plans if you see any effects on performance

  19. Upcoming Enhancements We need to add improvements • Enhancements will be made to sampling • Will read index pages • Will ‘scale’ the density values based on the sampled set of values • Will enable optdiag and optimizer analysis output so that they record that sampling was used and what the last sampling rate was

  20. Server Tuning Leverage dynamic configuration options! • Update statistics is disk io and sort intensive • Configuration parameters to consider adjusting • number of sort buffers – controls how many buffers to reserve in an object’s cache for sorting; tuning this can depend on the size of the column(s) being sorted; sort runs should be minimized; • formula: • Number of large I/O buffers – controls the number of pages to be used during index creation • I/O polling process count – For disk IO bound operations, tune low for added throughput • Runable process search count – Tuning to a very high number, or zero keeps engines from yielding to OS, promoting better throughput for task switching operations • Disable disk mirroring – Keeps engines from having to request and be granted device semaphores. This is unnecessary overhead which can be avoided if you don’t use ASE mirroring (this option is enabled by default in 12.5.0.3) • Large IO pools • Create the largest 8-page (16k, 32k, 64k, 128k) IO pools as possible - Update statistics uses the largest IO pools available in both the object’s cache, and tempdb/default cache. • ASE 12.5.0.3 LSA – Worktables are created using Large Scale Allocation. Only if large IO pool is defined. • Procedure cache usage • Used for sorting – Increasing sort memory requirements can cause procedures to age-out of cache resulting in procedure cache misses.

  21. Tips For Opening Your Maintenance Window cont. Tips, Techniques and “Best Practices” • Avoid “Duplicate” statistics gathering • Rebuilding indexes • “create index” also collects statistics for index and leading column (some bugs for sorted_data) • Update statistics <tablename> <indexname> is not necessary • Number of histogram steps for column survives “drop index” • Two or more indexes which have common columns • Leverage indexes which have leading column - • clustered index i1 on authors(auid) • nonclustered index i2 on authors(au_lname,auid) • nonclustered index i3 on authors(auid, au_fname) • Goal - Column stats for “auid” • DO: update statistics authors i3 /* (not i1) */ • DON’T: update statistics authors(auid) • DON’T: update index statistics authors i2 • Also, when given a choice, use non-clustered indexes to gather column stats for a leading column rather than clustered index

  22. Tips For Opening Your Maintenance Window cont. Know your choices of update statistics syntax and avoid overkill • Which update statistics syntax do you really need to run? • update statistics table_name [index_name] will update statistics on the leading column of all indexes on the table or on the specified index • This is the original update statistics syntax • update index statistics table_name [index_name] will update statistics on all columns that are in indexes on the table or the specified index • More expensive in terms of maintenance, but is likely to result in more efficient query plans and better performance – it’s a balance you need to test • update statistics table_name (column_name) will update statistics on the specified column only, will not read index pages • update all statistics table_name – will update statistics on all columns in the table, and will update partitions statistics. • This can be a real maintenance killer! Very rarely do you need statistics on all columns of a table • Test to determine which syntax gives you the most efficient query plans while giving you a reasonable maintenance window

  23. Tips For Opening Your Maintenance Window cont. Know your choices of update statistics syntax and avoid overkill cont. • Sequential operations vs. Concurrent operations • Replace single, multiple operation commands with multiple connections and single operation commands • Example 1: • c1> update index statistics authors • can be converted to • c1> update statistics authors ix1 • c2> update statistics authors ix2 • c3> update statistics authors(au_lname) • Example 2: • c1> update statistics authors • can be converted to • c1> update statistics authors ix1 • c2> update statistics authors ix2 • c3> update statistics authors ix3 … • Multiple concurrent commands leverage multiple engines, multiple devices. Higher throughput.

  24. Tips For Opening Your Maintenance Window cont. “Alternative” ways to gather/create statistics • Use optdiag to get updated statistics from a copy of your production data • Use replication or dump/load to create duplicate data • Update statistics on the copy and get an optdiag output file from there • Input the optdiag file into the production server • Moving the data may take a while - use “day-old” statistics if necessary • Write your own statistics • You know your data best • Don’t need “update statistics” to tell you about data distribution • Unique, evenly distributed and sequential monotonic columns allow the column statistics to be derived • Make sure to know your data • Table/Index level statistics will not be updated

  25. Tips For Opening Your Maintenance Window cont. “Alternative” ways to gather/create statistics • CIS/Proxy Table “Tricks” • Addresses problem using “update statistics <tablename> (column_list)”, or “update index statistics”, or “update all statistics” • For minor column attributes, these use the datapages (tablescan) to build the worktable for gathering statistics! • Extra work needed to create/maintain views, and proxy tables • Requires the use of optdiag “input mode” • This technique can save LOTS of time • Can spread IO around to other segments • How using CIS and Proxy tables works • Create a view on minor column attributes for each non-clustered index which forces the use of the index in the view definition • Create a proxy table on each view • Run “update statistics <proxytable> (column)” instead • Extract, and input column level stats into base table using optdiag • Saves having to scan the base table for each minor attribute!

  26. Tips For Opening Your Maintenance Window cont. “Alternative” ways to gather/create statistics • CIS/Proxy Table Example i1m_cl clustered, unique located on default (unique2, onePercent, twenty) i1m_nc_c_int nonclustered, unique located on i1m_nc_c_int (skew7, twenty, userid) index_name size reserved unused -------------------- ---------- ---------- ---------- i1m_cl 1920 KB 224186 KB 42 KB i1m_nc_c_int 19094 KB 19120 KB 26 KB name rowtotal reserved data index_size unused -------------------- ----------- --------------- --------------- --------------- --------------- oneMrows 1000000 336296 KB 222224 KB 113930 KB 142 KB Isql> Create view i1m_nc_c_int_minor(twenty, userid) As SELECT twenty, userid FROM oneMrows (index i1m_nc_c_int) Create existing table i1m_nc_c_int_proxy(twenty int, userid int) External table at ‘loopback.userdb.dbo.i1m_nc_c_int_minor’ Update statistics i1m_nc_c_int_proxy (twenty) Update statistics i1m_nc_c_int_proxy (userid) OS> optdiag statistics userdb.dbo.i1m_nc_c_int_proxy -U… -S… >> oneMrows.input edit oneMrows.input optdiag statistics -I oneMrows.input -U… -S…

  27. Tips For Opening Your Maintenance Window cont. “Alternative” ways to gather/create statistics • CIS/Proxy Table Results Normal Runtime = 1:11 CIS/Proxy Table Runtime = 0:11

  28. Deleting or Limiting Statistics Sometimes no statistics are better than having them • Choosing not to have statistics on a column • Much the same as deleting statistics, but you’ve chosen not to add them to the column in the first place or have removed them • A decision to allow the hard-coded default values to be used • No maintenance required • Requires some planning and testing though • Without statistics the optimizer has to use a small set of hard-coded values for its selectivity estimate • If statistics give the optimizer an ‘ugly’ picture of the distribution the hard-coded selectivity values may result in a lower selectivity value • Statistics on a gender column ~ 0.50 selectivity • Default values will be better - 0.10, 0.25 or 0.33 depending on the SARG operator • Make sure maintenance doesn’t add the statistics again • Make sure to remove update statistics for these columns from your scripts

  29. Tempdb Configuration and Considerations Adjusting tempdb for update statistics sorting • Worktables will be used when gathering statistics on some columns • For leading columns of indexes there is no need to sort since values are already in sorted order – only an index scan is done • For any other column the values must be read (from the table) into a worktable (in tempdb), sorted and read again for statistics gathering • The size of the worktable will depend on the number of rows, the size of the rows, and if sampling is used, the sampling rate used • You will usually see heavy tempdb usage during update index statistics, or when you run a series of update statistics table_name (column_name) • It is the worktable creation, sorting and reading what makes update all statistics painfully slow on larger tables

  30. Tempdb Configuration and Considerations cont. Leverage dynamic configuration options! • Tempdb Configuration • Stripe on multiple physical disks • Stripe size a multiple of logical extent size – Pay attention to OS specific device managers to tune stripe sizes. • Supports concurrent sorting and worktable activity • System segment focused • Place “system” segment on fastest/dedicated device • “system” segment is where worktables for statistics gathering get created • Use multiple devices to support parallel sort • Tempdb cache • Optimize for write intensive tasks • Largest possible large I/O pools • Cache status = “HK ignore cache” • Leverage “lazy write” feature in ASE 12.5.0.3

  31. The Sybase Customer newsgroups http://support.sybase.com/newsgroups The Sybase list server SYBASE-L@LISTSERV.UCSB.EDU The external Sybase FAQ http://www.isug.com/Sybase_FAQ/ Technical Documents at – http://www.sybase.com/support/techdocs Join the ISUG, ISUG Technical Journal, feature requests http://www.isug.com Other utilities, scripts, stored procedures http://teamsybase.com/kevin.sherlock Where To Get More Information

  32. Sybase Developer Network (SDN) Additional Resources for Developers/DBAs • Single point of access to developer software, services, and up-to-date technical information: • White papers and documentation • Collaboration with other developers and Sybase engineers • Code samples and beta programs • Technical recordings • Free software • Join today: www.sybase.com/developer

  33. SDN Presents CodeXchange Share ASE Scripts and Tools • New SDN feature enables community collaboration • Download tools created by Sybase • Leverage contributions of others to help administer and monitor your servers • Contribute your own code or start your own collaborative project with input from other ASE experts • Any SDN member can participate • Log in using your MySybase account via SDN • Join the collaboration already underway • http://ase.codexchange.sybase.com or via SDN at www.sybase.com/developer

More Related