1 / 65

All (*) you should know about waits

Learn everything you need to know about server waits, including how to interpret wait types, task states, and using DMVs to analyze wait statistics.

moserb
Download Presentation

All (*) you should know about waits

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. All(*) you should know about waits Damian Widera

  2. Damian Widera • PM & Technical Lead • MCT since 2005 • MVP since 2009 • @DamianWidera • Fb: damian.widera.10 • damian.widera@live.com • http://sqlblog.com/blogs/damian_widera/ http://1drv.ms/1qM1cGI

  3. Whyis the serverwaiting…. • The serverrunsslowerthan „previously” • Everythingisslow and thatmust be a database problem • Verycommonansweralso in Poland – „we have not changedanything on the server”

  4. Where to start…. • Hardware? • I/O? • Indices / indexes? • Applications?

  5. Wirtualizacja…

  6. Where to start, cont. • Ask the SQL Server!!!!  • Waitstats • DMV • Perfmon • Query plan analysis • Tuning -> Waits&Queues • The Problem – how to interpret the data

  7. Data interpretation • Three people come to to a doctor • My kneehurts….(1) • Twisted ankle • My kneehurts ….(2) • Overweight? • My kneehurts ….(3) • It isjust a knee

  8. What is a Wait Type? – following Bob Ward Not the best of docs • SQL Server developers created this to help to find bottlenecks • In a galaxy, far, far, away we had locks, I/O and network • But as time has moved on… we went a bit overboard • The name of the type is up to the developer 485 in SQL Server 2008 771 in SQL Server 2014

  9. Data interpretation, cont.

  10. SQL Server Execution Model

  11. Task state • SUSPENDED = the task is in the Waiter List („I want to do my task however the resource I need is not available”) - • sys.dm_os_waiting_tasks • RUNNABLE = the task in in the Runnable Queue („I have my resource and need the processor to run”) • sys.dm_exec_requests • RUNNING – the task is actually in the processor • sys.dm_exec_requests

  12. Many processors?

  13. SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson

  14. SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson

  15. SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson

  16. SQL Server Execution Model – Special cases (example) • Resource Governor • NOLOCK hint • Only one task is running– nothing is in the Runnable Queue

  17. SQL Server Execution Model – Special cases • Resource Governor – the DBA can define multiple workload groups that share the same resource pool • A workload group = bucket of connections • A resource pool = a set of CPU and memory limits • Multiple workload groups for a resource pool can be assigned relative priorities The default priority is medium • high, medium, and low

  18. SQL Server Execution Model – Special cases • The relative priorities change how the Runnable Queue works • High to medium to low equates to 9 to 3 to 1 in terms of the priority of the threads that are permitted to execute • For ONE therad of Low-priority on the runnable queue NINE high-priority threads will be allowed to run first

  19. SQL Server Execution Model – Special cases • NOLOCK hint • What in that case? Do we need to wait? • Trace flag 1200 could help us in understanding the how the locking actually works

  20. DEMO TF 1200

  21. SQL Server Execution Model – Special cases • Only one task is running– nothing is in the Runnable Queue • IN ANY CASE: The thread will be running until it reaches a quantum which is fixed to 4 ms • The thread’s state will change from RUNNING to RUNNABLE and to RUNNING again SPID 51

  22. Where do you see wait type? Historical stats Live state Legacy Tracing In the tools Wait Statistics Counter

  23. sys.dm_os_waiting_tasks • We need to take a look to this DMV to see what is the server waiting for NOW • This DMV should be correlated with other DMV’s to display human readable information sys.dm_exec_sessions sys.dm_exec_requests sys.dm_exec_sql_text sys.dm_exec_query_plan

  24. DEMO sys.dm_os_waiting_tasks

  25. sys.dm_os_waits_stats • We need to analyze the DMV to see the trend of waits stats • How to make it? • sys.dm_os_waits_stats + GETDATE() • Store in a table • SQL Server Agent job – make a snapshot every XXX minutes • SQL Server Agent job – clear or archive old snapshots

  26. sys.dm_os_waits_stats • If the DBA sees something on the list (for example LCK_M_X) that does not mean the server has a problem • The DBA has to review the list and leave only the most dominant waits • Some of the wait types are not important and the DBA should not care

  27. sys.dm_os_waits_stats • This DMV aggregates the information • Since the SQL Server service was started • Since the last DMV reset • It is obvious that the view is cleared out when the SQL Service is restarted however in production environment we don’t want to do that • Also other DMV’s are clreared  • Reset only that particular DMV using the DBCC SQLPERF command • DBCC SQLPERF(’sys.dm_os_waits_stats’,CLEAR)

  28. DEMO sys.dm_os_waits_stats

  29. How to monitor only one particular session? • Extended Events • A modern lightweight mechanism to trace the SQL Server activity • Please configure it properly! • What to observe: • sqlos.wait_info (Begin -> End for every WAIT) • sqlos.wait_info_external (for preemptive waits -> information from OS)

  30. DEMO XE – how to monitor one session

  31. Waits stats across the world

  32. Waits stats across the world • Paul Randal, 2010, 1800 instances http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/

  33. Waits stats across the world • Paul Randal, 2014, 1708 instances http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/

  34. Waits stats across the world • CXPACKET is still the top wait type, which is unsurprising • OLEDB has increased to being the top wait type roughly 17% of the time compared to roughly 4% in 2010 • WRITELOG has increased to being the top wait 10% of the time compared with 6% in 2010 • ASYNC_NETWORK_IO has decreased to being the top wait 8% of the time compared with 15% in 2010 • PAGEIOLATCH_XX has decreased to being the top wait 7% of the time compared with 18% in 2010 http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/

  35. Some waits may not be a bottleneck Should be called “not waiting” BOL calls these Queue Waits Normal for SQL CLR

  36. CXPACKET – don’t panic! • It is perfectly normal to see it when a task is done on more than one processor. It is actually good • If the DBA observes that the CXPACKET is rapidly growing that is an indicator that it might be a problem • DO NOT FOLLOW THE DUMB RECOMENDATION – which is „SET THE MAXDOP to 1 ON THE INSTANCE LEVEL”

  37. CXPACKET – not a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4

  38. CXPACKET – not a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4

  39. CXPACKET – not a problem here Table CXPACKET Thread 0 Thread 1 Thread 2 Thread 3 Thread 4

  40. CXPACKET – a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4

  41. CXPACKET – a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4

  42. CXPACKET – a problem here Table Thread 0 Thread 1 CXPACKET Thread 2 Thread 3 Thread 4

  43. CXPACKET • Check if you see the PAGEIOLATCH_SH, too. That indicates table scan • Table scan • Not have a nonclustered index • Query plan might be not optimal • Statisctics are skewed • Recommendation • Take a look into statistics • Verify the indexing startegy • MAXDOP …

  44. MAXDOP – some notes • I have found some „recomendations” in the internet. Do not follow them • Set MAXDOP to 1 if the CXPACKET is a most important WAIT • Set MAXDOP to 1 if you have OLTP system • Set MAXDOP to the half of number of physical processors • Set MAXDOP to number of NUMA nodes • Facts • There is no one good startegy of setting the MAXDOP • There are good recomendations what to do

  45. MAXDOP – recomendations • If you have an OLTP system – try to set the MAXDOP to 1. You can try to identify the worst performing queries and set the MAXDOP to 1 for those queries. In some cases setting the MAXDOP to 1 on the instance level might help (test it!) • If the system has mixed characteristrics like reporting – oltp – analysis do not set MAXDOP=1 on instance level ever. Identify the worst performing queries and try to change MAXDOP. Use Resource Governor and set the MAXDOP limit – this cannot be overriden • Check other possibilities - for example statistics might be outdated • Consider change the Cost Threshold for Parallelism to higher value

  46. DEMO – DO IT ON YOUR OWN – CODE INCLUDED CXPACKET

  47. OLEDB • The OLE DB mechanism is being used however that does not mean that a linked server is used. It is of course possible (so take a look on that, too) but review also other possibilities • The DBCC CHECKDB and related commands use OLE DB internally • Many DMVs use OLE DB internally so it could be a third-party monitoring tool that is repeatedly calling DMVs

  48. WRITELOG • Waiting for a transaction log block buffer to flush to disk • Do not assume that the transaction log file I/O system has a problem (although this is often the case) • Do not create additional transaction log files

  49. WRITELOG • Correlate WRITELOG wait time with sys.dm_io_virtual_file_stats • Lookfor LOGBUFFER waits, showing internal contention for log buffers • Look at average size of transactions. • Look at average disk write queue length for log drive. Remember that there is a hardcoded limit of 32 outsanding transaction log writes for a single database • If yes – it is a REAL problem. • Investigate whether frequent page splits are occurring

  50. WRITELOG • Use better, faster drives like SSD • Make the transaction take longer time or try to lower down the amount of transactions • Take a look at non used indexes – the SQL Server must maintain them • Check the FILLFACTOR – maybe there is too much page splits • Last resort – new database 

More Related