650 likes | 667 Views
Learn everything you need to know about server waits, including how to interpret wait types, task states, and using DMVs to analyze wait statistics.
E N D
All(*) you should know about waits Damian Widera
Damian Widera • PM & Technical Lead • MCT since 2005 • MVP since 2009 • @DamianWidera • Fb: damian.widera.10 • damian.widera@live.com • http://sqlblog.com/blogs/damian_widera/ http://1drv.ms/1qM1cGI
Whyis the serverwaiting…. • The serverrunsslowerthan „previously” • Everythingisslow and thatmust be a database problem • Verycommonansweralso in Poland – „we have not changedanything on the server”
Where to start…. • Hardware? • I/O? • Indices / indexes? • Applications?
Where to start, cont. • Ask the SQL Server!!!! • Waitstats • DMV • Perfmon • Query plan analysis • Tuning -> Waits&Queues • The Problem – how to interpret the data
Data interpretation • Three people come to to a doctor • My kneehurts….(1) • Twisted ankle • My kneehurts ….(2) • Overweight? • My kneehurts ….(3) • It isjust a knee
What is a Wait Type? – following Bob Ward Not the best of docs • SQL Server developers created this to help to find bottlenecks • In a galaxy, far, far, away we had locks, I/O and network • But as time has moved on… we went a bit overboard • The name of the type is up to the developer 485 in SQL Server 2008 771 in SQL Server 2014
Task state • SUSPENDED = the task is in the Waiter List („I want to do my task however the resource I need is not available”) - • sys.dm_os_waiting_tasks • RUNNABLE = the task in in the Runnable Queue („I have my resource and need the processor to run”) • sys.dm_exec_requests • RUNNING – the task is actually in the processor • sys.dm_exec_requests
SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson
SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson
SQL Server Execution Model – in practice SQL Server 2005 Waits and Queues , Tony Davidson
SQL Server Execution Model – Special cases (example) • Resource Governor • NOLOCK hint • Only one task is running– nothing is in the Runnable Queue
SQL Server Execution Model – Special cases • Resource Governor – the DBA can define multiple workload groups that share the same resource pool • A workload group = bucket of connections • A resource pool = a set of CPU and memory limits • Multiple workload groups for a resource pool can be assigned relative priorities The default priority is medium • high, medium, and low
SQL Server Execution Model – Special cases • The relative priorities change how the Runnable Queue works • High to medium to low equates to 9 to 3 to 1 in terms of the priority of the threads that are permitted to execute • For ONE therad of Low-priority on the runnable queue NINE high-priority threads will be allowed to run first
SQL Server Execution Model – Special cases • NOLOCK hint • What in that case? Do we need to wait? • Trace flag 1200 could help us in understanding the how the locking actually works
DEMO TF 1200
SQL Server Execution Model – Special cases • Only one task is running– nothing is in the Runnable Queue • IN ANY CASE: The thread will be running until it reaches a quantum which is fixed to 4 ms • The thread’s state will change from RUNNING to RUNNABLE and to RUNNING again SPID 51
Where do you see wait type? Historical stats Live state Legacy Tracing In the tools Wait Statistics Counter
sys.dm_os_waiting_tasks • We need to take a look to this DMV to see what is the server waiting for NOW • This DMV should be correlated with other DMV’s to display human readable information sys.dm_exec_sessions sys.dm_exec_requests sys.dm_exec_sql_text sys.dm_exec_query_plan
DEMO sys.dm_os_waiting_tasks
sys.dm_os_waits_stats • We need to analyze the DMV to see the trend of waits stats • How to make it? • sys.dm_os_waits_stats + GETDATE() • Store in a table • SQL Server Agent job – make a snapshot every XXX minutes • SQL Server Agent job – clear or archive old snapshots
sys.dm_os_waits_stats • If the DBA sees something on the list (for example LCK_M_X) that does not mean the server has a problem • The DBA has to review the list and leave only the most dominant waits • Some of the wait types are not important and the DBA should not care
sys.dm_os_waits_stats • This DMV aggregates the information • Since the SQL Server service was started • Since the last DMV reset • It is obvious that the view is cleared out when the SQL Service is restarted however in production environment we don’t want to do that • Also other DMV’s are clreared • Reset only that particular DMV using the DBCC SQLPERF command • DBCC SQLPERF(’sys.dm_os_waits_stats’,CLEAR)
DEMO sys.dm_os_waits_stats
How to monitor only one particular session? • Extended Events • A modern lightweight mechanism to trace the SQL Server activity • Please configure it properly! • What to observe: • sqlos.wait_info (Begin -> End for every WAIT) • sqlos.wait_info_external (for preemptive waits -> information from OS)
DEMO XE – how to monitor one session
Waits stats across the world • Paul Randal, 2010, 1800 instances http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/
Waits stats across the world • Paul Randal, 2014, 1708 instances http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/
Waits stats across the world • CXPACKET is still the top wait type, which is unsurprising • OLEDB has increased to being the top wait type roughly 17% of the time compared to roughly 4% in 2010 • WRITELOG has increased to being the top wait 10% of the time compared with 6% in 2010 • ASYNC_NETWORK_IO has decreased to being the top wait 8% of the time compared with 15% in 2010 • PAGEIOLATCH_XX has decreased to being the top wait 7% of the time compared with 18% in 2010 http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/
Some waits may not be a bottleneck Should be called “not waiting” BOL calls these Queue Waits Normal for SQL CLR
CXPACKET – don’t panic! • It is perfectly normal to see it when a task is done on more than one processor. It is actually good • If the DBA observes that the CXPACKET is rapidly growing that is an indicator that it might be a problem • DO NOT FOLLOW THE DUMB RECOMENDATION – which is „SET THE MAXDOP to 1 ON THE INSTANCE LEVEL”
CXPACKET – not a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4
CXPACKET – not a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4
CXPACKET – not a problem here Table CXPACKET Thread 0 Thread 1 Thread 2 Thread 3 Thread 4
CXPACKET – a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4
CXPACKET – a problem here Table Thread 0 Thread 1 Thread 2 Thread 3 Thread 4
CXPACKET – a problem here Table Thread 0 Thread 1 CXPACKET Thread 2 Thread 3 Thread 4
CXPACKET • Check if you see the PAGEIOLATCH_SH, too. That indicates table scan • Table scan • Not have a nonclustered index • Query plan might be not optimal • Statisctics are skewed • Recommendation • Take a look into statistics • Verify the indexing startegy • MAXDOP …
MAXDOP – some notes • I have found some „recomendations” in the internet. Do not follow them • Set MAXDOP to 1 if the CXPACKET is a most important WAIT • Set MAXDOP to 1 if you have OLTP system • Set MAXDOP to the half of number of physical processors • Set MAXDOP to number of NUMA nodes • Facts • There is no one good startegy of setting the MAXDOP • There are good recomendations what to do
MAXDOP – recomendations • If you have an OLTP system – try to set the MAXDOP to 1. You can try to identify the worst performing queries and set the MAXDOP to 1 for those queries. In some cases setting the MAXDOP to 1 on the instance level might help (test it!) • If the system has mixed characteristrics like reporting – oltp – analysis do not set MAXDOP=1 on instance level ever. Identify the worst performing queries and try to change MAXDOP. Use Resource Governor and set the MAXDOP limit – this cannot be overriden • Check other possibilities - for example statistics might be outdated • Consider change the Cost Threshold for Parallelism to higher value
OLEDB • The OLE DB mechanism is being used however that does not mean that a linked server is used. It is of course possible (so take a look on that, too) but review also other possibilities • The DBCC CHECKDB and related commands use OLE DB internally • Many DMVs use OLE DB internally so it could be a third-party monitoring tool that is repeatedly calling DMVs
WRITELOG • Waiting for a transaction log block buffer to flush to disk • Do not assume that the transaction log file I/O system has a problem (although this is often the case) • Do not create additional transaction log files
WRITELOG • Correlate WRITELOG wait time with sys.dm_io_virtual_file_stats • Lookfor LOGBUFFER waits, showing internal contention for log buffers • Look at average size of transactions. • Look at average disk write queue length for log drive. Remember that there is a hardcoded limit of 32 outsanding transaction log writes for a single database • If yes – it is a REAL problem. • Investigate whether frequent page splits are occurring
WRITELOG • Use better, faster drives like SSD • Make the transaction take longer time or try to lower down the amount of transactions • Take a look at non used indexes – the SQL Server must maintain them • Check the FILLFACTOR – maybe there is too much page splits • Last resort – new database