sys.dm_os_wait_stats – Day 17 – One DMV a Day

Hello Geeks and welcome to the Day 17 of the long series of One DMV a day. In this series of blogs I will be exploring and explaining the most useful DMVs in SQL Server. As we go ahead in this series we will also talk about the usage, linking between DMVs and some scenarios where these DMVs will be helpful while you are using SQL Server. For the complete list in the series please click here.

Today I am going to cover one of the most famous DMV of all, sys.dm_os_wait_stats. This DMV has solved so many cases and gives a lot of information which just needs to be read and used correctly to your benefit.

In SQL Server if a task is not completed and not running it should be waiting. First question to answer before you even start troubleshooting is – What is the resource it is waiting on? In an ideal world you will have once procedure/query or application which is slow. In such cases you will use another DMV which I am going to cover tomorrow.

But, all my DBA friends who work on chaotic Production servers, we do not live in an ideal world. 🙁 Every issue we get is server level and all the applications are affected. Most of our Production servers are designed to host multiple applications or multiple functionalities in sites.

sys.dm_os_wait_stats

 

So to start with we need to identify if the problem is at the server level. Sys.dm_os_wait_stats is the right place to start our troubleshooting. This helps you identify what is the highest waits on my SQL Server. The straight forward output of this DMV doesn’t make much sense as the stats in this DMV are accumulated over time. They get flushed when a SQL instance is restarted. So to get the reliable and relevant details from sys.dm_os_wait_stats you need to get the delta of waits for a significant time frame.

There are ways to flush the data out of sys.dm_os_wait_stats without restarting SQL Server by running below script. But I warn you that this is a very bad practice. You can get the delta waits difference without flushing the data out of this DMV.

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)

The below query pulls out the delta waits for one minute from sys.dm_os_wait_stats. You can confirm based on the output if the problem is with signal waits or any wait type. Dealing with each wait type is a huge topic in itself. I would recommend to follow MS whitepaper on waits. I will be filtering on the wait types from the CSS repository.

   
CREATE TABLE #CSS_Waits_Repository (wait_type NVARCHAR(100));

INSERT INTO #CSS_Waits_Repository VALUES ('ASYNC_IO_COMPLETION');
INSERT INTO #CSS_Waits_Repository VALUES ('CHECKPOINT_QUEUE');
INSERT INTO #CSS_Waits_Repository VALUES ('CHKPT');
INSERT INTO #CSS_Waits_Repository VALUES ('CXPACKET');
INSERT INTO #CSS_Waits_Repository VALUES ('DISKIO_SUSPEND');
INSERT INTO #CSS_Waits_Repository VALUES ('FT_IFTS_SCHEDULER_IDLE_WAIT');
INSERT INTO #CSS_Waits_Repository VALUES ('IO_COMPLETION');
INSERT INTO #CSS_Waits_Repository VALUES ('KSOURCE_WAKEUP');
INSERT INTO #CSS_Waits_Repository VALUES ('LAZYWRITER_SLEEP');
INSERT INTO #CSS_Waits_Repository VALUES ('LOGBUFFER');
INSERT INTO #CSS_Waits_Repository VALUES ('LOGMGR_QUEUE');
INSERT INTO #CSS_Waits_Repository VALUES ('MISCELLANEOUS');
INSERT INTO #CSS_Waits_Repository VALUES ('PREEMPTIVE_XXX');
INSERT INTO #CSS_Waits_Repository VALUES ('REQUEST_FOR_DEADLOCK_SEARCH');
INSERT INTO #CSS_Waits_Repository VALUES ('RESOURCE_QUERY_SEMAPHORE_COMPILE');
INSERT INTO #CSS_Waits_Repository VALUES ('RESOURCE_SEMAPHORE');
INSERT INTO #CSS_Waits_Repository VALUES ('SOS_SCHEDULER_YIELD');
INSERT INTO #CSS_Waits_Repository VALUES ('SQLTRACE_BUFFER_FLUSH ');
INSERT INTO #CSS_Waits_Repository VALUES ('THREADPOOL');
INSERT INTO #CSS_Waits_Repository VALUES ('WRITELOG');
INSERT INTO #CSS_Waits_Repository VALUES ('XE_DISPATCHER_WAIT');
INSERT INTO #CSS_Waits_Repository VALUES ('XE_TIMER_EVENT');


DECLARE @Waits TABLE (
    WaitID INT IDENTITY(1, 1) not null PRIMARY KEY,
    wait_type NVARCHAR(60),
    wait_time_s DECIMAL(12, 2),
	resources_wait_s DECIMAL(12, 2),
	signal_wait_s decimal(12, 2));    

WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
    100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
	(wait_time_ms - signal_wait_time_ms)/1000. AS resources_wait_s,
	signal_wait_time_ms/1000. AS signal_wait_s,
    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats
WHERE wait_type IN(SELECT wait_type FROM #CSS_Waits_Repository)) -- filter out additional irrelevant waits
INSERT INTO @Waits (wait_type, wait_time_s, resources_wait_s, signal_wait_s)
SELECT W1.wait_type,
  CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
  CAST(W1.resources_wait_s AS DECIMAL(12, 2)) AS resources_wait_s,
  CAST(W1.signal_wait_s AS DECIMAL(12, 2)) AS signal_wait_s
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.resources_wait_s, W1.signal_wait_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold

WAITFOR DELAY '00:01:00';

WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
    100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
	(wait_time_ms - signal_wait_time_ms)/1000. AS resources_wait_s,
	signal_wait_time_ms/1000. AS signal_wait_s,
    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN(SELECT wait_type FROM #CSS_Waits_Repository)) -- filter out additional irrelevant waits
INSERT INTO @Waits (wait_type, wait_time_s, resources_wait_s, signal_wait_s)
SELECT W1.wait_type,
  CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
  CAST(W1.resources_wait_s AS DECIMAL(12, 2)) AS resources_wait_s,
  CAST(W1.signal_wait_s AS DECIMAL(12, 2)) AS signal_wait_s
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.resources_wait_s, W1.signal_wait_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold

SELECT wait_type, 
	MAX(wait_time_s) - MIN(wait_time_s) WaitDelta_total, 
	MAX(resources_wait_s) - MIN(resources_wait_s) WaitDelta_resource,
	MAX(signal_wait_s) - MIN(signal_wait_s) WaitDelta_signal
FROM @Waits
GROUP BY wait_Type
ORDER BY WaitDelta_total Desc

Based on the above output we can take necessary action based on the wait types. As I have already mentioned in the beginning of the post, sys.dm_os_wait_stats can only guide you get the pain area. I have showed you how to use this DMV to get the details. Troubleshooting based on each wait type will be a long series. I hope someone would be starting that soon on SQL Server Geeks. 🙂 If not I will do it some time within an years time from now. 😉

Tomorrow I will talk about another OS related DMV. So, stay tuned. Till then

Happy Learning,
Manu

Like us on FaceBook | Join the fastest growing SQL Server group on FaceBook |
Follow me on Twitter | Follow me on FaceBook

   

About Manohar Punna

Manohar Punna is a Microsoft Data Platform Consultant and a Data Platform MVP. In his day to day job he works on building database and BI systems, automation, building monitoring systems, helping customers to make value from their data & explore and learn. By passion Manohar is a blogger, speaker and Vice President of DataPlatformGeeks. He is a community enthusiast and believes strongly in the concept of giving back to the community. Manohar is a speaker at various Data Platform events from SQL Server Day, various user groups, SQLSaturdays, SQLBits, MS Ignite, SSGAS & DPS. His One DMV a Day series is the longest one day series on any topic related to SQL Server available so far.

View all posts by Manohar Punna →

Leave a Reply

Your email address will not be published.