In this section I am going to cover, how we went about setting up the performance counters to be collected for a duration of 18 hours and which counters were used.
Anyone interested in reading up the background on this, can find it here.
With the requirements clear to us on what was needed, we finalized upon collecting the performance values for counters “Buffer cache hit ratio”, “Buffer cache hit ratio base”, ”Stolen pages”,” Page life expectancy”, “Target Server Memory (KB)”, “Total Server Memory (KB)”.
Buffer cache hit ratio
- Indicates how the data is read from the buffer. As per Microsoft recommendations for OLTP system, this ratio should exceed 95%.
- Though this %value plays a role in determining the existence of memory contention, what was more important to us was the PLE, “Target Server Memory (KB)”, “Total Server Memory (KB)” values and to a minimal extent the stolen pages value.
Page Life Expectancy
- As per BOL, page life expectancy value is the “number of seconds a page will stay in the buffer pool without references”.
- As per Microsoft’s recommendation, 300 seconds is the minimum target for page life expectancy.
- What we were looking for was how many times the average PLE value was dropping way below 300 for a duration of every 5 minutes?
- Is the overall average way below 300 for major part of the day, etc ?
Target Server Memory (KB)
- Total amount of dynamic memory the server can consume.
Total Server Memory (KB)
- The committed memory from the buffer pool (in kilobytes).
- Ideally, if the average of “Total Server Memory (KB)” is less than the average of “Target Server Memory (KB)” for a duration of time, let’s say if we are tracking for every 15 minutes, then it means SQL instance is not under memory pressure.
- But to be doubly sure, cross check the average PLE value for the same duration. If PLE value is way above 300, yes can say with conviction no memory pressure.
- But if Total = Target Or Total > Target and the average PLE value for the same duration is way below 300, the yes SQL instance is under memory pressure.
- We wanted to observe the values for a duration of 18 hours across various time, inclusive of peak and non-peak.
- Number of pages acquired / stolen to satisfy other memory requests [Quote from KB243588]
We had 5 SQL systems on which to collect the counter values.
We had 2 options to implement the collection:
- Configure a background perfmon trace to run on each of the 5 system(s) for a duration of 18 hours, directing the log file(s) to a central folder with roll over settings.
- Code a SP, which will query the DMV sys.dm_os_performance_counters for the above mentioned counter values in a loop at a pre-defined interval [in this case every 5 seconds] and then insert the collected values into a user defined table. Then configure a SQL Agent job on the SQL instance to run for a duration of 18 hours with a job step to execute the afore mentioned SP.
Please find below the SP code, that was used to capture the values of the performance counters into a user defined table.
CREATE PROC [dbo].[CapturePerfCounter] AS SET NOCOUNT ON BEGIN IF NOT EXISTS (SELECT 1 FROM sysobjects WHERE [name]='perfcounter' AND type='u') CREATE TABLE perfcounter ( [SQL_instance] sysname, [object_name] VARCHAR(100), [counter_name] VARCHAR(50), [instance_name] VARCHAR(50), [cntr_value] bigint, [date_stamp] datetime ) WHILE 1=1 BEGIN INSERT INTO perfcounter SELECT @@SERVERNAME, [object_name], [counter_name], [instance_name], [cntr_value], GETDATE() FROM sys.[dm_os_performance_counters] WHERE [counter_name] IN ( 'Buffer cache hit ratio' ,'Buffer cache hit ratio base' ,'Stolen pages' ,'Page life expectancy' ,'Target Server Memory (KB)' ,'Total Server Memory (KB)' ) WAITFOR DELAY '00:00:5' END END