I remembered my childhood days when I use to get mechanical toys and how I use to experiment with them eventually ending up messing everything.This is true even when we grow old . Sometimes we commit bigger mistakes .Sometimes we do not take some good steps because we are scared in case something goes wrong. We DBAs are no saints and many a times try and experiment with the system without proper approach and in a haste to see the better results quickly (the intention is always good though).
Performance Tuning SQL Server Transactional Replication is one area where most of us will agree that we have less working knowledge .This is because of its complex nature .There are more than one server involved with many agent jobs and metadata tables that makes it complex .
In this Blog post, I will be discussing about a few important Log reader Agent profile settings (in Transactional Replication) that we can dare to change but with with care. This will definitely increase the replication throughput to a great extent. I have done this myself many times (on huge enterprise systems) and therefore sharing my experience with you. However, as I said in the beginning, if you are seeing good performance already and the client has no issues, there is absolutely no need to do R&D, especially on Replication Profile settings.
I am also assuming that we all know the how important it is to do proper maintenance on the databases and hence will not discuss about making sure that we should rebuild indexes and update statististics.I will also not discuss about blocking , deadlocks etc .
Just one caution before I begin ,In transactional replication, for every database there is one log – reader agent irrespective of the number of publications on that database .You might have many publications on many databases and hence many log reader agents.Make sure that you are makin the changes in the right agent profile .
-LogScanThreshold :The default value is 500,000 which mean that the log reader will scan these many records in the transaction log that are marked for replication. This is a lot .Log reader agent does this every 5 seconds (by default polling interval is set to 5 seconds). So even if the rate of records marked for replication is 100000/sec, this setting should be fine. Sometimes you might notice that the log reader will scan through 1,000,000 records even if the default is set to 500,000.This is because of the overlapping of the time it takes to scan and the next scan time. I would personally feel that there is no need to change this setting .If you increase it, it will have a direct impact in the delivery of the transactions to the distribution database .Log reader will have to spend more time in scanning the log in one go .So in my view this setting is already fine tuned. I have also seen that on a fast system it takes the log reader around 5-7 seconds to scan this many records which is a considerable time.
-ReadBatchSize :This is the number of transactions to be taken in one scoop (once the log reader finds them during the scanning).The default is 500 which means that not more than 500 transactions will be taken per cycle (i.e. one batch). It can take a bit less then that though .For fast OLTP systems where there are multiple small increments happening all the time, this value might not be sufficient .Lets say each transaction has around 7-10 commands in it .With this rate only around 3500-5000 commands will be delivered in one batch. If you take the verbose log of log reader agent, you will notice these messages:
‘Batch committed. Batch consisted of 5003 commands, 456 xacts.’ This means 11 commands/transaction is the rate.
Be very careful in increasing this value .If you increase it to a higher number you might be asking log reader agent to pickup more transactions per batch which means more time spent in scanning and collecting them eventually compromising on the time in delivering the commands to the distribution database .You will also see frequent messages like this if actually there is not much work to do but the log reader agent is still searching for transactions (This message also appears if the VLFs are more .But in this scenario even if the VLFs are less ,you will see these frequent messages in the log reader history table).
Approximately 500000 log records have been scanned in pass # 3, 0 of which were marked for replication, elapsed time 6006 (ms).’.
You can see that 6 seconds have been wasted and like this many such seconds will be wasted .I would recommend increasing this value by 10 or 20 at a time and seeing if the throughput increases without stressing the network .Where ever you see stress, put a full stop.
This is same as –ReadBatchSize parameter with a difference that this value is for the number of commands being read from the transaction log (per transaction)and delivering the same to the distributor agent (to deliver it to the subscriber).The default is 0 and I have seen that with the default of 500 as –ReadBatchSize(discussed above) it takes around 100 commands for delivery(within some transactions).You will see multiple such entries in the verbose log , till the exact count reaches around 500(this is the default –ReadBatchSize):
‘Delivering replicated transactions, xact count: 16, command count: 101.’
‘Delivering replicated transactions, xact count: 25, command count: 201.’
I would always recommend making modifications here first rather than in the –ReadbatchSize parameter. This is a less impactful change and hence will do less harm in case you go wrong. Increase this value by a 100 and note down the impact. Normally, I start off with 500 and then increase it by 100.Now you will be delivering more commands in a batch of 500 transactions (the default value of –ReadBatchSize).
-MaxCmdsInTran :Normally a transaction should be completely written and hence completely committed .This is what Atomicity in ACID property suggests. Imagine if there is a transaction that deletes 10k Rows every 5-seconds and there are 10 other publications on the same publisher database with different table articles undergoing smaller changes. Of course we need to assume that we have only one distribution database. This will result in latency for all the publications because one big transaction with multiple commands will be stuck in the distribution database (msrepl_commands and msrepl_transactions table) and thus it will block other smaller transactions. Once you implement this change, you will be allowing the log reader to break the larger transaction in to smaller transactions and hence will reduce the latency between Publisher and Distributor .This option cannot be seen in the agent profiles but needs to be appended to the log reader agent job. Try this option only if you have biger transactions .Otherwise no need to consider this option .
-Publisher [Abhay\SQL2k8_Pub] -PublisherDB [test] -Distributor [Abhay\ SQL2k8_DIST] -DistributorSecurityMode 1 -MaxCmdsInTran 10 -Continuous
At the end , I would like to repeat that do small increments and notice the impact rather than jumping to a bigger number straight away .All the changes to the profile parameters need the agents to be restarted .
Happy Learning and your comments are most welcome!