This is the last one in this series of SQL Server Fine – Tuning Transactional Replication Performance. While PART – 1 and PART – 2 were somehow correlated , this blog post is independent and has no dependency on the previous 2 parts .Nor does the previous 2 parts have any dependency on this post .
In this post I will discuss about a coupel of settings that can help us improving replication performance .Be careful in selecting them for implementation in your environment and do thorough tests on staging /Dev servers first.By saying be carefule I did not mean to scare you :).These settings will definitely help you gain a lot of performance .Its just that you need to make sure that what you are doing is REQUIRED and RIGHT ..
Setting off Immediate_sync option
Most people unknowingly select this option and pay heavy price at a later stage .So, far I have seen no one wanting to choose this option. But there might be some for sure .When we set immediate_sync option to true, the snapshot generated by the snapshot agent remains in the snapshot folder. Sometimes , by mistake DBAs check the option highlighted below and setting this option to true.
This snapshot is kept during the lifetime of the subscription and is kept for future initializations for new subscriptions. Not only that, any insert, update or delete related to that subscription is not removed from distribution database until the transactions are older than distribution max retention period or subscription expires. When next time a new snapshot is generated, the same old Snapshot files are picked up first, then the updates in the distributor database (msrepl_commands and msrepl_transactions) are applied .Finally the log reader applies the final changes that were made when the first two steps were being carried out.
This will prove fatal for large or medium size databases .You keep on accumulating the data, Cleanup job does not delete the replicated transactions and maintenance can’t be done on msrepl_transactions and msrepl_commands tables because they are huge .As a result we see latency from Publisher to Distributor and also from distributor to subscriber for all the publications that sends the data through that distributor.
To see how many publications have this option enabled, use sp_helppublication on all the publisher databases and check for immediate_sync columns value .A value of 1 means that this option is enabled .You can also query mspublications system table in the distribution database .
To set this option to 0, use this query:
EXEC sp_changepublication @publication = 'your_publication_name', @property = 'immediate_sync', @value = 'false' GO
Important: Do this activity during the weekend or when there is least production traffic in case your distributor database is huge because as soon as this is done , the distribution cleanup job will try to kick off all the replicated transactions from MSRepl_commands and MSRepl_transactions tables and this will generate heavy IO activity and longer locks on these tables .If done during the production hours this can create huge replication latency .
Increase the delete rate of Distribution cleanup job
This one is my favorite and let me associate an example with it .The traffic is huge in our environment and data is updated with millions of rows daily .Now, we found that the distribution cleanup job takes a considerable amount of time and latency builds up till it runs .We saw blockings on MSRepl_Transactions and MSRepl_commands tables which was obvious. To see what the cleanup job is doing we tried to check the code of Stored procedure sp_MSdistribution_cleanup :
sp_helptext sp_MSdistribution_cleanup
What it does is
- Acquires the Shared AppLock on the session.
- Updates the statistics on MSrepl_commands and MSrepl_transactions with NORECOMPUTE option.
- Calls sp_MSdistribution_delete.
- Calls sp_MSdelete_publisherdb_trans
- Deletes 2k rows from MSrepl_commands in loop until less than 2000 rows are remaining (with PageLock and MAXDOP 1 hint).
- Calls sp_MSdelete_dodelete
- Deletes top 5k rows from MSrepl_commands in loop (with PageLock and MAXDOP 1 hint).
- Updates the statistics on MSrepl_commands and MSrepl_transactions with NORECOMPUTE option.
- Releases the Shared AppLock on the session.
- Calls sp_MSdelete_publisherdb_trans
So it’s now clear that update statistics is happening twice and in between rows are being deleted from MSRepl_transactions @ 2k and MSRepl_Commands @5k .Multiple commits of 2k and 5k were adding to the long duration .We knew that the servers were powerful .So we decided to increase these values to 5 times i.e. 10k for MSRepl_transactions and 25k for MSRepl_Commands.This will be increased further in a few days after some observation. This change really proved very beneficial .Now this job finishes very quickly and latency is under control because of less blocking.
Hope you have liked this series.Happy Learning and do give us your valuable feedback.
Regards
Abhay Chaudhary
Like us on FaceBook | Follow us on Twitter
Follow me on FaceBook | Join the fastest growing SQL Server group on FaceBook
Hi Abhay,
Thank you so much for this article , just one thing if you can add the script or snapshot of each process that will help everyone to do this fine tuning.
again thank you for this article
Hi,
After replication I want maintain the history of the data not the update of the data. May be maintaining history like SCD. Can it be possible through SQL server replication capabilities ?