Saturday, February 25, 2012

Distribution Cleanup Acting Weird

Hi.
In a previous post I asked if there was anything that we could do to
help the cleanup job run better. We then realized that we had over
12,000,000 records in MSreplcommands.
We have been "reducing" the max retention parameter. It was sitting at
72 hours. We started reducing this parameter by 1 hour, then 2 hours,
then 4 hours and the job would complete in about 4-5 minutes. It would
delete approximately 5000 rows/sec.
Suddenly the same job takes several hours while deleting around 80
rows/sec. This is horrible. Deleting 4 hours of old command takes 4
hours. I'll never be able to catch up.
Is there anything that we can do to bring back the 5000 rows/sec
performance of the earlier calls? What would cause the performance of
this job to decrease so drastically?
We're at a point where we are contemplating rebuilding the distributor
to start with a clean slate and set the distribution cleanup to a low
max retention figure (like 1 or 2 hours). We're hoping that with such a
low figure the job will not take so long to run and not block
replication for more than a few seconds.
One more thing. We're going to run without the distribution cleanup job
for the weekend where we have most of our activity. Will there be any
issues with replication if the distribution tables get super huge.
Regards,
Canadian Gambler.
*** Sent via Developersdex http://www.codecomments.com ***
You may be experiencing blocking betweek the cleanup agent, the logreader
agent which writes to the MSreplcommands table and the distribution agent
which reads from the same table. I'd recommend suspending the log reader and
distribution agents when you give the cleanup agent a decent time to clear
out the table. Your transaction logs may get large during this time, and
you'll need to be careful not to stray anywhere close to the distribution
retention period. If this succeeds, run the log reader and distribution
agents and suspend the cleanup agent while they clear the backlog from the
transaction log.
The above applies if the issue is indeed blocking. When dealing with several
million transactions per day I also got into the situation where the
MSreplcommands table balooned and transactions weren't removed even after
being applied to the single subscriber. Only when the retention period was
reached would they be removed. If this applies to your case, consider
synchronizing then really reduce the retention period.
HTH,
Paul Ibison SQL Server MVP, www.replicationanswers.com
(recommended sql server 2000 replication book:
http://www.nwsu.com/0974973602p.html)
|||Thanks for the help.
this morning we pointed all the ODBCs to the main database, stop
replication and ran the CleanUp job. It took as long to complete.
We will have to synchronize next week, blow up the distribution database
and start fresh. Any issues setting the max retention as low as 1 or 2
hours.
Regards,
Canadian Gambler.
*** Sent via Developersdex http://www.codecomments.com ***
|||A low value for the distribution retention period is a bit risky. You're
assuming that all the transactions will get delivered tothe subscriber
during this period, and if there is a network problem, you'll have to
remember to immediately stop the cleanup agent. I've only ever set a low
value after synchronization to remove the redundant records from
MSreplcommands and then reset afterwards.
Cheers,
Paul Ibison SQL Server MVP, www.replicationanswers.com
(recommended sql server 2000 replication book:
http://www.nwsu.com/0974973602p.html)

No comments:

Post a Comment