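SQL Server Agent job fails because of MS DTC
In SQL Server 2008 R2, I have an Agent job that runs very frequently. The job has a single step that calls a stored procedure. That stored procedure is very long and calls other stored procedures, some of which are also long.
The stored procedures need to work with multiple databases on different servers.
The problem is that the Agent job fails occasionally. It runs many times without failing, then fails once, and then succeeds on the next run. Everything is done in a single transaction, so if it fails the data is rolled back. This leads me to believe it is not a syntax or data problem, although I can't be sure.
When I check the Job Activity Monitor and view the history of the failed job, all it says is: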
The job failed. The Job was invoked by Schedule 11 (Sch0). The last step to run was step 1 (Step00).
I enabled logging for step 1 of the job. The error I get from the log is:
The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction. [SQLSTATE 42000]
I looked at the MS DTC trace log on the main server (SERVER1). When the job fails, the following entries are present:
pid=3416;tid=3036;time=02/29/2016-12:13:11.493 ;seq=88;eventid=TRANSACTION_BEGUN ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction has begun, description :'user_transaction'"
pid=3416;tid=3036;time=02/29/2016-12:13:11.493 ;seq=89;eventid=RM_ENLISTED_IN_TRANSACTION ;tx_guid=<guid>;"TM Identifier='(null)'" ;"resource manager #1001 enlisted as transaction enlistment #1. RM guid = '<guid>'"
pid=3416;tid=3036;time=02/29/2016-12:13:11.509 ;seq=90;eventid=TRANSACTION_PROPOGATED_TO_CHILD_NODE ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction propagated to 'SERVER1' as transaction child node #1"
pid=3416;tid=3036;time=02/29/2016-12:13:27.947 ;seq=91;eventid=TRANSACTION_ABORTING ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction is aborting"
pid=3416;tid=3036;time=02/29/2016-12:13:27.947 ;seq=92;eventid=RM_ISSUED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"abort request issued to resource manager #1001 for transaction enlistment #1"
pid=3416;tid=3036;time=02/29/2016-12:13:27.947 ;seq=93;eventid=CHILD_NODE_ISSUED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"abort request issued to transaction child node #1 'SERVER1'"
pid=3416;tid=3036;time=02/29/2016-12:13:27.947 ;seq=94;eventid=CHILD_NODE_ACKNOWLEDGED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"received acknowledgement of abort request from transaction child node #1 'SERVER1'"
pid=3416;tid=3036;time=02/29/2016-12:13:36.993 ;seq=95;eventid=RM_ACKNOWLEDGED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"received acknowledgement of abort request from the resource manager #1001 for transaction enlistment #1"
pid=3416;tid=3036;time=02/29/2016-12:13:36.993 ;seq=96;eventid=TRANSACTION_ABORTED ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction has been aborted"
So it goes from TRANSACTION_PROPOGATED_TO_CHILD_NODE straight to TRANSACTION_ABORTING, with no indication of why (as far as I can tell).
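To see how long the transaction sat between those two events, the trace lines can be parsed and the time gaps computed. The following is a minimal sketch, assuming the trace layout shown in the excerpt above (`time=...;seq=...;eventid=...`). The names `parse_events` and `gaps` are hypothetical helpers, not part of any MS DTC tooling.

```python
import re
from datetime import datetime

# Matches only the fields used here; assumes the layout of the excerpt above.
LINE_RE = re.compile(r"time=(?P<time>\S+)\s*;seq=(?P<seq>\d+);eventid=(?P<event>\w+)")

def parse_events(lines):
    """Return (timestamp, seq, event id) tuples for every line that matches."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a trace entry
        ts = datetime.strptime(m.group("time"), "%m/%d/%Y-%H:%M:%S.%f")
        events.append((ts, int(m.group("seq")), m.group("event")))
    return events

def gaps(events):
    """Seconds elapsed between each event and the next one, in seq order."""
    ordered = sorted(events, key=lambda e: e[1])
    return [
        (a[2], b[2], (b[0] - a[0]).total_seconds())
        for a, b in zip(ordered, ordered[1:])
    ]

# Two entries copied from the trace above (seq 90 and 91).
SAMPLE = '''pid=3416;tid=3036;time=02/29/2016-12:13:11.509 ;seq=90;eventid=TRANSACTION_PROPOGATED_TO_CHILD_NODE ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction propagated to 'SERVER1' as transaction child node #1"
pid=3416;tid=3036;time=02/29/2016-12:13:27.947 ;seq=91;eventid=TRANSACTION_ABORTING ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction is aborting"'''

for prev, nxt, secs in gaps(parse_events(SAMPLE.splitlines())):
    print(f"{prev} -> {nxt}: {secs:.3f}s")
```

For the two entries above the gap is roughly 16.4 seconds. Running the same script over several failed runs could show whether the abort always arrives after a similar delay, which might point to a timeout rather than a data error.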
I checked the MS DTC trace log on the second server (SERVER2) and saw the following at the time of a failure:
pid=4032;tid=3564;time=02/29/2016-13:26:46.117 ;seq=173977;eventid=TRANSACTION_PROPOGATED_FROM_PARENT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction propagated from parent node 'SERVER2', Description = 'a16ace8fa7f6'"
pid=4032;tid=3564;time=02/29/2016-13:26:46.117 ;seq=173978;eventid=RM_ENLISTED_IN_TRANSACTION ;tx_guid=<guid>;"TM Identifier='(null)'" ;"resource manager #1001 enlisted as transaction enlistment #1. RM guid = '<guid>'"
pid=4032;tid=3564;time=02/29/2016-13:27:02.758 ;seq=173979;eventid=RECEIVED_ABORT_REQUEST_FROM_NON_BEGINNER ;tx_guid=<guid>;"TM Identifier='(null)'" ;"received request to abort the transaction from non beginner"
pid=4032;tid=3564;time=02/29/2016-13:27:02.758 ;seq=173980;eventid=TRANSACTION_ABORTING ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction is aborting"
pid=4032;tid=3564;time=02/29/2016-13:27:02.758 ;seq=173981;eventid=RM_ISSUED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"abort request issued to resource manager #1001 for transaction enlistment #1"
pid=4032;tid=3564;time=02/29/2016-13:27:02.758 ;seq=173982;eventid=RECEIVED_ABORT_FROM_PARENT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"child node received abort request from parent node 'SERVER2'"
pid=4032;tid=3564;time=02/29/2016-13:27:02.758 ;seq=173983;eventid=ACKNOWLEDGING_ABORT_TO_PARENT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"child node achnowledging the delivery of abort request from parent node 'SERVER2'"
pid=4032;tid=3564;time=02/29/2016-13:27:05.773 ;seq=173984;eventid=RM_ACKNOWLEDGED_ABORT ;tx_guid=<guid>;"TM Identifier='(null)'" ;"received acknowledgement of abort request from the resource manager #1001 for transaction enlistment #1"
pid=4032;tid=3564;time=02/29/2016-13:27:05.773 ;seq=173985;eventid=TRANSACTION_ABORTED ;tx_guid=<guid>;"TM Identifier='(null)'" ;"transaction has been aborted"
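This one shows RECEIVED_ABORT_REQUEST_FROM_NON_BEGINNER right after RM_ENLISTED_IN_TRANSACTION, but there is still no indication of why the transaction was aborted.
Does the RECEIVED_ABORT_REQUEST_FROM_NON_BEGINNER event indicate that the abort came from the main server (SERVER1)? Or does it mean the abort came from something other than SERVER1, since SERVER1 is the beginner?
I also checked the SQL Server ERRORLOG file, and it contains nothing about this failure.
The stored procedure uses TRY/CATCH to handle errors, and the Agent job is set up to send an email notification on failure. In this case I receive the email notification, but the CATCH block does not handle the error. I understand this may be because of the error's high severity.
What else can I do to find out exactly what is causing this failure?
It looks like the job sometimes started before the previous run had finished. – PSVSupporter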
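One way to test the overlap hypothesis in the comment above would be to compare start times and durations from the job history. The following is a minimal sketch, assuming the history has already been exported as (start, duration) pairs. The function name `find_overlaps` and the sample values are hypothetical, made up for illustration and not taken from the job in question.

```python
from datetime import datetime, timedelta

def find_overlaps(runs):
    """Return (previous start, next start) pairs where a run began
    before the run immediately preceding it had finished.

    `runs` is a list of (start, duration) tuples.
    """
    ordered = sorted(runs, key=lambda r: r[0])
    overlaps = []
    for (start_a, dur_a), (start_b, _) in zip(ordered, ordered[1:]):
        if start_b < start_a + dur_a:
            overlaps.append((start_a, start_b))
    return overlaps

# Hypothetical run history, for illustration only.
runs = [
    (datetime(2016, 2, 29, 12, 0, 0), timedelta(seconds=50)),
    (datetime(2016, 2, 29, 12, 1, 0), timedelta(seconds=75)),
    (datetime(2016, 2, 29, 12, 2, 0), timedelta(seconds=40)),
]

for prev_start, next_start in find_overlaps(runs):
    print(f"run at {next_start} began before the run at {prev_start} ended")
```

The check only compares each run with the one immediately before it, which is enough to spot back-to-back overlaps in a frequently scheduled job.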