2017-07-26

We changed our Apache Kudu configuration by adding 2 new Kudu masters to the original one, migrating from a single-master to a multi-master Apache Kudu configuration.

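The problem: when we start Kudu, it comes up with the old leader (the original master) and everything works fine. But after some time, leadership moves to one of the other masters and all queries start failing.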

> I0726 16:47:11.372854 99507 consensus_queue.cc:695] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc [LEADER]: Connected to new peer: Peer: b12b38a0d21c4ceda72b40571c34ec52, Is new: false, Last received: 28.11387, Next index: 11388, Last known committed idx: 11387, Last exchange result: ERROR, Needs tablet copy: false
> W0726 16:47:12.373445 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Timed out: UpdateConsensus RPC to 172.26.217.133:7051 timed out after 1.000s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 1 times.
> W0726 16:47:13.123589 98703 leader_election.cc:272] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc [CANDIDATE]: Term 29 election: RPC error from VoteRequest() call to peer f47ef1fccc0949b68db09f30e430c3eb: Timed out: RequestConsensusVote RPC to 172.26.217.133:7051 timed out after 1.761s (ON_OUTBOUND_QUEUE)
> W0726 16:47:13.323909 98703 leader_election.cc:272] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc [CANDIDATE]: Term 29 pre-election: RPC error from VoteRequest() call to peer f47ef1fccc0949b68db09f30e430c3eb: Timed out: RequestConsensusVote RPC to 172.26.217.133:7051 timed out after 1.969s (ON_OUTBOUND_QUEUE)
> W0726 16:47:13.864181 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Timed out: UpdateConsensus RPC to 172.26.217.133:7051 timed out after 1.000s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 2 times.
> I0726 16:47:14.424320 98727 raft_consensus.cc:887] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc [term 29 LEADER]: Rejecting Update request from peer f47ef1fccc0949b68db09f30e430c3eb for earlier term 28. Current term is 29. Ops: []
> W0726 16:47:15.204483 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Timed out: UpdateConsensus RPC to 172.26.217.133:7051 timed out after 1.000s (SENT). Retrying in the next heartbeat period. Already tried 3 times.
> I0726 16:47:15.536121 99517 consensus_queue.cc:695] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc [LEADER]: Connected to new peer: Peer: f47ef1fccc0949b68db09f30e430c3eb, Is new: false, Last received: 28.11387, Next index: 11388, Last known committed idx: 11387, Last exchange result: ERROR, Needs tablet copy: false
> W0726 16:47:16.537894 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Timed out: UpdateConsensus RPC to 172.26.217.133:7051 timed out after 1.000s (SENT). Retrying in the next heartbeat period. Already tried 1 times.
> I0726 16:47:28.560550 98698 delta_tracker.cc:686] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Flushing 303 deltas from DMS 11...
> I0726 16:47:28.562281 98698 delta_tracker.cc:628] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Flushed delta block: 0062344469241244 ts range: [6148425451429212160, 6148425454747340800]
> I0726 16:47:28.562363 98698 delta_tracker.cc:641] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Reopened delta block for read: 0062344469241244
> I0726 16:47:28.564522 98698 maintenance_manager.cc:419] Time spent running FlushDeltaMemStoresOp(00000000000000000000000000000000): real 0.004s user 0.002s sys 0.000s
> I0726 16:47:28.564554 98698 maintenance_manager.cc:425] P dbe19a36bd1f466ca87b08ebb97f28dc: FlushDeltaMemStoresOp(00000000000000000000000000000000) metrics: {"fdatasync":2,"fdatasync_us":1601}
> I0726 16:49:28.614974 98698 delta_tracker.cc:686] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Flushing 1 deltas from DMS 0...
> I0726 16:49:28.616822 98698 delta_tracker.cc:628] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Flushed delta block: 0062344469241245 ts range: [6148424539781664768, 6148424539781664768]
> I0726 16:49:28.616896 98698 delta_tracker.cc:641] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc: Reopened delta block for read: 0062344469241245
> I0726 16:49:28.619011 98698 maintenance_manager.cc:419] Time spent running FlushDeltaMemStoresOp(00000000000000000000000000000000): real 0.004s user 0.001s sys 0.001s
> I0726 16:49:28.619043 98698 maintenance_manager.cc:425] P dbe19a36bd1f466ca87b08ebb97f28dc: FlushDeltaMemStoresOp(00000000000000000000000000000000) metrics: {"fdatasync":2,"fdatasync_us":2192}
> W0726 16:52:36.772328 98703 connection.cc:462] client connection to 172.26.217.133:7051 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
> W0726 16:52:36.911276 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Network error: Client connection negotiation failed: client connection to 172.26.217.133:7051: connect: Connection refused (error 111). Retrying in the next heartbeat period. Already tried 1 times.
> W0726 16:52:37.411356 98703 consensus_peers.cc:357] T 00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc -> Peer f47ef1fccc0949b68db09f30e430c3eb (namenode-01.datalab:7051): Couldn't send request to peer f47ef1fccc0949b68db09f30e430c3eb for tablet 00000000000000000000000000000000. Status: Network error: Client connection negotiation failed: client connection to 172.26.217.133:7051: connect: Connection refused (error 111). Retrying in the next heartbeat period. Already tried 2 times.
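
To make an excerpt like the one above easier to triage, the warnings can be tallied by the remote address they mention. The following is only a rough sketch, not anything that ships with Kudu: `summarize`, `GLOG_RE`, and `ADDR_RE` are hypothetical names, and the regular expressions are assumptions derived from the shape of the lines quoted here. It expects the raw master log file with one entry per line.

```python
import re
from collections import Counter

# Assumed glog-style prefix as seen in the excerpt:
# severity letter, MMDD, HH:MM:SS.micros, thread id, source file:line]
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$"
)

# Assumed phrasing: "... RPC to host:port ..." or "... connection to host:port ..."
ADDR_RE = re.compile(r"(?:RPC|connection) to (\d{1,3}(?:\.\d{1,3}){3}:\d+)")


def summarize(lines):
    """Count non-INFO log entries per remote address they mention."""
    failures = Counter()
    for raw in lines:
        match = GLOG_RE.match(raw.strip())
        if match is None or match.group("sev") == "I":
            continue  # skip unparsable lines and INFO entries
        # Use a set so an entry naming the same address twice counts once.
        for addr in set(ADDR_RE.findall(match.group("msg"))):
            failures[addr] += 1
    return failures


# Three entries copied from the excerpt above (one INFO, two WARNING).
SAMPLE = [
    "I0726 16:47:14.424320 98727 raft_consensus.cc:887] T "
    "00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc "
    "[term 29 LEADER]: Rejecting Update request from peer "
    "f47ef1fccc0949b68db09f30e430c3eb for earlier term 28. Current term is 29. Ops: []",
    "W0726 16:47:13.123589 98703 leader_election.cc:272] T "
    "00000000000000000000000000000000 P dbe19a36bd1f466ca87b08ebb97f28dc "
    "[CANDIDATE]: Term 29 election: RPC error from VoteRequest() call to peer "
    "f47ef1fccc0949b68db09f30e430c3eb: Timed out: RequestConsensusVote RPC to "
    "172.26.217.133:7051 timed out after 1.761s (ON_OUTBOUND_QUEUE)",
    "W0726 16:52:36.772328 98703 connection.cc:462] client connection to "
    "172.26.217.133:7051 recv error: Network error: failed to read from TLS "
    "socket: Connection reset by peer (error 104)",
]

if __name__ == "__main__":
    for addr, count in summarize(SAMPLE).most_common():
        print(addr, count)
```

On the lines quoted in this post, every warning that names an address points at 172.26.217.133:7051, the host the log labels namenode-01.datalab.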

Any ideas? Anyone?

Please & thank you!

Answers