我們已經運行一個2節點集羣ES一個2節點集羣elasticsearch(現1.4.1)與所有缺省值,以下覆蓋:重啓零停機
config.cluster.name = "..."
config.discovery.zen.ping_timeout = "5s";
config.discovery.zen.ping.multicast.enabled = false;
config.discovery.zen.ping.unicast.hosts = ["IP1", "IP2"];
近日,我們已經開始注意到,當我們通過http://127.0.0.1:9200/_cluster/nodes/_local/_shutdown
請求關閉每個節點時,集羣對30秒變爲無響應。
當主節點明確關閉時,另一個節點似乎不會立即恢復主服務器的角色......而是直到30秒(默認discovery.zen.fd.ping_timeout
)超時時間過去後才繼續嘗試。
在此期間,所述簇具有「無主」塊,並返回503到根節點請求:
{
"status" : 503,
"name" : "...",
"cluster_name" : "...",
"version" : {
"number" : "1.4.1",
"build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4",
"build_timestamp" : "2014-11-26T15:49:29Z",
"build_snapshot" : false,
"lucene_version" : "4.10.2"
},
"tagline" : "You Know, for Search"
}
塊級別是[「寫」,「元數據」。
你可以看到在日誌中這樣打出來的:
[2014-12-04 17:46:16,000][INFO ][discovery.zen ] [NODE_1] master_left [[NODE_0][VXtqWIw2Q2C9b5UHvWlZyQ][RD000D3A1024B8][inet[/100.72.14.37:9300]]], reason [shut_down]
[2014-12-04 17:46:16,012][WARN ][discovery.zen ] [NODE_1] master left (reason = shut_down), current nodes: {[NODE_1][WoVynRBhQvSwvxNp1nj8kw][RD000D3A109006][inet[/100.78.140.38:9300]],}
[2014-12-04 17:46:16,012][INFO ][cluster.service ] [NODE_1] removed {[NODE_0][VXtqWIw2Q2C9b5UHvWlZyQ][RD000D3A1024B8][inet[/100.72.14.37:9300]],}, reason: zen-disco-master_failed ([NODE_0][VXtqWIw2Q2C9b5UHvWlZyQ][RD000D3A1024B8][inet[/100.72.14.37:9300]])
[2014-12-04 17:46:16,497][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:18,358][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:19,508][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:20,384][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:21,150][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:21,915][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:22,540][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:23,384][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:23,900][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:24,572][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:26,794][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:27,783][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:28,441][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:29,330][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:30,393][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:31,264][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:31,905][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:32,608][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:35,572][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:36,529][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:37,295][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:37,911][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:38,661][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:39,411][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:40,032][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:40,643][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:41,505][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:41,927][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:42,630][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:43,380][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:44,193][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:44,963][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:45,824][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:46,511][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:46,574][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:47,278][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:48,028][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:48,373][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:48,811][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:49,530][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:49,530][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:50,155][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:50,405][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:51,030][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:51,170][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:51,889][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:51,938][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:52,530][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:52,561][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:53,406][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:53,596][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:53,908][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:54,353][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:54,587][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:55,056][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:55,712][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:56,322][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:56,806][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:56,962][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:57,791][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:57,806][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:58,228][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:58,447][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:59,088][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:46:59,355][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:46:59,775][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:47:00,400][DEBUG][action.admin.cluster.state] [NODE_1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2014-12-04 17:47:00,400][DEBUG][action.admin.cluster.state] [NODE_1] no known master node, scheduling a retry
[2014-12-04 17:47:00,619][INFO ][cluster.service ] [NODE_1] new_master [NODE_1][WoVynRBhQvSwvxNp1nj8kw][RD000D3A109006][inet[/100.78.140.38:9300]], reason: zen-disco-join (elected_as_master)
我們如何強制當前節點放棄shutdown命令在其作爲主機的作用,使其它節點可以立即恢復這一責任,防止30秒無主塊停運?我們已經嘗試了各種「短暫」集羣更新呼叫,迫使立即選舉無濟於事。
我不明白爲什麼這個選項在許多其他沒有記錄。 – Wernight 2017-01-12 10:02:42