2016-09-20 111 views
1

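Kubernetes replication controller integration tests fail

I am seeing the following Kubernetes integration tests fail fairly consistently, about 90% of the time, on RHEL 7.2, Fedora 24, and CentOS 7.1: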

test/integration/garbagecollector 
test/integration/replicationcontroller 

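They appear to fail because of etcd. My online searches lead me to believe an apiserver problem may also be involved. My setup is simple: I install and start docker, install go, clone the kubernetes repo from GitHub, run hack/install-etcd.sh from the repo and add etcd to my PATH, get ginkgo, gomega, and go-bindata, and then run 'make test-integration'. I do not manually change anything or add any custom files or configuration. Has anyone run into these failures and found a solution? The only mentions of this problem I have found online dismiss it as a flake, with no solution listed, but I hit it on nearly every test run. Snippets of the errors are below; I can provide more if needed: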

garbagecollector:

*many lines from garbagecollector.go that look good*

I0920 14:42:39.725768 11823 garbagecollector.go:479] create storage for resource { v1 secrets} 

I0920 14:42:39.725786 11823 garbagecollector.go:479] create storage for resource { v1 serviceaccounts} 

I0920 14:42:39.725803 11823 garbagecollector.go:479] create storage for resource { v1 services} 

I0920 14:43:09.565529 11823 trace.go:61] Trace "List *rbac.ClusterRoleList" (started 2016-09-20 14:42:39.565113203 -0400 EDT): 

[2.564µs] [2.564µs] About to list etcd node 

[30.000353492s] [30.000350928s] Etcd node listed 

[30.000361771s] [8.279µs] END 

E0920 14:43:09.566770 11823 cacher.go:258] unexpected ListAndWatch error: pkg/storage/cacher.go:198: Failed to list *rbac.RoleBinding: client: etcd cluster is unavailable or misconfigured 

*repeats over and over, each time with a different resource that failed to list*

replicationcontroller:

I0920 14:35:16.907283 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907293 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907298 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907303 10482 replication_controller.go:481] replication controller worker shutting down 

I0920 14:35:16.907307 10482 replication_controller.go:481] replication controller worker shutting down 

E0920 14:35:16.948417 10482 util.go:45] Metric for replication_controller already registered 

--- FAIL: TestUpdateLabelToBeAdopted (30.07s) 

replicationcontroller_test.go:270: Failed to create replication controller rc: Timeout: request did not complete within allowed duration 

E0920 14:44:06.820506 12053 storage_rbac.go:116] unable to initialize clusterroles: client: etcd cluster is unavailable or misconfigured 

There are no files in /var/log that even start with kube.

Thanks in advance!

+1

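Does anything interesting show up in the etcd logs on your master? The "etcd cluster is unavailable or misconfigured" message suggests that something may be going wrong in etcd.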

+0

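While the integration tests are running, I get the following:
cluster is healthy
member ce2a822cea30bfca is healthy: got healthy result from http://127.0.0.1:2379
But once the tests start to fail, I get:
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
I tried running etcdctl --no-sync, but it did not help.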

+0

I also found this output in the failing tests: etcdserver: 80% of the file descriptor limit is used [used = 886, limit = 1024]
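
To put the numbers in that etcdserver warning in perspective, here is a minimal Python sketch that pulls the `used` and `limit` values out of the bracketed part of the message and computes how full the descriptor table is. The function name `fd_usage` is hypothetical, and the English wording of the sample line is an assumption; only the bracketed `[used = 886, limit = 1024]` part is relied on.

```python
import re

# Matches the bracketed counters, e.g. "[used = 886, limit = 1024]".
_FD_COUNTERS = re.compile(r"used\s*=\s*(\d+),\s*limit\s*=\s*(\d+)")


def fd_usage(line):
    """Return (used, limit, used/limit) parsed from a warning line, or None."""
    match = _FD_COUNTERS.search(line)
    if match is None:
        return None
    used, limit = int(match.group(1)), int(match.group(2))
    return used, limit, used / limit


# Sample line; the wording before the bracket is assumed, not verified.
sample = "etcdserver: 80% of the file descriptor limit is used [used = 886, limit = 1024]"
used, limit, ratio = fd_usage(sample)
print(f"{used}/{limit} = {ratio:.1%}")  # prints "886/1024 = 86.5%"
```

With 886 of 1024 descriptors in use, only 138 remain, which would be consistent with etcd becoming unreachable shortly afterwards.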

Answer

0

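I increased the limit on the number of file descriptors and have not seen the problem since. So I am going to go ahead and treat this issue as resolved.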
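
The answer does not say how the limit was raised. As one possible illustration, the Python sketch below reads the current open-file limit (`RLIMIT_NOFILE`) and raises the soft limit toward a target without exceeding the hard limit. The function name `raise_nofile_soft_limit` and the target of 4096 are assumptions chosen for the example, not values from the original post.

```python
import resource


def raise_nofile_soft_limit(target=4096):
    """Raise this process's soft open-file limit toward `target`.

    The soft limit is never raised above the hard limit, and it is never
    lowered. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    # Cap the requested value at the hard limit unless the hard limit is unlimited.
    new_soft = target if hard == unlimited else min(target, hard)

    # Only act when the soft limit is finite and actually below the new value.
    if soft != unlimited and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit())
```

A change made this way applies only to the calling process and the children it starts afterwards. For the scenario in the question, the limit would need to be raised in whatever environment launches etcd and the tests, for example with the shell's `ulimit -n` before running 'make test-integration', or through system-wide limits configuration.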