Segmentation fault when scrolling in Elasticsearch using the Python API

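I am using the Elasticsearch Python API to compute something from data stored in an ES cluster. For my computation I need to fetch all documents that satisfy certain conditions and extract some information from them. I am therefore doing a scroll with a size of 1000 and a duration of 1 second. I wrote a Python script that uses ES-Python to do this for me.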

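However, after a little over 1400 scrolls the script always exits with the error "Segmentation fault (core dumped)". I tried increasing the scroll size to 10000, but the same problem still occurs. Here is the part of the script where I do the scrolling: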

page = Elasticsearch().search(
    index = my_index,
    scroll = "1s",
    size = 1000,
    body = {
        "_source" : ["_id", "@timestamp", my_field],
        "query" : {"bool" : {"must" : [
            {"exists" : {"field" : my_field}},
            {"exists" : {"field" : "@timestamp"}}
        ]}}
    })
sid = page['_scroll_id'] 
scroll_size = page['hits']['total'] 
while (scroll_size > 0): 
    print "Scrolling..." 
    # Get the number of results that we returned in the last scroll 
    scroll_size = len(page['hits']['hits']) 
    print "scroll size: " + str(scroll_size) 
    page = Elasticsearch().scroll(scroll_id = sid, scroll = '1s') 
    # Update the scroll ID 
    sid = page['_scroll_id']

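I was able to determine that the line page = Elasticsearch().scroll(scroll_id = sid, scroll = '1s') is responsible for the error. I have checked the scroll ID, and it is always the same (at least until the error is thrown). Has anyone run into a similar problem, or does anyone know how to solve it?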

I am running both Python and Elasticsearch on the same server under Ubuntu 14.04. The Python version is 2.7.6 and the ES version is 5.0.0.

Source

2017-01-23 mshabeeb

Have you considered using the scan helper? (http://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.scan) – iCart

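I didn't know about it before. Are there any working examples that use the scan helper? I have been trying, but I can't figure out how it works. – mshabeeb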

(Posting this as an answer because code formatting doesn't work in comments)

Try something like this:

import elasticsearch 
import elasticsearch.helpers 

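# Instantiate the client; the original passed the class and had an unbalanced parenthesis
scanner = elasticsearch.helpers.scan(client=elasticsearch.Elasticsearch(), index=my_index, query={...}, scroll='1s')
for doc in scanner:
    # Do something with each hit; doc['_source'] holds the requested fields
    pass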
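
For comparison, here is a minimal sketch of a manual scroll loop wrapped in a generator, which is roughly the idea behind the scan helper. The name iter_hits and its default arguments are made up for illustration. It assumes a client object that exposes search, scroll and clear_scroll the way elasticsearch-py does, and it has not been tested against a live cluster.

```python
def iter_hits(client, index, body, scroll="1m", size=1000):
    """Yield hits one at a time, following the scroll cursor until a page comes back empty.

    `client` is assumed to behave like an elasticsearch.Elasticsearch instance.
    A single client is reused for every request instead of creating one per call.
    """
    page = client.search(index=index, scroll=scroll, size=size, body=body)
    sid = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # Fetch the next page and keep the scroll context alive
            page = client.scroll(scroll_id=sid, scroll=scroll)
            sid = page["_scroll_id"]
    finally:
        # Release the server-side scroll context once iteration ends
        client.clear_scroll(scroll_id=sid)
```

Because this is a generator, only one page of hits is held in memory at a time, as long as the caller does not collect the hits into a list.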

Source

2017-01-23 13:03:31 iCart

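Thanks for the tip! In the end it had nothing to do with using the scan API, but with what I was doing inside the loop: I was saving the data retrieved from ES, which grew with each iteration, so at some point the memory was … – mshabeeb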

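In the end I found out that it had nothing to do with scrolling in ES; it was a memory problem. Inside the loop, I was saving the output from ES into an array that grew with every iteration, so at some point the memory limit was reached.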
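
A minimal sketch of one way to avoid that growth, assuming the calculation can be expressed as a running aggregate. The function summarize and the count/sum aggregate are hypothetical stand-ins for the real calculation, which is not shown in this thread. The only thing taken from the question is that each hit carries the requested field under _source.

```python
def summarize(hits, field):
    """Fold hits into a running count and sum without keeping the documents."""
    count = 0
    total = 0.0
    for hit in hits:
        value = hit["_source"].get(field)
        if value is None:
            continue  # skip hits that lack the field
        count += 1
        total += float(value)
    return count, total


# Usage with any iterable of hits, for example the generator returned by
# elasticsearch.helpers.scan; a small in-memory sample is used here instead.
sample = ({"_source": {"my_field": v}} for v in (1, 2, 3.5))
count, total = summarize(sample, "my_field")
```

Each hit becomes eligible for garbage collection as soon as it has been folded into the aggregate, so memory use stays roughly constant regardless of how many documents match.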

Source

2017-01-23 15:42:27 mshabeeb
