2017-11-11 142 views

Python - elasticsearch.exceptions.RequestError

I want to extract data from Elasticsearch, and my function looks like this:

# Use a regex to get the image names.
# Fetching them one by one with doc['hits']['hits'][n]['_source']['docker_image_short_name']
# is inefficient because thousands of documents are stored per image.
regex = "docker_image_short_name': u'(.+?)'" 
pattern=re.compile(regex) 
query={ 
     "query":{ 
      "bool":{ "must":[{"range":{"@timestamp":{"gt":vulTime}}}] } 
     } 
    } 
page = es.search(index='crawledframe-*', body = query, scroll='1m', size=1000) 
sid = page['_scroll_id'] 
num_page = page['hits']['total'] 

imglist=[] 
while num_page > 0: 
    print num_page 
    print vulTime 
    imgs = re.findall(pattern, str(page)) 
    imglist += imgs 

    page = es.scroll(scroll_id = sid, scroll = '1m') 
    num_page = len(page['hits']['hits']) 

imglist = list(set(imglist))  # remove duplicates

I only want to extract "docker_image_short_name".

However, I get this error (printed output):

num_page: 2327261 
vulTime : 0001-01-01 
Traceback (most recent call last): 
    File "test.py", line 68, in <module> 
    worker_main() 
    File "test.py", line 63, in worker_main 
    imgnames = recent_crawl_index(es, vulTime) 
    File "test.py", line 45, in recent_crawl_index 
    page = es.scroll(scroll_id = sid, scroll = '1m') 
    File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped 
    return func(*args, params=params, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 1024, in scroll 
    params=params, body=body) 
    File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout) 
    File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request 
    self._raise_error(response.status, raw_data) 
    File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
elasticsearch.exceptions.RequestError: <exception str() failed> 

I don't know why this error occurs, because I use the same logic in other code,

and es.search() does not raise an error...
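
For comparison, here is a minimal sketch of collecting the field straight from the parsed hits instead of running a regex over str(page). It assumes an elasticsearch-py client whose search() and scroll() accept the same keyword arguments used in the question; the helper names extract_image_names and collect_image_names are hypothetical. It does not by itself explain the RequestError, but it reads the scroll id from the most recent response, since the id may change between pages.

```python
def extract_image_names(page):
    """Return the docker_image_short_name values found in one response page."""
    names = set()
    for hit in page['hits']['hits']:
        name = hit.get('_source', {}).get('docker_image_short_name')
        if name is not None:
            names.add(name)
    return names


def collect_image_names(es, query, index='crawledframe-*'):
    """Scroll through all matching documents and collect distinct image names."""
    page = es.search(index=index, body=query, scroll='1m', size=1000)
    names = set()
    # Stop when a page comes back with no hits.
    while page['hits']['hits']:
        names |= extract_image_names(page)
        # Use the scroll id from the latest response, not only the first one.
        page = es.scroll(scroll_id=page['_scroll_id'], scroll='1m')
    return names
```

Adding "_source": ["docker_image_short_name"] to the query body should also cut down the amount of data returned per page, assuming the server version supports source filtering in the request body.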

Answer


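It looks like you are using the wrong version of Elasticsearch DSL.

What you need to do is the following:

  • Check your Elasticsearch version: curl -XGET 'localhost:9200'
  • Then match your Elasticsearch version with a compatible version of Elasticsearch DSL. For example, if your Elasticsearch version is 1.x, do the following: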

    - pip uninstall elasticsearch-dsl

    - pip install "elasticsearch-dsl<2.0.0"
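
The same check can be sketched from Python. This is a rough illustration, assuming the usual rule of thumb that the client library's major version should match the server's major version. The helper names major_of, versions_compatible and check_cluster are hypothetical; es.info() and elasticsearch.__version__ are the only library features used, and the script in the question calls the elasticsearch client package directly, so that is the version compared here.

```python
def major_of(version):
    """Return the major version from a string like '1.7.3' or a tuple like (1, 9, 0)."""
    if hasattr(version, 'split'):
        return int(version.split('.')[0])
    return int(version[0])


def versions_compatible(server_version, client_version):
    """True when the server and client major versions are the same."""
    return major_of(server_version) == major_of(client_version)


def check_cluster(es):
    """Compare a live cluster's version with the installed Python client."""
    # Imported here so the helpers above can be used without the package installed.
    import elasticsearch
    server = es.info()['version']['number']   # e.g. '1.7.3'
    client = elasticsearch.__version__         # a tuple such as (1, 9, 0)
    return server, client, versions_compatible(server, client)
```

If check_cluster(es) reports a mismatch, reinstalling the client with a version pin, as in the pip commands above, is the likely fix.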