2015-03-18 52 views

Python Elasticsearch: using the response from search_exists

I want to take a list of URLs from a text file and check whether they are already stored in Elasticsearch. Here is the code:

import fileinput 
import sys 
import urllib2 
import os 
from urlparse import urlparse 
from elasticsearch import Elasticsearch 

es = Elasticsearch() 

for line_number, line in enumerate(fileinput.input('bangersandmash_items.csv', inplace=1)):
    if len(line) > 4:
        sys.stdout.write(line)


#open file to load URLs

with open('bangersandmash_items.csv') as urls:
    for line in urls:

        #strip out http:// as this seems to cause elasticsearch to return no results

        url = line.rstrip()
        prefix = 'http://'
        if url.startswith(prefix):
            url = url[len(prefix):]

        #query elasticsearch to see if url already exists in library's 'link' field

        response = es.search_exists(index="websearch", doc_type="site", body={"query": {"match_phrase": {"link": url}}}, ignore=[400, 404])
        print url
        print response

        #Is url in library?

        if response == "{u'exists': true}":
            print url
            print "bingo!"
        else:
            print url
            print "nuthin."

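It prints the URLs in the format produced by lines 19-22, but it does not seem to handle the error codes. Lines 25 and 26 print the URL and the Elasticsearch response. Lines 28-33 do not seem to process that information correctly. Any ideas what I am doing wrong here?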
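One quick way to narrow this down is to look at what kind of object the response is. The sketch below assumes `search_exists` hands back a Python dictionary shaped like `{u'exists': True}` (suggested by the printed output, not verified against the library here). A hand-built dictionary stands in for the real response, and the snippet uses the `print()` function so it also runs on Python 3:

```python
# Stand-in for the value returned by es.search_exists(); no real call is made,
# and the {'exists': True} shape is an assumption based on the printed output.
response = {'exists': True}

# A dict is never equal to a str, so this comparison is False
# no matter what the dictionary contains.
matches_string = (response == "{u'exists': true}")

# Looking up the key returns the boolean flag itself.
exists_flag = response['exists']

print(type(response).__name__)
print(matches_string)
print(exists_flag)
```

If the response really is a dictionary, the `if` in the script can never be true, which would explain why only the `else` branch runs.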

Answer

Figured it out. I had to adjust the if/else statement so that the value in the Elasticsearch response dictionary is read as a string:

state = str(response['exists'])
if state == 'True':
    print url
    print "bingo!"
[etc].
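The string conversion works, but the flag can probably be tested directly without it. Here is a minimal sketch, assuming the response is a dictionary and that a request answered with one of the ignored 400/404 statuses may come back without an `exists` key (that second point is an assumption, not something shown above). `url_in_library` is a hypothetical helper name, and hand-built dictionaries stand in for real responses:

```python
def url_in_library(response):
    """Return True only if the response dict carries a truthy 'exists' flag."""
    # .get() avoids a KeyError when the key is missing, e.g. for an error body.
    return isinstance(response, dict) and bool(response.get('exists', False))


# Hand-built dictionaries standing in for real Elasticsearch responses;
# the error-body shape in the last one is purely illustrative.
sample_responses = (
    {'exists': True},
    {'exists': False},
    {'error': 'not found', 'status': 404},
)

for response in sample_responses:
    print("bingo!" if url_in_library(response) else "nuthin.")
```

Testing the boolean itself avoids depending on how `True` happens to be rendered as text, and the `.get()` lookup keeps the loop from stopping on a response that has no `exists` key.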