2014-03-31 56 views

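Scraping a PDF with ScraperWiki and getting a "not defined" error

I am trying to scrape this PDF with ScraperWiki. The current code fails with the error name 'data' is not defined, and the error is reported on this line:

elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text

If I comment that line out, I get the same error on my else statement.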

Here is my code:

import scraperwiki 
import urllib2, lxml.etree 
#Pull Mondays 
url = 'http://www.city.pittsburgh.pa.us/police/blotter/blotter_monday.pdf' 
pdfdata = urllib2.urlopen(url).read() 
xmldata = scraperwiki.pdftoxml(pdfdata) 
root = lxml.etree.fromstring(xmldata) 
# how many pages in PDF 
pages = list(root) 
print "There are",len(pages),"pages" 
# Test Scrape of only Page 1 of 29 
for page in pages[0:1]:
    for el in page:
        if el.tag == "text":
            if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }
            elif int(el.attrib['left']) < 317: data['Location of Occurrence'] = el.text
            elif int(el.attrib['left']) < 169: data['Incident Time'] = el.text
            elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text
            elif int(el.attrib['left']) < 338: data['Description'] = el.text
            else:
                data['Zone'] = el.text
                print data

What am I doing wrong?

Any suggestions for a better solution would also be much appreciated.

Answer


Unless you have left out some of your code, your data dictionary is only created if the condition on this line matches:

if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }

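All the other lines that set values in data depend on it already existing, so if the first text element does not match that first condition, you will get a NameError.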
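The failure can be reproduced without the PDF. Below is a minimal sketch in Python 3 syntax (the question's code is Python 2), using a hypothetical `classify` function and made-up `left` values that are not taken from the real blotter. The dictionary is only bound in the first branch, just as in the question:

```python
def classify(left, text):
    """Mimic the question's branches: `data` is only bound in the first one."""
    if left < 11:
        data = {'Report Name': text}
    elif left < 317:
        # Fails when the first branch never ran, because `data` is unbound.
        data['Location of Occurrence'] = text
    return data


# The first branch runs, so `data` exists when it is returned.
print(classify(5, 'Incident'))

try:
    # The first branch is skipped, so the assignment into `data` fails.
    classify(200, 'Main St')
except NameError as exc:
    print('NameError:', exc)
```

Inside a function Python raises UnboundLocalError, which is a subclass of NameError, so the except clause still catches it. At module level, as in the question's script, the same mistake is reported as a plain NameError.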

A quick fix would be to always create an empty data dictionary, e.g.

for page in pages[0:1]:
    for el in page:
        data = {}
        if el.tag == "text":
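Two further points may be worth checking. A dictionary created inside the inner loop is reset for every element, so fields from earlier elements in the same row are lost. Also, in an if/elif chain a test such as `< 169` that comes after `< 317` can never match, because any value below 169 has already been caught. The following is only a sketch of one possible arrangement, in Python 3 syntax and using the standard library's `xml.etree.ElementTree` in place of lxml. `SAMPLE_XML`, `field_for` and `parse_rows` are hypothetical names, the sample coordinates are invented, and it assumes that the question's thresholds are the right-hand edges of the columns and that the Zone column ends each row:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the output of scraperwiki.pdftoxml(); the real
# blotter layout and coordinates may differ.
SAMPLE_XML = """<pdf2xml>
  <page number="1">
    <text top="100" left="10" width="80" height="12">Incident</text>
    <text top="100" left="120" width="40" height="12">10:15</text>
    <text top="100" left="200" width="90" height="12">100 Main St</text>
    <text top="100" left="700" width="10" height="12">3</text>
    <text top="120" left="10" width="80" height="12">Arrest</text>
    <text top="120" left="700" width="10" height="12">5</text>
  </page>
</pdf2xml>"""

# (upper bound on `left`, field name), in ascending order so that every
# entry is reachable. The bounds are the question's, re-sorted (assumption).
COLUMNS = [
    (11, 'Report Name'),
    (169, 'Incident Time'),
    (317, 'Location of Occurrence'),
    (338, 'Description'),
    (647, 'Neighborhood'),
]
LAST_FIELD = 'Zone'  # anything at or beyond the last bound


def field_for(left):
    """Return the field name for a given `left` coordinate."""
    for bound, name in COLUMNS:
        if left < bound:
            return name
    return LAST_FIELD


def parse_rows(xml_text):
    """Collect one dict per row; a row ends when the last column is seen."""
    rows = []
    root = ET.fromstring(xml_text)
    for page in root:
        data = {}  # created before any branch can touch it
        for el in page:
            if el.tag != 'text':
                continue
            name = field_for(int(el.attrib['left']))
            data[name] = el.text
            if name == LAST_FIELD:
                rows.append(data)
                data = {}  # start the next row with a fresh dict
    return rows


for row in parse_rows(SAMPLE_XML):
    print(row)
```

Keeping the bounds in a sorted list makes the column order explicit, and adjusting a threshold to match the real PDF then means editing one table entry rather than reordering a chain of elif branches.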
