2009-07-08
1

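How can I fix this error, or catch it as an exception?

I wrote a program that collects the image URLs from any web page. It is written in Python and uses BeautifulSoup and httplib2. When I run it, I get the following error: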

Look me http://movies.nytimes.com   (this line is printed by the code) 
Traceback (most recent call last):
  File "main.py", line 103, in <module>
    visit(initialList,profundidad)
  File "main.py", line 98, in visit
    visit(dodo[indice], bottom -1)
  File "main.py", line 94, in visit
    getImages(w)
  File "main.py", line 34, in getImages
    iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118

Can someone explain how I can fix this error, or catch it as an exception?

+0

Can you post the code? Can you take a look at the HTML you downloaded? After all, we are not omniscient here. – Ber 2009-07-08 19:19:14

Answers

2

To catch this specific error, change your code to look like this:

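from HTMLParser import HTMLParseError  # Python 2 module, as in the traceback

try:
    iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
except HTMLParseError:
    # Do something intelligent here, e.g. log the URL and skip this page
    pass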

Here is some further reading on Python's try/except blocks: http://docs.python.org/tutorial/errors.html
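In a crawler like the one in the traceback, a single unparseable page would otherwise abort the whole run. Below is a minimal sketch of skipping such pages and remembering which ones failed. The names collect_images and extract_images are hypothetical (extract_images stands in for the asker's getImages), and the fallback class is an assumption added only so the sketch also runs on Python 3.5 and later, where HTMLParseError no longer exists.

```python
try:
    from HTMLParser import HTMLParseError  # Python 2, as in the traceback
except ImportError:
    class HTMLParseError(Exception):
        """Stand-in: Python 3.5+ no longer ships this exception."""


def collect_images(pages, extract_images):
    """Run extract_images on each (url, markup) pair, skipping bad pages.

    extract_images is a hypothetical callable that returns a list of image
    URLs and may raise HTMLParseError on malformed markup.
    Returns (images, failed), where failed holds (url, error message) pairs.
    """
    images, failed = [], []
    for url, markup in pages:
        try:
            images.extend(extract_images(markup))
        except HTMLParseError as exc:
            # Record the failure and carry on with the next page.
            failed.append((url, str(exc)))
    return images, failed
```

Catching only HTMLParseError, rather than a bare except, keeps unrelated bugs visible while still letting the crawl continue.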

4

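Are you using the latest version of BeautifulSoup?
This appears to be a known issue with version 3.1.x, which switched to a new parser (HTMLParser instead of SGMLParser) that copes much worse with malformed HTML. You can find more information on the BeautifulSoup website.
As a quick fix, you can simply use an older version (3.0.7a).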
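To check whether an installation falls in the affected series, one could compare its version string against 3.1. This is only a sketch: the helper name is_affected_version is hypothetical, and it assumes the version is available as a dotted string such as "3.0.7a".

```python
import re


def is_affected_version(version):
    """Return True for 3.1.x version strings, the series flagged above."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (3, 1)
```

With BeautifulSoup 3 installed, the string to pass in should be the module's __version__ attribute (an assumption worth verifying on your installation).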

0

I got this error when my HTML document contained the string "= &". Once I replaced that string (in my case with "= and"), I no longer got the parse error.
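The position reported in the error message (line 942, column 118 in the question) can be used to find the offending text before deciding what to replace. A rough sketch follows. Both helper names are hypothetical, the "= &" to "= and" substitution is just the example from this answer, and the sketch assumes the reported line number is 1-based and the column 0-based.

```python
def context_at(markup, lineno, offset, width=40):
    """Return the text around a reported parse position.

    Assumes lineno is 1-based and offset is 0-based.
    """
    line = markup.split("\n")[lineno - 1]
    start = max(0, offset - width)
    return line[start:offset + width]


def sanitize_markup(markup, bad="= &", replacement="= and"):
    """Replace a known troublesome string before handing markup to the parser."""
    return markup.replace(bad, replacement)
```

A blanket replacement like this can also alter legitimate text, so it is safer to inspect the context first and keep the replacement as narrow as possible.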