
I am using the following code (taken from retrieve links from web page using python and BeautifulSoup), but BeautifulSoup is not working and I get a NoneType error:

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    if link.has_attr('href'):
        print link['href']

However, I do not understand why I am getting the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

BeautifulSoup 3.2.0, Python 2.7

Edit:

I tried the solution given in a similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module> 
    for link in BeautifulSoup(response).find_all('a', href=True): 
TypeError: 'NoneType' object is not callable 

Possible duplicate of [Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable](http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl)


@DavidZemens The duplicate question has not been resolved. See the comments on that question.


The duplicate question has an accepted answer that identifies *why* you get the error. Consider some additional debugging, and use 'try/except' as needed ...
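
A minimal sketch of that kind of debugging, keeping the asker's BeautifulSoup 3 setup and simply wrapping the failing lines, might look like this (the except branch only inspects what link and link.has_attr actually are):

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    try: 
        if link.has_attr('href'): 
            print link['href'] 
    except TypeError: 
        # show what `link` really is and what the failed attribute lookup returned 
        print type(link), repr(link.has_attr) 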

Answer


First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which has not been maintained for a long time. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4 

and change your import:

from bs4 import BeautifulSoup 

Also:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

Here link is a Tag instance which does not have a has_attr method. That means, remembering what dot notation means in BeautifulSoup, it tries to search for a has_attr element inside the link element, which finds nothing. In other words, link.has_attr is None, and obviously None('href') leads to the error.
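
A small illustration of that dot-notation behaviour, using BeautifulSoup 4 and a made-up one-line HTML snippet (assuming bs4 is installed):

from bs4 import BeautifulSoup 

soup = BeautifulSoup('<a href="http://example.com">link</a>', 'html.parser') 
link = soup.a 

# Dot notation is a shortcut for find(): it looks for a child *tag* with that 
# name, so asking for a tag that does not exist simply returns None. 
print(link.b)       # None -- there is no <b> tag inside the <a> tag 
# Calling that None as if it were a method reproduces the reported failure: 
# link.b('href')    # TypeError: 'NoneType' object is not callable 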

Instead, do this:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)) 
for link in soup.find_all("a", href=True): 
    print(link['href']) 

Just for reference, here is the complete working code I used to debug your problem (using requests):

import requests 
from bs4 import BeautifulSoup, SoupStrainer 


response = requests.get('http://www.nytimes.com').content 
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True): 
    print(link['href'])