
I am using the following code (taken from retrieve links from web page using python and BeautifulSoup), but BeautifulSoup is not working and I get a NoneType error:

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    if link.has_attr('href'):
        print link['href']

However, I do not understand why I am getting the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

BeautifulSoup 3.2.0, Python 2.7

Edit:

I tried the solution given in a similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module> 
    for link in BeautifulSoup(response).find_all('a', href=True): 
TypeError: 'NoneType' object is not callable 

Possible duplicate of [Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable](http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl)


@DavidZemens The duplicate question has not been resolved. See the comments on that question.


The duplicate question has an accepted answer that identifies *why* you get the error. Consider some additional debugging, and use 'try/except' as needed ...
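
A minimal sketch of that kind of debugging, keeping the asker's BeautifulSoup 3 setup and simply wrapping the failing lines, might look like this (the except branch only inspects what link and link.has_attr actually are):

import httplib2 
from BeautifulSoup import BeautifulSoup, SoupStrainer 

http = httplib2.Http() 
status, response = http.request('http://www.nytimes.com') 

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')): 
    try: 
        if link.has_attr('href'): 
            print link['href'] 
    except TypeError: 
        # show what `link` really is and what the failed attribute lookup returned 
        print type(link), repr(link.has_attr) 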

Answer


First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which has not been maintained for a long time. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4 

and change your import:

from bs4 import BeautifulSoup 

Also:

Traceback (most recent call last): 
    File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module> 
    if link.has_attr('href'): 
TypeError: 'NoneType' object is not callable 

Here link is a Tag instance which does not have a has_attr method. That means, remembering what dot notation means in BeautifulSoup, it tries to search for a has_attr element inside the link element, which finds nothing. In other words, link.has_attr is None, and obviously None('href') leads to the error.
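
A small illustration of that dot-notation behaviour, using BeautifulSoup 4 and a made-up one-line HTML snippet (assuming bs4 is installed):

from bs4 import BeautifulSoup 

soup = BeautifulSoup('<a href="http://example.com">link</a>', 'html.parser') 
link = soup.a 

# Dot notation is a shortcut for find(): it looks for a child *tag* with that 
# name, so asking for a tag that does not exist simply returns None. 
print(link.b)       # None -- there is no <b> tag inside the <a> tag 
# Calling that None as if it were a method reproduces the reported failure: 
# link.b('href')    # TypeError: 'NoneType' object is not callable 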

Instead, do this:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)) 
for link in soup.find_all("a", href=True): 
    print(link['href']) 

Just for reference, here is the complete working code I used to debug your problem (using requests):

import requests 
from bs4 import BeautifulSoup, SoupStrainer 


response = requests.get('http://www.nytimes.com').content 
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True): 
    print(link['href'])