Python TypeError: 'NoneType' object is not callable


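I am trying to scrape a website and write the data to a CSV file, which works. I am facing two challenges, plus the TypeError shown below:

The data in the CSV file is saved in rows instead of columns.
The website is paginated (1, 2, 3, 4 ... Next), and I cannot walk through all the pages to scrape the data. Only the first page is scraped.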

Error:

if last_link.startswith('Next'): 
TypeError: 'NoneType' object is not callable

Code:

import requests 
import csv 
from bs4 import BeautifulSoup 

url = 'http://localhost:8088/wiki.html' 

response = requests.get(url) 
html = response.content 
soup = BeautifulSoup(html) 

table = soup.find('table', {'class' : 'tab_operator'}) 

list_of_rows = [] 
for rows in table.findAll('tr'): 
    list_of_cells = [] 
    for cell in rows.findAll('td'): 
     list_of_links = [] 
     for links in cell.findAll('a'): 
      text = links.text.replace('&nbsp;', '') 
      list_of_links.append(text) 
     list_of_rows.append(list_of_links) 

outfile = open('./outfile.csv', 'w') 
writer = csv.writer(outfile) 
writer.writerows(list_of_rows) 

try: 
    last_link = soup.find('table', {'id' : 'str_nav'}).find_all('a')[-1] 
    if last_link.startswith('Next'): 
     next_url_parts = urllib.parse.urlparse(last_link['href']) 
     url = urllib.parse.urlunparse((base_url_parts.scheme, base_url_parts.netloc, next_url_parts.path, next_url_parts.params, next_url_parts.query, next_url_parts.fragment)) 

except ValueError: 
    print("Oops! Try again...")

Sample HTML from the website:

### Numbers to scrape ### 

<table cellpadding="10" cellspacing="0" border="0" style="margin-top:20px;" class="tab_operator"> 
<tbody><tr> 
<td valign="top"> 
<a href="http://localhost:8088/wiki/9400000">9400000</a><br> 
<a href="http://localhost:8088/wiki/9400001">9400001</a><br> 
</td> 
</tr></tbody> 
</table> 

### Paging Sample Code: ### 

<div class="pstrnav" align="center"> 
<table cellpadding="0" cellspacing="2" border="0" id="str_nav"> 
<tbody> 
<tr> 
<td style="background-color:#f5f5f5;font-weight:bold;">1</td> 
<td><a href="http://localhost:8088/wiki/2">2</a></td> 
<td><a href="http://localhost:8088/wiki/3">3</a></td> 
<td><a href="http://localhost:8088/wiki/4">4</a></td> 
<td><a href="http://localhost:8088/wiki/2">Next &gt;&gt;</a></td> 
<td><a href="http://localhost:8088/wiki/100">Last</a></td> 
</tr> 
</tbody> 
</table> 
</div>


2016-04-23 Overflow

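last_link is a Tag object, not a string. BeautifulSoup treats any attribute name on a tag that is not an existing attribute or method as a search for a child tag with that name. Because the link contains no startswith tag, that search returns None, and None is the object you are trying to call: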

>>> last_link = soup.find('table', {'id' : 'str_nav'}).find_all('a')[-1] 
>>> last_link 
<a href="http://localhost:8088/wiki/100">Last</a> 
>>> last_link.startswith is None 
True 
>>> last_link.startswith('Next') 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
TypeError: 'NoneType' object is not callable
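The same behaviour can be reproduced without the live site. The following is a minimal sketch, assuming bs4 is installed and using a one-line HTML fragment taken from the paging sample; the variable names `missing` and `error_message` are only for illustration.

```python
from bs4 import BeautifulSoup

# A single link from the paging table, parsed with the stdlib parser.
html = '<td><a href="http://localhost:8088/wiki/100">Last</a></td>'
link = BeautifulSoup(html, "html.parser").find("a")

# An unknown attribute name is treated as a child-tag lookup, so this is
# equivalent to link.find("startswith") and finds nothing.
missing = link.startswith

# Calling the resulting None is what raises the TypeError in the question.
try:
    link.startswith("Next")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(missing, "/", error_message)
```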

You want to test the contained text instead:

if last_link.get_text(strip=True).startswith('Next'):

This uses the Tag.get_text() method to access all the text in the link; it keeps working even if the link contains other tags, such as <b> or <i>.
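To illustrate the point about nested tags, here is a small sketch, assuming bs4 is installed. The HTML fragment with a <b> tag inside the link is made up for the example and does not come from the site in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the "Next" link with nested markup.
html = '<a href="/wiki/2"><b>Next</b> &gt;&gt;</a>'
link = BeautifulSoup(html, "html.parser").find("a")

# get_text() gathers the text of all descendants; strip=True trims each piece.
text = link.get_text(strip=True)
is_next = text.startswith("Next")

# By contrast, .string is None here because the link has more than one child.
direct_string = link.string

print(repr(text), is_next, direct_string)
```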

You may want to search for the Next link directly instead:

import re 

table = soup.select_one('table#str_nav') 
last_link = table.find('a', text=re.compile('^Next'))

The regular expression ensures that only a tags whose directly contained text starts with Next are matched.
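The direct search also gives a starting point for the paging problem in the question. Below is a rough sketch, assuming bs4 4.4.0 or newer, where the `text=` argument is also available under the name `string=`. The helper name `next_page_url` is hypothetical, and the HTML is a shortened copy of the paging sample.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def next_page_url(html, current_url):
    """Return the absolute URL behind the 'Next' link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    nav = soup.find("table", id="str_nav")
    if nav is None:
        return None
    # Match only <a> tags whose own text starts with "Next".
    link = nav.find("a", string=re.compile(r"^Next"))
    if link is None or not link.get("href"):
        return None
    # urljoin also resolves relative hrefs against the current page.
    return urljoin(current_url, link["href"])


sample = """<table id="str_nav"><tr>
<td>1</td>
<td><a href="http://localhost:8088/wiki/2">2</a></td>
<td><a href="http://localhost:8088/wiki/2">Next &gt;&gt;</a></td>
<td><a href="http://localhost:8088/wiki/100">Last</a></td>
</tr></table>"""

next_url = next_page_url(sample, "http://localhost:8088/wiki.html")
print(next_url)
```

A scraping loop could fetch a page, collect its rows, and then call this helper repeatedly, stopping once it returns None.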


2016-04-23 13:21:22

Made the changes as suggested: table = soup.select_one('table#str_nav'), last_link = table.find('a', text=re.compile('^Next')), if last_link.get_text(strip=True).startswith('Next'): but I still get an error at table = soup.select_one('table#str_nav'): TypeError: 'NoneType' object is not callable – Overflow

@Overflow: select_one requires BeautifulSoup 4.4.0 or newer (released about six months ago, IIRC).

@Overflow: if your bs4 installation is older, use table = soup.select('table#str_nav')[0] instead.
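On a bs4 release that predates select_one, the name is not a method, so soup.select_one falls through to the same child-tag lookup described in the answer and returns None, which explains the repeated error. A small sketch of the suggested fallback follows, assuming bs4 with CSS selector support is installed; the helper name `select_first` is hypothetical.

```python
from bs4 import BeautifulSoup


def select_first(soup, selector):
    """Return the first element matching a CSS selector, or None if none match."""
    # select() is available on older bs4 releases that lack select_one().
    matches = soup.select(selector)
    return matches[0] if matches else None


html = '<div class="pstrnav"><table id="str_nav"><tr><td>1</td></tr></table></div>'
soup = BeautifulSoup(html, "html.parser")

table = select_first(soup, "table#str_nav")
absent = select_first(soup, "table#does_not_exist")
print(table["id"], absent)
```

Returning None rather than indexing with [0] directly avoids an IndexError on pages that have no paging table.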
