Get all table rows, not just the default, using BeautifulSoup

I am trying to scrape all of the table data from the following site: https://report.boonecountymo.org/mrcjava/servlet/SH01_MP.I00290s

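The table has 230 rows in total (not counting the header row), but only the first 50 are shown by default. When I click the next-page button (the arrow) on the table, a new set of rows is loaded, but the web page does not change. How can I use BeautifulSoup to get all 230 rows instead of just the default 50?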

This is the code I am using:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.showmeboone.com/sheriff/JailResidents/JailResidents.asp"
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, "html.parser")
table = soup.find('tbody', attrs={'class': 'stripe'})

list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in row.find_all('td'):
        # BeautifulSoup decodes &nbsp; to '\xa0', so strip that character
        text = cell.text.replace('\xa0', '')
        list_of_cells.append(text)
    # Drop the first cell of every row
    list_of_rows.append(list_of_cells[1:])

# The with-block closes the file once writing is done
with open("./inmates.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["Last", "First", "Middle", "Gender", "Race", "Age", "City", "State"])
    writer.writerows(list_of_rows)

2016-11-28 SAS_N00b

You can set the max_rows parameter in the URL:

https://report.boonecountymo.org/mrcjava/servlet/SH01_MP.I00290s?max_rows=500
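As a minimal sketch of how that URL could be used from a script: the helper names `with_max_rows` and `fetch_rows` are made up for illustration, the value 500 is simply the one from the URL above, and the `tbody` with class `stripe` is taken from the question's code, so it is an assumption that the report page uses the same markup.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REPORT_URL = "https://report.boonecountymo.org/mrcjava/servlet/SH01_MP.I00290s"


def with_max_rows(url, max_rows):
    """Return url with its max_rows query parameter set to max_rows."""
    parts = urlsplit(url)
    # Keep any other query parameters, but drop an existing max_rows value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "max_rows"]
    query.append(("max_rows", str(max_rows)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_rows(url=REPORT_URL, max_rows=500):
    """Download the page and return each table row as a list of cell texts."""
    # Imported here so with_max_rows stays usable without third-party packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(with_max_rows(url, max_rows), timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    # Assumption: the rows live in <tbody class="stripe">, as in the question.
    table = soup.find("tbody", attrs={"class": "stripe"})
    if table is None:
        return []
    return [[cell.get_text(strip=True) for cell in row.find_all("td")]
            for row in table.find_all("tr")]
```

Calling `fetch_rows()` should then return every row in one request, provided the server honours `max_rows` as described above. Any value at or above the total row count (230 in the question) ought to be enough.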

2016-11-28 16:27:50 jinksPadlock

Thanks @jinksPadlock! This works perfectly. I appreciate the quick response.

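If there were no way to set the maximum number of rows shown in the table, would there be any way to pull the results for the first page, then the second page, then the third, and so on?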

Since the table reloads from a set input value, your script would need to handle JavaScript. Something like Selenium can do that. – jinksPadlock
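A rough sketch of what the Selenium route might look like, under stated assumptions: `collect_pages` and `scrape_with_selenium` are hypothetical helper names, the CSS selector for the next-page arrow must be supplied by the caller because the thread does not give it, and the `tbody.stripe` selector is borrowed from the question's code. The paging loop is kept separate from the browser calls so it can be exercised without a browser.

```python
def collect_pages(read_rows, go_to_next_page, max_pages=100):
    """Gather rows page by page until go_to_next_page() returns False."""
    all_rows = []
    for _ in range(max_pages):  # upper bound guards against endless paging
        all_rows.extend(read_rows())
        if not go_to_next_page():
            break
    return all_rows


def scrape_with_selenium(url, next_selector):
    """Page through the table in a real browser and return all rows."""
    # Imported here so collect_pages can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    row_selector = "tbody.stripe tr"  # assumption: markup as in the question
    driver = webdriver.Firefox()
    try:
        driver.get(url)

        def read_rows():
            rows = driver.find_elements(By.CSS_SELECTOR, row_selector)
            return [[cell.text.strip()
                     for cell in row.find_elements(By.TAG_NAME, "td")]
                    for row in rows]

        def go_to_next_page():
            buttons = driver.find_elements(By.CSS_SELECTOR, next_selector)
            rows = driver.find_elements(By.CSS_SELECTOR, row_selector)
            if not buttons or not rows or not buttons[0].is_enabled():
                return False
            buttons[0].click()
            try:
                # The old first row going stale signals that the table reloaded.
                WebDriverWait(driver, 10).until(EC.staleness_of(rows[0]))
            except TimeoutException:
                return False  # nothing reloaded, so treat this as the last page
            return True

        return collect_pages(read_rows, go_to_next_page)
    finally:
        driver.quit()
```

The loop itself can be tried with plain functions in place of the browser, for example a `read_rows` that returns rows from an in-memory list of pages and a `go_to_next_page` that advances an index.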
