How do I scrape data from the AKC dog registration site with Beautiful Soup?

I am trying to scrape data from the American Kennel Club (https://www.akc.org/reg/dogreg_stats.cfm) and have been having some trouble. Following this Stack Overflow post, I can get all the rows of the second table, but I can't format them.

Here is my code.

from bs4 import BeautifulSoup 
import requests 
url = "https://www.akc.org/reg/dogreg_stats.cfm"
r = requests.get(url)
data= r.text 
soup = BeautifulSoup(data) 
rows = soup.find_all('table')[1].find_all('tr') 

for row in rows: 
    cells = soup.find_all('td') 
    firstRanking = cell[1].get_text() 
    print(firstRanking)

All it prints out is

More on Registration Trends: 
More on Registration Trends: 
More on Registration Trends: 
More on Registration Trends: 
More on Registration Trends: 
More on Registration Trends: 
More on Registration Trends:

instead of the actual rankings.

2014-09-25 Zaynaib Giwa

When you create the variable "cells", you want to find all the "td" elements of the current row, not of the entire "soup" object.

It should look like this:

cells = row.find_all('td')

Also, I believe there is an error on the line after that: it should reference "cells", not "cell":

firstRanking = cells[1].get_text()

That would make the for loop look like this:

for row in rows: 
    cells = row.find_all('td') 
    firstRanking = cells[1].get_text() 
    print(firstRanking)
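
To see the difference without hitting the site, here is a minimal sketch that runs the buggy and the corrected lookup against a made-up HTML snippet. The SAMPLE_HTML markup, the breed rows, and the header-row check are assumptions for illustration only; the real AKC page layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real AKC table layout may differ.
SAMPLE_HTML = """
<table><tr><td>nav</td><td>More on Registration Trends:</td></tr></table>
<table>
  <tr><th>Breed</th><th>2013</th></tr>
  <tr><td>Labrador Retriever</td><td>1</td></tr>
  <tr><td>German Shepherd Dog</td><td>2</td></tr>
  <tr><td>Golden Retriever</td><td>3</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all("table")[1].find_all("tr")

# Searching the whole document returns the same cell on every iteration.
buggy = [soup.find_all("td")[1].get_text() for row in rows]

# Searching each row returns that row's own cells.
fixed = []
for row in rows:
    cells = row.find_all("td")
    if len(cells) < 2:  # header rows hold <th>, not <td>, so skip them
        continue
    fixed.append(cells[1].get_text(strip=True))

print(buggy)
print(fixed)
```

The length check is there because a row made only of "th" cells would otherwise raise an IndexError on cells[1]; whether the real table has such a row is something to verify against the page.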

2014-09-25 20:11:42 JB333

Thank you so much! – 2014-09-26 06:27:43

The main mistake I made was on this line: rows = soup.find_all('table')[1].find_all('tr') <- it created a list item. To fix the problem I changed that line to table = soup.find_all('table')[1] and then rows = table.find_all('tr').
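
As a quick check of that change, here is a small sketch that compares the chained call with the two-step version on a made-up two-table snippet (the markup and breed rows are hypothetical, not the real AKC page). On this sample both forms appear to return the same rows, so the split may mostly help with readability and debugging.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table snippet; not the real AKC markup.
html = """
<table><tr><td>first table</td></tr></table>
<table>
  <tr><td>Beagle</td><td>4</td></tr>
  <tr><td>Bulldog</td><td>5</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Chained form from the question.
chained_rows = soup.find_all("table")[1].find_all("tr")

# Two-step form: pick the second table first, then collect its rows.
table = soup.find_all("table")[1]
rows = table.find_all("tr")

# Compare the serialized rows from both approaches.
same = [str(r) for r in chained_rows] == [str(r) for r in rows]
print(len(rows), same)
```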

2014-09-26 06:26:46
