Web scraping with Python BeautifulSoup

2016-03-04

How do I extract the data inside <p> paragraph tags and <li> tags that belong to a <div> with a given class name?

Asked 2016-03-04 by pKa

Please post a sample input.

Please post example HTML/XML.

Answer

Use the find() and find_all() methods: find() returns the first matching tag, and find_all() returns every match.

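import requests
from bs4 import BeautifulSoup

url = '...'

r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error status
data = r.text
soup = BeautifulSoup(data, 'html.parser')

# find() returns the first matching <div>, or None if nothing matches
div = soup.find('div', {'class': 'class-name'})
if div is None:
    raise SystemExit('no <div class="class-name"> found on the page')

# find_all() searches only inside that <div>
ps = div.find_all('p')
lis = div.find_all('li')

# print the content of all <p> tags
for p in ps:
    print(p.text)

# print the content of all <li> tags
for li in lis:
    print(li.text)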
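
Since no sample HTML was posted, here is a minimal offline sketch of the same find()/find_all() approach that can be run without a network request. The markup, the class name "content", and the helper name extract_texts are all made up for illustration; substitute your own page and class name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class name "content" is an assumption.
SAMPLE_HTML = """
<html><body>
  <p>outside paragraph</p>
  <div class="content">
    <p>first paragraph</p>
    <p>second paragraph</p>
    <ul>
      <li>item one</li>
      <li>item two</li>
    </ul>
  </div>
  <div class="other"><p>another div</p></div>
</body></html>
"""


def extract_texts(html, class_name):
    """Return (paragraph_texts, list_item_texts) from the first <div>
    whose class attribute includes class_name."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', class_=class_name)
    if div is None:
        # No matching <div>: return empty results instead of failing.
        return [], []
    paragraphs = [p.get_text(strip=True) for p in div.find_all('p')]
    items = [li.get_text(strip=True) for li in div.find_all('li')]
    return paragraphs, items


paragraphs, items = extract_texts(SAMPLE_HTML, 'content')
print(paragraphs)  # ['first paragraph', 'second paragraph']
print(items)       # ['item one', 'item two']
```

Because the search starts from the <div> rather than from the whole document, the <p> outside it and the <p> in the other <div> are not picked up.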

Answered 2016-03-04 08:50:38

Awesome, thanks a ton :-) – pKa
