2017-01-02 44 views
0

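Scraping a div tag with a dynamic ID

I want to scrape content from a web page with Beautiful Soup.

However, the div's id is dynamic. In this case, for example, the number 1 is generated dynamically. How do I match it?

I tried this: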

from bs4 import BeautifulSoup 
import urllib 
r = urllib.urlopen(
    'http://forums.hardwarezone.com.sg/eat-drink-man-woman-16/%5Bofficial%5D-chit-chat-students-part-2-a-5526993-55.html').read() 

soup = BeautifulSoup(r, "lxml") 
letters = soup.find_all("div", attrs={"id":"post_message"}) 
print letters 

letters comes back as an empty list.

+0

Please provide a [mcve] – MYGz

+0

@MYGz I have edited my question – aceminer

Answers

3

You can pass a compiled regular expression inside attrs, like this:

from bs4 import BeautifulSoup 
import urllib 
import re 

r = urllib.urlopen(
    'http://forums.hardwarezone.com.sg/eat-drink-man-woman-16/%5Bofficial%5D-chit-chat-students-part-2-a-5526993-55.html').read() 

soup = BeautifulSoup(r, "lxml") 
# Match any id containing post_message_<digits>; the raw string keeps \d intact.
letters = soup.find_all("div", attrs={"id": re.compile(r'post_message_\d+')}) 
print letters 

2

You can try this.

from bs4 import BeautifulSoup 
import urllib 
import re 


r = urllib.urlopen(
    'http://forums.hardwarezone.com.sg/eat-drink-man-woman-16/%5Bofficial%5D-chit-chat-students-part-2-a-5526993-55.html').read() 

soup = BeautifulSoup(r, "lxml") 


# The ^ anchor requires the id to start with post_message_; raw string keeps \d intact.
letters = soup.find_all("div", attrs={"id": re.compile(r"^post_message_\d+")}) 
print letters
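
For anyone who wants to try the idea without hitting the forum, here is a minimal offline sketch. It is written for Python 3 (the snippets above are Python 2), and the HTML string, its ids and the variable names are made up for illustration; the real page's markup may differ. It shows why the exact string "post_message" finds nothing, how the unanchored and anchored patterns differ (Beautiful Soup matches a compiled pattern with search, so an unanchored pattern can match in the middle of an id), and a CSS prefix selector as a possible alternative to a regex.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the forum page; ids and text are invented.
html = """
<div id="post_message_101">first post</div>
<div id="post_message_102">second post</div>
<div id="post_thanks_101">not a message</div>
<div id="old_post_message_7">id only contains the pattern</div>
"""

# html.parser is used here so the sketch does not depend on lxml.
soup = BeautifulSoup(html, "html.parser")

# Exact string match: no div has the id "post_message", so nothing is found.
exact = soup.find_all("div", attrs={"id": "post_message"})

# Unanchored pattern: matches anywhere inside the id value.
unanchored = soup.find_all("div", attrs={"id": re.compile(r"post_message_\d+")})

# Anchored pattern: the id must start with post_message_.
anchored = soup.find_all("div", attrs={"id": re.compile(r"^post_message_\d+")})

# CSS attribute-prefix selector, an alternative that needs no regex.
by_css = soup.select('div[id^="post_message_"]')

unanchored_ids = [div["id"] for div in unanchored]
anchored_ids = [div["id"] for div in anchored]
css_ids = [div["id"] for div in by_css]

print(len(exact))
print(unanchored_ids)
print(anchored_ids)
print(css_ids)
```

If the ids on the real page always start with post_message_, the anchored pattern and the CSS selector should pick the same divs; the unanchored one is looser and may pull in unrelated ids that merely contain the text.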