BeautifulSoup: get the text after an HTML tag

I have the following HTML, and I want to get the text that comes after <b>Name in Thai</b>, which is ": this is what i want".

content = """ 
<html><body><b>Name of Bangkok Bus station:</b> 
<span itemprop="name">Victory Monument</span> 
<meta content="http://www.transitbangkok.com/stations/Bangkok%20Bus/Victory%20Monument" itemprop="url"/> 
<meta content="http://www.transitbangkok.com/stations/Bangkok%20Bus/Victory%20Monument" itemprop="map"/> 
<br/><b>Name in Thai</b>: this is what i want<br/> 
</body></html> 
"""

I tried the following:

from bs4 import BeautifulSoup

soup = BeautifulSoup(content, "lxml")
soup.find('b').next_sibling

However, I get \n as the output. Is there a way to get the text after a specific tag? (An explanation would be great!)

2017-04-08 titipata

"However, I get \n as the output."

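That is because find("b") returns the first <b> tag it encounters, and in your content the first <b> tag is followed by nothing but whitespace (a newline) before the <span> begins.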
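To make the invisible sibling visible, it can help to print it with repr(). The following is a minimal sketch, not part of the original answer; the shortened HTML string and the use of the built-in "html.parser" (instead of "lxml") are assumptions made so the snippet runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the HTML from the question.
html = (
    "<body><b>Name of Bangkok Bus station:</b>\n"
    "<span>Victory Monument</span>"
    "<br/><b>Name in Thai</b>: this is what i want<br/></body>"
)

# "html.parser" ships with Python; the question used "lxml".
soup = BeautifulSoup(html, "html.parser")

first_b, second_b = soup.find_all("b")

# repr() shows whitespace-only text nodes that print() would hide.
print(repr(first_b.next_sibling))   # whitespace-only text node
print(repr(second_b.next_sibling))  # the text the question is after
```

The first sibling is a text node containing only a newline, which is why find("b").next_sibling appears to return nothing useful.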

If you iterate over all <b> tags instead, you will see that next_sibling gives you what you want for the second one:

for tag in soup.find_all("b"): 
    print(tag.text) 
    print(tag.next_sibling)

Output:

Name of Bangkok Bus station: 


Name in Thai 
: this is what i want

You can iterate over them and pick the one whose next_sibling still contains something after calling strip() on it:

for tag in soup.find_all("b"):
    # strip() removes surrounding whitespace, so a newline-only sibling becomes ""
    after = tag.next_sibling.strip()
    if after:
        print(tag.next_sibling)
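If only one specific label is of interest, the same idea can be wrapped in a small helper. This is a sketch under stated assumptions rather than the answerer's code: the function name text_after_label is hypothetical, "html.parser" is used instead of "lxml" to avoid an extra dependency, and the sample HTML is a shortened version of the one in the question. It also guards against the sibling being missing or being another tag, cases where calling strip() directly would fail.

```python
from bs4 import BeautifulSoup, NavigableString


def text_after_label(html, label):
    """Return the stripped text right after <b>label</b>, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # string=... matches a <b> tag whose only content is exactly `label`.
    tag = soup.find("b", string=label)
    if tag is None:
        return None
    sibling = tag.next_sibling
    # The next sibling may be None or another tag; only text nodes qualify.
    if isinstance(sibling, NavigableString):
        return sibling.strip() or None
    return None


sample = (
    "<body><b>Name of Bangkok Bus station:</b>\n"
    "<span>Victory Monument</span>"
    "<br/><b>Name in Thai</b>: this is what i want<br/></body>"
)

print(text_after_label(sample, "Name in Thai"))
```

Looking the tag up by its text avoids depending on the order of the <b> tags in the page; a label that is followed only by whitespace yields None instead of an empty string.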

2017-04-08 05:21:04 Vallentin

Ah, got it! Thanks, Vallentin, for the very clear explanation. – titipata

You're welcome! Feel free to mark the answer as accepted. :) – Vallentin

Sure, I just have to wait 3 minutes :) – titipata
