Python/BeautifulSoup - How to remove all tags from an element?

How can I simply strip all the tags from an element I have found with BeautifulSoup?

2013-04-25 Daniele B

Assuming you want to strip the tags but keep their contents, see the accepted answer to this question: Remove a tag using BeautifulSoup but keep its contents

2013-04-25 04:31:04 Shaun

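Looks like this is the way to do it! It's that simple.

This one line joins together all the text parts inside the current element (note that it needs find_all, since find returns only the first text node):

''.join(htmlelement.find_all(text=True))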
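
The join idiom above can be sketched as follows. This is only an illustration, assuming bs4 is installed and using Python's built-in html.parser; the sample markup and the variable names are made up and do not come from the original post. It relies on find_all, which collects every text node under the element.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original post.
html = '<p>Hello <b>bold</b> and <i>italic</i> world</p>'
soup = BeautifulSoup(html, 'html.parser')
htmlelement = soup.find('p')

# find_all(string=True) returns every text node beneath the element
# (string= is the newer spelling of the text= argument).
pieces = htmlelement.find_all(string=True)
joined = ''.join(pieces)
print(joined)

# find(string=True) returns only the first text node.
first_only = htmlelement.find(string=True)
print(first_only)
```

Joining the pieces with an empty string keeps the original spacing, because the spaces live inside the text nodes themselves.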

2013-04-25 04:46:12

In BS4 you can use the decompose method (note that it removes each child tag together with its contents):

import bs4

soup = bs4.BeautifulSoup('<body><a href="http://example.com/">I linked to <i>example.com</i></a></body>', 'html.parser')

# Copy the children into a list first: decompose() changes the tree while we iterate.
for a in list(soup.find('a').children):
    if isinstance(a, bs4.element.Tag):
        a.decompose()

print(soup)

Out: <body><a href="http://example.com/">I linked to </a></body>

2013-10-17 22:37:41 danblack

Why haven't I seen any answer mention the unwrap method? Or, even easier, the get_text method:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
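
A minimal sketch of the difference between the two methods, assuming bs4 is installed and using the built-in html.parser. The markup is the same snippet used in the decompose answer; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = '<body><a href="http://example.com/">I linked to <i>example.com</i></a></body>'
soup = BeautifulSoup(html, 'html.parser')
link = soup.find('a')

# get_text() returns the text as a string and leaves the tree untouched.
text_only = link.get_text()
print(text_only)

# unwrap() removes the <i> tag itself but keeps its contents in place.
link.i.unwrap()
after_unwrap = str(link)
print(after_unwrap)
```

Use get_text when only the string is needed, and unwrap when the document should remain a tree with a particular tag stripped out.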

2014-04-29 00:40:34 Bobby

With BeautifulStoneSoup gone in bs4, it's even simpler in Python 3:

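from bs4 import BeautifulSoup

# Placeholder markup for illustration; substitute your own HTML string.
html = '<p>Some <b>bold</b> text</p>'
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
print(text)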

2015-01-27 02:47:02 shawnl

It is better to use 'get_text()' rather than 'getText()'. – SparkAndShine 2015-07-20 16:21:17

Why is that? It may well be the case, but understanding the reason would help. – 2015-08-18 08:41:42

getText() is bs3 syntax and does not conform to PEP 8. It will probably be deprecated. – 2015-08-31 18:04:50

Using get_text() is simple: it returns all the text in a document, or beneath a tag, as a single Unicode string.

For example, to strip all the tags from the following markup:

<td><a href="http://www.irit.fr/SC">Signal et Communication</a> 
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a> 
</td>

The expected result is:

Signal et Communication 
Ingénierie Réseaux et Télécommunications

Here is the source code:

#!/usr/bin/env python3 
from bs4 import BeautifulSoup 

text = ''' 
<td><a href="http://www.irit.fr/SC">Signal et Communication</a> 
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a> 
</td> 
''' 
soup = BeautifulSoup(text, 'html.parser')

print(soup.get_text())
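
A plain get_text() call also keeps the newlines that sit between the tags, so the printed result contains extra blank lines. One possible variation, sketched here under the assumption that bs4 is installed and the built-in html.parser is used, passes the separator and strip arguments of get_text to produce exactly the two lines shown as the expected result. The variable name clean is illustrative.

```python
from bs4 import BeautifulSoup

text = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(text, 'html.parser')

# strip=True trims each text node and drops whitespace-only ones;
# separator='\n' joins the remaining pieces with a newline.
clean = soup.get_text(separator='\n', strip=True)
print(clean)
```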

2015-07-20 16:37:08 SparkAndShine
