Can I remove script tags with BeautifulSoup?

Can BeautifulSoup remove script tags and all of their contents from HTML, or do I have to use regular expressions or something else?

2011-04-08 Sam

110

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup('<script>a</script>baba<script>b</script>', 'lxml') 
>>> [s.extract() for s in soup('script')] 
>>> soup 
baba

2011-04-08 17:31:11

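What is the best way to chain additional tags to be removed? Right now it works if I repeat the command one after another, with [s.extract() for s in soup('script')], then [s.extract() for s in soup('iframe')], and so on, but it fails if I chain them like [s.extract() for s in soup('iframe', 'script')]. – Ila 2012-10-18 15:47:43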

@Ali You will have to use '[s.extract() for s in soup(['iframe', 'script'])]'. Note that to match multiple tags, the argument must be a list. – 2012-10-18 19:10:50
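
The list form described in the comment above can be sketched roughly as follows. This assumes bs4 with the built-in 'html.parser'; the sample markup and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup is hypothetical, chosen only to show several unwanted tags.
html = '<div><script>a</script><iframe src="x"></iframe><p>keep</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Calling soup(...) is shorthand for soup.find_all(...).
# A list of names matches any tag whose name is in the list.
for tag in soup(['script', 'iframe']):
    tag.extract()

result = str(soup)
print(result)
```

A plain loop is used here instead of a list comprehension because the returned tags are not needed.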

@FábioDiniz I extracted something like this: '' baba ''? Is that the same thing? – user2883071 2015-04-29 18:03:17

As the (official documentation) says, you can use the extract method to remove every subtree that matches a search.

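# BeautifulSoup 3 API (Python 2); with bs4, import via `from bs4 import BeautifulSoup`
import BeautifulSoup
a = BeautifulSoup.BeautifulSoup("<html><body><script>aaa</script></body></html>")
# extract() detaches each matching <script> subtree from the tree
[x.extract() for x in a.findAll('script')]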

2011-04-08 17:33:44

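Updated answer for anyone who may need it for future reference: the correct answer is decompose(). There are other ways to do it, but decompose works in place.

Usage example: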

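from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>This is a slimy text and <i> I am slimer</i></p>', 'html.parser')
soup.i.decompose()  # removes the <i> tag and its contents in place
print(str(soup))
# prints '<p>This is a slimy text and </p>' (note the trailing space before </p>)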

Very useful for getting rid of debris such as 'script', 'img', and so on.

2016-10-09 15:11:27 Vangel

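The difference between 'decompose' and 'extract' is that the latter returns the thing that was removed, whereas the former simply destroys it. So this is the more precise answer to the question, but the other methods do work. – Mike 2016-12-05 15:53:36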
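
The distinction drawn in the comment above can be checked with a small sketch. It assumes bs4 with the built-in 'html.parser'; the markup and the names `removed` and `destroyed` are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tags to remove in different ways.
soup = BeautifulSoup('<p>text<script>a</script><i>b</i></p>', 'html.parser')

# extract() detaches the tag and hands it back, so it can be inspected or reused.
removed = soup.script.extract()

# decompose() destroys the tag and returns nothing.
destroyed = soup.i.decompose()

print(str(removed))
print(destroyed)
print(soup)
```

Both calls modify the tree in place; the practical difference is whether the removed element is still needed afterwards.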

decompose does not remove the contents of the script tag, it only removes the tag. – 2017-03-24 11:16:24

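I agree with you. That is why I said the correct answer for the OP is to 'remove' the contents. It is commonly used to clean HTML of unwanted tags and formatting. – Vangel 2017-03-27 14:32:36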
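
Whether the contents survive can be tested directly. The sketch below assumes bs4 with the built-in 'html.parser' and made-up markup; in this check decompose() drops the tag together with its contents, while unwrap() is the bs4 method that removes only the tag and keeps what was inside it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a script to drop entirely and a <b> to unwrap.
soup = BeautifulSoup('<p>x<script>alert(1)</script>y <b>bold</b></p>', 'html.parser')

# decompose() removes the <script> tag and the text inside it.
soup.script.decompose()
after_decompose = str(soup)

# unwrap() removes only the <b> tag and leaves its text in place.
soup.b.unwrap()
after_unwrap = str(soup)

print(after_decompose)
print(after_unwrap)
```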
