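Wrapping multiple tags with BeautifulSoup

I'm writing a Python script that converts an HTML document into a reveal.js slideshow. To do this, I need to wrap multiple tags inside a <section> tag.

Wrapping a single tag inside another tag is easy with the wrap() method. However, I can't figure out how to wrap multiple tags.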
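For reference, the single-tag case mentioned above might look like the following minimal sketch. The one-line HTML snippet is made up for illustration, and `html.parser` is chosen here only so the example has no extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical input, much smaller than the real document.
soup = BeautifulSoup("<body><h1>Title</h1><p>Text</p></body>", "html.parser")

# wrap() places the new tag around exactly one element and returns the wrapper.
h1 = soup.find("h1")
wrapper = h1.wrap(soup.new_tag("section"))

result = str(soup)
print(result)
```

Only the `<h1>` ends up inside the `<section>`; the `<p>` that follows stays outside, which is the limitation described in the question.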
To clarify with an example, here is the original HTML:
html_doc = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<h1 id="first-paragraph">First paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<div>
<a href="http://link.com">Here's a link</a>
</div>
<h1 id="second-paragraph">Second paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<script src="lib/.js"></script>
</body>
</html>
"""
I want to wrap each <h1> and the tags that follow it inside a <section> tag, like this:
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<section>
<h1 id="first-paragraph">First paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
<div>
<a href="http://link.com">Here's a link</a>
</div>
</section>
<section>
<h1 id="second-paragraph">Second paragraph</h1>
<p>Some text...</p>
<p>Another text...</p>
</section>
<script src="lib/.js"></script>
</body>
</html>
Here is how I make the selection:
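from bs4 import BeautifulSoup
import itertools

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html_doc, 'html.parser')
h1s = soup.find_all('h1')
for el in h1s:
    # Take everything after this <h1> until the next <h1> or the <script> tag.
    els = list(itertools.takewhile(lambda x: x.name not in [el.name, 'script'], el.next_elements))
    els.insert(0, el)
    print(els)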
Output:
[<h1 id="first-paragraph">First paragraph</h1>, 'First paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n ', <div><a href="http://link.com">Here's a link</a> </div>, '\n ', <a href="http://link.com">Here's a link</a>, "Here's a link", '\n ', '\n\n ']
[<h1 id="second-paragraph">Second paragraph</h1>, 'Second paragraph', '\n ', <p>Some text...</p>, 'Some text...', '\n ', <p>Another text...</p>, 'Another text...', '\n\n ']
The selection is correct, but I can't see how to wrap each selection inside a <section> tag.
Could you edit your post and show the expected output? – styvane
Please post the expected output. –
I added the explicit output. – Ben
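One possible approach, offered as a sketch and not a definitive answer: walk `next_siblings` instead of `next_elements`, because `next_elements` also descends into children (which is why the `<a>` tag and the text strings show up as separate items in the printed lists). Each sibling group can then be moved into a new `<section>` created with `soup.new_tag()`. The helper name `wrap_heading_groups`, its parameters, and the shortened HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened version of the document from the question.
html_doc = """<html><body>
<h1 id="first-paragraph">First paragraph</h1>
<p>Some text...</p>
<div><a href="http://link.com">Here's a link</a></div>
<h1 id="second-paragraph">Second paragraph</h1>
<p>Another text...</p>
<script src="lib/.js"></script>
</body></html>"""


def wrap_heading_groups(soup, heading="h1", stop=("script",)):
    """Hypothetical helper: wrap each heading and its following siblings in <section>."""
    for h in soup.find_all(heading):
        # Collect the siblings first, so the tree is not modified while iterating.
        group = []
        for sib in h.next_siblings:
            # Text nodes have no tag name; getattr keeps the check safe for them.
            if getattr(sib, "name", None) in (heading, *stop):
                break
            group.append(sib)
        # Wrap the heading itself, then move the collected siblings into the wrapper.
        section = h.wrap(soup.new_tag("section"))
        for node in group:
            section.append(node)
    return soup


soup = wrap_heading_groups(BeautifulSoup(html_doc, "html.parser"))
sections = soup.find_all("section")
print(soup)
```

Whitespace-only text nodes between the tags are moved into the sections as well, so the line breaks in the printed result may not match the hand-written expected output exactly. The stop condition assumes all the headings are siblings under the same parent, as they are in the example.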