2016-07-06

How can I append siblings iteratively in BeautifulSoup?

Based on this question, the code below loops over the tags between two h2 tags:

from bs4 import BeautifulSoup, Tag 


data = """<h2><name>Main Section</name><content>bla bla bla</content></h2> 
<p>Bla bla bla</p>
<h3>Subsection</h3> 
<p>Some more info</p> 

<h3>Subsection 2</h3> 
<p>Even more info!</p> 


<h2><name>Main Section 2</name><content>blah...</content></h2> 
<p>bla</p> 
<h3>Subsection</h3> 
<p>Some more info</p> 

<h3>Subsection 2</h3> 
<p>Even more info!</p>""" 


soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all('h2'):
    for sibling in main_section.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip the whitespace strings between tags
        if sibling.name == 'h2':
            break  # reached the next main section
        print(sibling)

This works fine and iterates over the whole data if I use print(sibling) at the end. But if I use append instead, the loop stops after a single iteration:

soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all('h2'):
    for sibling in main_section.next_siblings:
        if not isinstance(sibling, Tag):
            continue
        if sibling.name == 'h2':
            break
        main_section.content.append(sibling.extract())  # <-- the line that causes the problem

Only one sibling gets included in content (the same thing happens even if I remove extract()). The output is:

<h2><name>Main Section</name><content>bla bla bla<p>Bla bla bla</p></content></h2> 
<h2><name>Main Section 2</name><content>blah...<p>bla</p></content></h2> 

If I run the code again, the next tag gets added to the <content>...</content> tag.
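A minimal sketch of why each run moves exactly one tag, assuming the built-in `html.parser` backend and a hypothetical miniature of the question's data. `append()` first extracts the tag, so after the move the tag's `.next_sibling` is `None` (it is now the last child of `<content>`), while the following tag becomes the h2's new `.next_sibling`. A loop over `next_siblings` that follows `.next_sibling` links lazily therefore has nowhere to go after the first move, and a second run starts from the next tag.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the question's data (no whitespace between tags).
html = "<h2><content>intro</content></h2><p>one</p><p>two</p>"
soup = BeautifulSoup(html, "html.parser")
main_section = soup.h2
content = main_section.find("content")

first = main_section.next_sibling  # <p>one</p>
print(first.next_sibling)          # <p>two</p>

content.append(first)              # moves <p>one</p> inside <content>
print(first.next_sibling)          # None: it is now the last child of <content>
print(main_section.next_sibling)   # <p>two</p>: what a second run would move next
```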

Basically, I want all of each main section's data and subsections to end up inside its content tag.

The output I want is:

<h2><name>Main Section</name><content>bla bla bla<p>Bla bla bla</p><h3>Subsection</h3><p>Some more info</p><h3>Subsection 2</h3><p>Even more info!</p></content></h2>
<h2><name>Main Section 2</name><content>blah...<p>bla</p><h3>Subsection</h3><p>Some more info</p><h3>Subsection 2</h3><p>Even more info!</p></content></h2>

  1. Why does the iteration stop when I use append?
  2. How do I append all the tags between two main tags?
Because you are appending while iterating over the same siblings. Try appending to a different list.
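One way to sketch the comment's suggestion: wrapping `main_section.next_siblings` in `list()` takes a snapshot of the siblings before anything is moved, so the loop no longer depends on the `.next_sibling` links that `append()` rewrites. This assumes the built-in `html.parser` backend and uses a trimmed, illustrative copy of the question's `data`; `find("content")` is used instead of attribute access only to make the lookup explicit.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed copy of the question's data (illustrative).
data = """<h2><name>Main Section</name><content>bla bla bla</content></h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h2><name>Main Section 2</name><content>blah...</content></h2>
<p>bla</p>
<h3>Subsection</h3>"""

soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all("h2"):
    content = main_section.find("content")
    # list() snapshots the siblings, so moving them below cannot cut the loop short.
    for sibling in list(main_section.next_siblings):
        if not isinstance(sibling, Tag):
            continue  # skip the whitespace strings between tags
        if sibling.name == "h2":
            break  # reached the next main section
        content.append(sibling)  # append() extracts sibling from its old position

for h2 in soup.find_all("h2"):
    print(h2)
```

The whitespace strings are skipped rather than moved, so they stay at the top level and the `<content>` tags contain only the moved tags.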

Thanks for the idea, it works.

Answer

Appending the tags to a new list solved my problem.

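soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all('h2'):
    siblings = []
    for sibling in main_section.next_siblings:
        if not isinstance(sibling, Tag):
            continue
        if sibling.name == 'h2':
            break
        siblings.append(sibling)  # collect first; don't modify the tree mid-iteration
    for sibling in siblings:
        main_section.content.append(sibling)  # now it is safe to move each tag into <content>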

I was then able to add all the siblings to main_section.
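A variant that needs no extra list, offered only as an alternative to the answer's approach and sketched under the same assumptions (built-in `html.parser`, a trimmed illustrative copy of the question's data): walk the siblings by hand and read each node's `.next_sibling` before moving it, so the walk survives the move.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed copy of the question's data (illustrative).
data = """<h2><name>Main Section</name><content>bla bla bla</content></h2>
<p>Bla bla bla</p>
<h3>Subsection</h3>
<p>Some more info</p>
<h2><name>Main Section 2</name><content>blah...</content></h2>
<p>bla</p>
<h3>Subsection</h3>"""

soup = BeautifulSoup(data, "html.parser")
for main_section in soup.find_all("h2"):
    content = main_section.find("content")
    node = main_section.next_sibling
    # Stop at the end of the document or at the next main section.
    while node is not None and not (isinstance(node, Tag) and node.name == "h2"):
        following = node.next_sibling  # save the successor before node is moved
        if isinstance(node, Tag):
            content.append(node)  # whitespace strings are left where they are
        node = following
```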