Using BeautifulSoup to parse <tr> tags, having trouble extracting values

I have some HTML that looks like this:
<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>
And some Python that tries to grab the value of each tag and print each inner value:
soup = BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
for row in soup.findAll('tr'):
    print repr(row) # this prints the whole 'tr' element text just fine.
    for col in row.contents:
        print col.string
So the full HTML of each row prints correctly, but the "col" loop prints None for the last element:
some text
some other text
None
I'm not familiar with BeautifulSoup or Python, but it seems that the inner tags inside the last element are causing the parsing problem?
Thanks
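
A note on the suspicion above: the parse itself is fine. In BeautifulSoup, `.string` is only defined when a tag has a single child, so a cell that mixes text with `<b>` and `<br />` yields None. The usual way around it is to join all the text nodes of the cell, for example `''.join(col.findAll(text=True))` with the API used above, or `col.get_text()` in the newer bs4 package. The sketch below is not BeautifulSoup code; it swaps in the standard library's `html.parser` (Python 3) purely to show that the last cell holds several text fragments rather than one. The names `CellTextCollector` and `cell_texts` are made up for this illustration.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collects the text fragments found inside each <td> of each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows; each row is a list of cells
        self._cell = None  # fragments of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append(self._cell)
            self._cell = None

    def handle_data(self, data):
        # Text outside a <td> (such as newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def cell_texts(html):
    """Return, per row, the whitespace-normalized text of every cell."""
    collector = CellTextCollector()
    collector.feed(html)
    collector.close()
    return [
        [" ".join("".join(fragments).split()) for fragments in row]
        for row in collector.rows
    ]


data = """<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>"""

for row in cell_texts(data):
    for text in row:
        print(text)
```

The third cell is stored as a list of separate fragments, which is why asking for a single string is ambiguous; joining the fragments recovers the full text of the cell.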