How do I parse data with BeautifulSoup in the following example?


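I am a beginner with Python and BeautifulSoup, and I am trying to build a web scraper. However, I have run into a problem I can't find a way around. My question: how do I parse the data in the following example with BeautifulSoup?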

This is the part of the HTML I want to scrape:

<tr> 
    <td class="num cell-icon-string" data-sort-value="6"> 
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td> 

</tr> 

<tr> 
    <td class="num cell-icon-string" data-sort-value="6"> 
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br> 
    <small class="aside">Mega Charizard X</small></td> 
</tr>

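I want to extract "Charizard" from the first table row and "Mega Charizard X" from the second row. At the moment, my code extracts "Charizard" from both rows.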

Here is my code:

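#!/usr/bin/env python3

from bs4 import BeautifulSoup

# Parse the saved page with the lxml parser; the with-block closes the file
with open("data.html") as f:
    soup = BeautifulSoup(f, "lxml")

# Every name link carries the class "ent-name"
poke_boxes = soup.find_all('a', attrs={'class': 'ent-name'})

for poke_box in poke_boxes:
    poke_name = poke_box.text.strip()
    print(poke_name)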


2016-12-27 torque


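You need to change your logic to loop over the rows and check whether a <small> element exists in each one: if it does, print its text; otherwise, print the anchor text as you do now.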

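from bs4 import BeautifulSoup

# html holds the markup shown in the question
soup = BeautifulSoup(html, 'lxml')
trs = soup.find_all('tr')
for tr in trs:
    # A <small> element only appears in rows that have an alternate form name
    smalls = tr.find_all('small')
    if smalls:
        print(smalls[0].text)
    else:
        # Fall back to the anchor text, as the original code did
        poke_box = tr.find_all('a')
        print(poke_box[0].text)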


2016-12-27 05:26:31

Thanks! I understood your logic and, with a few workarounds, was able to get what I needed. – torque
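A variation that keeps the question's original loop over the a.ent-name links: for each link, find_next_sibling("small") looks for a <small> element later in the same cell and falls back to the link text when there is none. This is a sketch that assumes the form name always sits in the same <td> as the link, as it does in the sample markup; it is not necessarily the workaround the asker used.

```python
from bs4 import BeautifulSoup

HTML = """
<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td>
</tr>
<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br>
    <small class="aside">Mega Charizard X</small></td>
</tr>
"""

soup = BeautifulSoup(HTML, "html.parser")
names = []
for poke_box in soup.find_all("a", class_="ent-name"):
    # Siblings share the link's parent <td>; the result is None if no <small> follows
    form = poke_box.find_next_sibling("small")
    source = form if form is not None else poke_box
    names.append(source.get_text(strip=True))

print(names)  # ['Charizard', 'Mega Charizard X']
```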

import bs4 
html = '''<tr> 
    <td class="num cell-icon-string" data-sort-value="6"> 
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td> 

</tr> 

<tr> 
    <td class="num cell-icon-string" data-sort-value="6"> 
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br> 
    <small class="aside">Mega Charizard X</small></td> 
</tr>''' 
soup = bs4.BeautifulSoup(html, 'lxml')

In:

[tr.get_text(strip=True) for tr in soup('tr')]

Out:

['Charizard', 'CharizardMega Charizard X']

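You can use get_text() to concatenate all the text inside a tag; strip=True strips leading and trailing whitespace from each string and drops strings that contain only whitespace.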
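If you want to keep the two strings apart rather than glued together as 'CharizardMega Charizard X', get_text() also accepts a separator argument, and the stripped_strings generator yields each non-blank string on its own. A short sketch on the same two rows; taking the last string of each row as its display name is an assumption that happens to hold for this markup. "html.parser" is used only to avoid the lxml dependency.

```python
import bs4

html = """
<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a></td>
</tr>
<tr>
    <td class="num cell-icon-string" data-sort-value="6">
    <td class="cell-icon-string"><a class="ent-name" href="/pokedex/charizard" title="View pokedex for #006 Charizard">Charizard</a><br>
    <small class="aside">Mega Charizard X</small></td>
</tr>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# separator is inserted between the individual strings instead of ''
joined = [tr.get_text(separator="|", strip=True) for tr in soup("tr")]
print(joined)  # ['Charizard', 'Charizard|Mega Charizard X']

# stripped_strings yields each string with whitespace trimmed, skipping blank ones
last = [list(tr.stripped_strings)[-1] for tr in soup("tr")]
print(last)  # ['Charizard', 'Mega Charizard X']
```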


2016-12-27 05:46:30
