2016-05-15

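Python - BeautifulSoup - extracting table data, stuck with tags

In Python, I am trying to grab tables from an HTML document and store the table cell values in lists so that I can run comparisons on the table data. I was able to use mechanize to automate downloading the HTML page, which sits behind an ID/password login. The second part, putting the data into lists, gives me output that still contains the tags, as shown below. So while it looks like I have solved storing the data, I am not sure how to strip the tags before passing the data on.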

Link to the HTML document I want to pull the data from: https://www.dropbox.com/s/b684ecl7b2l3m10/guildwar.html?dl=0

Sample output (top), followed by the code, which starts at the bs4 import:

[None, None, None, <td class="t1"> 1 </td>, <td class="t1"> 2 </td>,  <td class="t1"> 3 </td>] 




from bs4 import BeautifulSoup 

soup = BeautifulSoup(open("guildwar.html")) 

rank_0 = [] 
color_1 = [] 
name_2 = [] 
land_3 = [] 
fortress_4 = [] 
power_5 = [] 


for el in soup.findAll('tr'): 
    rank = el.find('td', {'class':'t1'}) 
    rank_0.append(rank) 
    color = el.find('td', {'class':'t2'}) 
    color_1.append(color) 
    name = el.find('td', {'class':'t3'}) 
    name_2.append(name) 
    land = el.find('td', {'class':'t4'}) 
    land_3.append(land) 
    fortress = el.find('td', {'class':'t5'}) 
    fortress_4.append(fortress) 
    power = el.find('td', {'class':'t6'}) 
    power_5.append(power) 

print("Ranking") 
print(rank_0) 
print("\nMagic Color") 
print(color_1) 
print("\nMage Name") 
print(name_2) 
print("\nLand") 
print(land_3) 
print("\nFortress") 
print(fortress_4) 
print("\nPower") 
print(power_5) 

================================

Answer

You can use the text attribute on the element, like this:

In [1]: from bs4 import BeautifulSoup

In [2]: s = '<tr><td class="t1"> 1 </td>, <td class="t1"> 2 </td>,  <td class="t1"> 3 </td></tr>'

In [3]: soup = BeautifulSoup(s, "lxml")

In [4]: for el in soup.find_all('tr'):
   ...:     rank = el.find('td', {'class': 't1'})
   ...:     # .text drops the tags; strip() trims the padding spaces
   ...:     print("Ranking >", rank.text.strip())
   ...:
Ranking > 1
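Applied to the loop in the question, the same idea might look like the sketch below. The inline `SAMPLE_HTML` and the helper name `column_text` are hypothetical stand-ins, not taken from guildwar.html. The `None` check is there because rows without a matching cell (header rows, for example) are what produce the `None` entries in the sample output.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for guildwar.html; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td class="t1"> 1 </td><td class="t3"> Alice </td></tr>
  <tr><td class="t1"> 2 </td><td class="t3"> Bob </td></tr>
</table>
"""


def column_text(soup, css_class):
    """Return the stripped text of the first <td class=css_class> in each row."""
    values = []
    for row in soup.find_all("tr"):
        cell = row.find("td", {"class": css_class})
        if cell is None:  # e.g. a header row with no such cell; skip it
            continue
        values.append(cell.get_text(strip=True))
    return values


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rank_0 = column_text(soup, "t1")
name_2 = column_text(soup, "t3")
print(rank_0)  # ['1', '2']
print(name_2)  # ['Alice', 'Bob']
```

One caveat with skipping rows: if a cell is missing in one column but present in another, the lists will no longer line up row by row. Appending a placeholder such as an empty string instead of skipping would keep them aligned.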

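On a side note, I would probably store the whole <table> and check whether it has changed over time. That saves you from comparing every individual column, and you only need to store the data when there has been an update or change.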
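One way to sketch that side note is to hash the text of the table and compare the hash between downloads. The function name `table_fingerprint` and the sample snippets are assumptions made for illustration. Hashing the stripped cell text rather than the raw markup is also a design choice here, so that whitespace-only differences do not count as a change.

```python
import hashlib

from bs4 import BeautifulSoup


def table_fingerprint(html):
    """Hash the text of the first <table> so two snapshots can be compared cheaply."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found")
    # Join the stripped cell strings so padding differences are ignored.
    text = "|".join(table.stripped_strings)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical snapshots of the same table at different times.
old_html = '<table><tr><td class="t1"> 1 </td><td class="t6"> 500 </td></tr></table>'
same_html = '<table><tr><td class="t1">1</td><td class="t6">500</td></tr></table>'
new_html = '<table><tr><td class="t1"> 1 </td><td class="t6"> 750 </td></tr></table>'

unchanged = table_fingerprint(old_html) == table_fingerprint(same_html)
changed = table_fingerprint(old_html) != table_fingerprint(new_html)
print(unchanged, changed)
```

Only when the fingerprint differs from the stored one would the per-column lists need to be rebuilt and saved.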