2015-12-01 38 views
0

Python: parsing values from a table with BeautifulSoup

I am parsing a table from a saved .html file. The HTML code looks like this:

<table id="detailBody" width="100%" cellspacing="0" cellpadding="0" border="0" class="tab2" style="display: block;"><tbody> 
             <tr><td><ul><li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">↑</span></li><li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">↑</span></li><li><span>14:56:52</span><span class="red">11.750</span><span class="red">479</span><span class="fr red">↑</span></li><li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">↓</span></li><li><span>14:56:46</span><span class="">11.740</span><span class="green">333</span><span class="fr green">↓</span></li><li><span>14:56:43</span><span class="">11.740</span><span class="green">21</span><span class="fr green">↓</span></li><li><span>14:56:40</span><span class="">11.740</span><span class="green">15</span><span class="fr green">↓</span></li><li><span>14:56:37</span><span class="">11.740</span><span class="green">35</span><span class="fr green">↓</span></li><li><span>14:56:34</span><span class="red">11.750</span><span class="red">11</span><span class="fr red">↑</span></li><li><span>14:56:31</span><span class="">11.740</span><span class="green">3</span><span class="fr green">↓</span></li><li><span>14:56:28</span><span class="">11.740</span><span class="green">24</span><span class="fr green">↓</span></li><li><span>14:56:22</span><span class="red">11.750</span><span class="red">291</span><span class="fr red">↑</span></li><li><span>14:56:19</span><span class="">11.740</span><span class="red">198</span><span class="fr red">↑</span></li><li><span>14:56:16</span><span class="green">11.730</span><span class="green">15</span><span class="fr green">↓</span></li></ul></td></tr> 
            </tbody></table> 

What I have so far is:

list_a = soup.find_all('table')[0].tbody.find_all("tr")

for a in list_a:              # each tr
    for b in a:               # each td
        for c in b:           # each ul
            for d in c:       # each li
                for e in d:   # each span
                    print(e.get_text())

It doesn't look very nice, but the result is as follows:

15:00:19 
11.750 
5392 
↑ 
14:56:55 
11.750 
17 
↑ 
14:56:52 
11.750 
479 
↑ 

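But the table has too much content. I only want the first 10 groups of data from it, and from each group only the third and fourth items, collected into 2 lists:

["5392", "17", "479", …]

["↑", "↑", "↑", …]  # the "↑" can be changed to something else identical if it's a problem

How can I do this? Thanks.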

+0

Add the HTML, not an image. – SIslam

+0

I think he means that you should add the actual HTML code, not just a picture, so that we can help you better. – nablahero

+0

@SIslam and nablahero, thanks for your comments. –

Answers

1

The following extracts your two columns using the span tags inside each li element:

html = """ 
<table id="detailBody" width="100%" cellspacing="0" cellpadding="0" border="0" class="tab2" style="display: block;"> 
<tbody> 
<tr> 
    <td> 
    <ul> 
    <li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">?</span></li> 
    <li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">?</span></li> 
    <li><span>14:56:52</span><span class="red">11.750</span><span class="red">479</span><span class="fr red">?</span></li> 
    <li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">?</span></li> 
    <li><span>14:56:46</span><span class="">11.740</span><span class="green">333</span><span class="fr green">?</span></li> 
    <li><span>14:56:43</span><span class="">11.740</span><span class="green">21</span><span class="fr green">?</span></li> 
    <li><span>14:56:40</span><span class="">11.740</span><span class="green">15</span><span class="fr green">?</span></li> 
    <li><span>14:56:37</span><span class="">11.740</span><span class="green">35</span><span class="fr green">?</span></li> 
    <li><span>14:56:34</span><span class="red">11.750</span><span class="red">11</span><span class="fr red">?</span></li> 
    <li><span>14:56:31</span><span class="">11.740</span><span class="green">3</span><span class="fr green">?</span></li> 
    <li><span>14:56:28</span><span class="">11.740</span><span class="green">24</span><span class="fr green">?</span></li> 
    <li><span>14:56:22</span><span class="red">11.750</span><span class="red">291</span><span class="fr red">?</span></li> 
    <li><span>14:56:19</span><span class="">11.740</span><span class="red">198</span><span class="fr red">?</span></li> 
    <li><span>14:56:16</span><span class="green">11.730</span><span class="green">15</span><span class="fr green">?</span></li> 
    </ul> 
    </td> 
</tr> 
</tbody></table>""" 

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")

col_3 = []  # third span of each li
col_4 = []  # fourth span of each li

for li in soup.find_all('table')[0].find_all("li"):
    cols = li.find_all("span")
    col_3.append(cols[2].text)
    col_4.append(cols[3].text)

print(col_3)
print(col_4)

This gives you the following output:

['5392', '17', '479', '6', '333', '21', '15', '35', '11', '3', '24', '291', '198', '15']
['?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?']
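The question asks for only the first 10 groups, which the loop above does not limit. One way to get there, as a minimal sketch written for Python 3 with the `bs4` package, is to slice the list of li elements before looping. The sample rows are copied from the question's HTML with the class attributes dropped for brevity, and the names `rows`, `volumes`, and `directions` are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Sample rows copied from the question: (time, price, volume, direction).
rows = [
    ("15:00:19", "11.750", "5392", "↑"),
    ("14:56:55", "11.750", "17", "↑"),
    ("14:56:52", "11.750", "479", "↑"),
    ("14:56:49", "11.740", "6", "↓"),
    ("14:56:46", "11.740", "333", "↓"),
    ("14:56:43", "11.740", "21", "↓"),
    ("14:56:40", "11.740", "15", "↓"),
    ("14:56:37", "11.740", "35", "↓"),
    ("14:56:34", "11.750", "11", "↑"),
    ("14:56:31", "11.740", "3", "↓"),
    ("14:56:28", "11.740", "24", "↓"),
    ("14:56:22", "11.750", "291", "↑"),
]

# Rebuild a simplified version of the table (assumption: no class attributes).
items = "".join(
    "<li><span>{}</span><span>{}</span><span>{}</span><span>{}</span></li>".format(*row)
    for row in rows
)
html = '<table id="detailBody"><tbody><tr><td><ul>{}</ul></td></tr></tbody></table>'.format(items)

soup = BeautifulSoup(html, "html.parser")

volumes = []
directions = []
# Slice the list of <li> elements so that only the first 10 groups are read.
for li in soup.find("table", id="detailBody").find_all("li")[:10]:
    spans = li.find_all("span")
    volumes.append(spans[2].get_text())     # third item of the group
    directions.append(spans[3].get_text())  # fourth item of the group

print(volumes)
print(directions)
```

Slicing with `[:10]` is also safe when the table holds fewer than 10 groups, since it simply returns whatever is there.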
+0

This is perfect. Thanks for the guidance; it will be helpful to many learners. –

2

Why don't you try to find all the span items directly, since that is what you actually want, isn't it? So instead of

list_a = soup.find_all('table')[0].tbody.find_all("tr") 

try

list_a = soup.find_all('table')[0].tbody.find_all("tr")[0].find_all("span") 

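I don't know whether your table has only one row. If it does, this will work and give you all the spans, and you just skip the ones you don't need. If you have multiple rows, you have to loop over the rows like this: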

list_a = soup.find_all('table')[0].tbody.find_all("tr")
spans = []
for a in list_a:
    spans.extend(a.find_all("span"))  # collect the spans from every row

and you will again get all the span items. I hope this points you in the right direction!
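To make "skip the ones you don't need" concrete, here is a rough sketch for Python 3 with the `bs4` package. It assumes every li holds exactly four spans (time, price, volume, direction), so a slice with a step of 4 picks out one column from the flat list. The shortened HTML and the names `texts`, `volumes`, and `directions` are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened sample based on the question's HTML (class attributes omitted).
html = """
<table><tbody><tr><td><ul>
<li><span>15:00:19</span><span>11.750</span><span>5392</span><span>↑</span></li>
<li><span>14:56:55</span><span>11.750</span><span>17</span><span>↑</span></li>
<li><span>14:56:49</span><span>11.740</span><span>6</span><span>↓</span></li>
</ul></td></tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <span> in every row of the first table.
texts = []
for tr in soup.find_all("table")[0].tbody.find_all("tr"):
    texts.extend(span.get_text() for span in tr.find_all("span"))

# Assumption: four spans per <li>, so stepping by 4 selects a single column.
volumes = texts[2::4]     # third span of each group
directions = texts[3::4]  # fourth span of each group

print(volumes)
print(directions)
```

If a group could ever have a different number of spans, the step-based slicing would drift out of alignment, and looping over the li elements one by one would be the safer choice.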

+0

Thanks for your help. I hope you don't mind that I accepted the other answer, which covered all of my questions. :) –

+0

Shame on you ;) It's fine. – nablahero