2013-03-17

Python 2.7 - BeautifulSoup and tables

I have this source:

<tr id="bitstampUSD"> 
<td class="arrow" change="up" latest_trade="1363480722"> 
    <span class="down">▼</span> 
</td> 

<td class="symbol"> 
    <nobr> 
    <a href="/markets/bitstampUSD.html">bitstampUSD</a> 
    </nobr> 
    <span class="sub">USD (SEPA converted)</span> 
</td> 
<td>46.74 
    <span class="sub">41 min ago</span> 
</td> 
<td class="minichart break"> 
    <span volume="**whole heaps of number here that I want**" 
    print="**more numbers I want**" 
    avg="**more numbers I want**" 
    class="marketsparkline"></span> 
</td> 
<td>**36.39** 

    <span class="sub change">**10.35 28.46%**</span> 

</td> 
<td>**141,043.10** 
    <span class="sub">**5,132,052.22 USD**</span> 
</td> 
<td>**25.25** 
    <span class="sub">**46.58** (24h)</span> 
</td> 
<td>**49.17** 
    <span class="sub">47 (24h)</span> 
</td> 
<td class="break">**46.7**</td> 
<td>**46.74**</td> 
<td class="break">**46.78** 

    <span class="sub change">-0.04 -0.09%</span> 

</td> 
<td>**819.54** 
    <span class="sub">**38,340.96** USD</span> 
</td> 
     </tr> 

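I want the data shown in bold. (Well, it should be bold; I guess the code tags stop that from happening. It's the data between the double asterisks.)

I managed to work out how to get the values that sit inside a tag with a class, but some of these values are outside any class, so I don't know how to grab them.

If it helps to see the whole source: http://bitcoincharts.com/markets/ Its table layout is different from the other table code I've seen before.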

`soup.findAll('b')`? – TerryA 2013-03-17 01:48:19

Instead of fiddling with HTML parsing, wouldn't it be easier to use the [markets API](http://bitcoincharts.com/about/markets-api/)? I just tried it myself, and it returns a nice JSON-encoded list of dicts with values like `[{u'volume': 822.42673038, u'latest_trade': 1363486862, u'bid': 46.81, u'high': 47.0, u'currency': u'USD', u'currency_volume': 38473.8713986671, u'ask': 46.83, u'close': 46.81, u'avg': 46.78091066044309, u'symbol': u'bitstampUSD', u'low': 46.58}]`. – DSM 2013-03-17 02:40:47

Oh, damn! Haha, I'd still like to figure out the parsing anyway, but if that fails I'll look at the API. Cheers. – tommo 2013-03-17 02:43:06
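Following DSM's suggestion, here is a minimal sketch of consuming the API response once it has been downloaded. It assumes the response is a JSON list of per-market objects shaped like the sample in DSM's comment; the payload is hard-coded from those sample values rather than fetched (no endpoint URL is assumed here), and `by_symbol` and `bitstamp` are just illustrative names.

```python
import json

# Sample payload rebuilt as JSON from the values quoted in DSM's comment
# (the comment shows a Python repr, not raw JSON). Not fetched live.
payload = '''[{"volume": 822.42673038, "latest_trade": 1363486862, "bid": 46.81,
  "high": 47.0, "currency": "USD", "currency_volume": 38473.8713986671,
  "ask": 46.83, "close": 46.81, "avg": 46.78091066044309,
  "symbol": "bitstampUSD", "low": 46.58}]'''

markets = json.loads(payload)  # -> list of dicts, one per market

# Index the markets by their symbol so a single market is easy to look up.
by_symbol = {m["symbol"]: m for m in markets}
bitstamp = by_symbol["bitstampUSD"]

# A single formatted string prints the same way on Python 2.7 and 3.
print("avg=%s volume=%s close=%s" % (bitstamp["avg"], bitstamp["volume"], bitstamp["close"]))
```

Compared with scraping, every field already arrives as a number, so there is no need to strip units such as "USD" or "(24h)" from cell text.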

Answer

Well, this outputs a bit more than you asked for, but it should get you started:

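from __future__ import print_function
from bs4 import BeautifulSoup

# f is the page HTML (a string or an open file object)
soup = BeautifulSoup(f, 'html.parser')

# volume/print/avg are attributes of the <span> inside the "minichart break" cell
for td in soup.find_all('td', class_='minichart break'):
    avg = td.span['avg']
    print_ = td.span['print']  # trailing underscore avoids clashing with print
    volume = td.span['volume']
    print(avg, print_, volume)

# The remaining values are plain text in each <td>; split each cell into words
for td in soup.find_all('td'):
    print(td.text.split())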

On your example, I get:

**more numbers I want** **more numbers I want** **whole heaps of number here that I want**
[u'\u25bc']
[u'bitstampUSD', u'USD', u'(SEPA', u'converted)']
[u'46.74', u'41', u'min', u'ago']
[]
[u'**36.39**', u'**10.35', u'28.46%**']
[u'**141,043.10**', u'**5,132,052.22', u'USD**']
[u'**25.25**', u'**46.58**', u'(24h)']
[u'**49.17**', u'47', u'(24h)']
[u'**46.7**']
[u'**46.74**']
[u'**46.78**', u'-0.04', u'-0.09%']
[u'**819.54**', u'**38,340.96**', u'USD']
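For the specific problem in the question (numbers that sit directly inside a `<td>` rather than inside a tag with a class), one option is to take each cell's first direct text node and, separately, the text of its `<span class="sub">`. This is a minimal sketch assuming bs4 with the built-in `html.parser`; `split_cell` is a hypothetical helper, and the HTML is a trimmed copy of two cells from the question's row with the asterisks removed.

```python
from bs4 import BeautifulSoup

# Two cells copied from the question's row (asterisks removed).
html = """
<tr id="bitstampUSD">
<td>36.39
    <span class="sub change">10.35 28.46%</span>
</td>
<td class="break">46.7</td>
</tr>
"""

def split_cell(td):
    """Hypothetical helper: return (main_text, sub_text) for one <td>.

    main_text is the cell's first direct text node, i.e. the number that is
    not wrapped in any tag; sub_text is the text of a nested
    <span class="sub"> (class "sub change" also matches), or '' if none.
    """
    main = td.find(string=True, recursive=False)  # direct children only
    sub = td.find('span', class_='sub')
    main_text = main.strip() if main else ''
    sub_text = sub.get_text(strip=True) if sub else ''
    return main_text, sub_text

soup = BeautifulSoup(html, 'html.parser')
row = soup.find('tr', id='bitstampUSD')
cells = [split_cell(td) for td in row.find_all('td')]
print(cells)
```

Cells whose first direct text node is only whitespace (such as the `minichart break` cell) come back with an empty `main_text`, so their values still need the `td.span[...]` attribute lookup from the answer.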