2011-02-10 63 views
0
<table id="t_id" cellspacing="0" border="0" align="center" height="700" width="600" cellpadding="0"> 
<tbody> 
<tr><td> ..test... </td></tr> 
<tr><td> ..test... </td></tr> 
<tr><td> ..test... </td></tr> 
</tbody> 
</table> 

Answers

3

These days people tend to choose lxml over BeautifulSoup. See how easy it is:

from lxml import etree
# The snippet is well-formed, so it can be parsed as XML
data = """<table id="t_id" cellspacing="0" border="0" align="center" height="700" width="600" cellpadding="0">
<tbody>
<tr><td> ..test... </td></tr>
<tr><td> ..test... </td></tr>
<tr><td> ..test... </td></tr>
</tbody>
</table>
"""
tree = etree.fromstring(data)
table_element = tree.xpath("/table")[0]  # xpath() returns a list of matching elements
print(table_element.attrib['height'] + " and " + table_element.attrib['width'])
# prints: 700 and 600
+1

Why do people prefer lxml? Performance reasons? The BeautifulSoup solution is shorter and looks more pythonic, IMHO. – DzinX 2011-02-10 16:13:29

1

If this is your entire HTML, then this is enough:

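from bs4 import BeautifulSoup  # BeautifulSoup 4; the old BeautifulSoup 3 module runs only on Python 2
soup = BeautifulSoup("...your HTML...", "html.parser")
print(soup.table['width'], soup.table['height'])
# prints: 600 700

If you need to search for the table first, that is not much more complicated either:

table = soup.find('table', id='t_id')
print(table['width'], table['height'])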
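
If installing a third-party parser is not an option, the same lookup by id can probably be done with the standard library's html.parser alone. This is only a minimal sketch and not part of the answers above; the names TableAttrParser, found_attrs and table_attrs are made up for illustration, and the HTML is a shortened copy of the table from the question.

```python
from html.parser import HTMLParser


class TableAttrParser(HTMLParser):
    """Remembers the attributes of the first <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.found_attrs = None  # set once the matching <table> is seen

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "table" and self.found_attrs is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.table_id:
                self.found_attrs = attr_map


def table_attrs(html, table_id):
    """Return the attribute dict of the matching table, or None if absent."""
    parser = TableAttrParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.found_attrs


html = """<table id="t_id" cellspacing="0" border="0" align="center" height="700" width="600" cellpadding="0">
<tbody>
<tr><td> ..test... </td></tr>
</tbody>
</table>"""

found = table_attrs(html, "t_id")
print(found["width"], found["height"])
```

Unlike soup.find, the helper returns None when no table has the requested id, so check the result before indexing into it.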