lxml - get child attributes based on parent class

I am trying to extract the hrefs from the first child of each td tag with class foo. An example DOM is:

<td class="foo"> 
    <a href="www.foobar1.com"></a> 
</td> 
<td class="foo"> 
    <a href="www.foobar2.com"></a> 
</td>

From this I would like to get ["www.foobar1.com", "www.foobar2.com"]

So far I have the following:

import requests 
from lxml import html 

def get_hrefs(url): 
    page = requests.get(url) 
    tree = html.fromstring(page.text) 
    td_elements = tree.xpath('//td[@class="foo"]') 

    return [el.find("a").attrib["href"] for el in td_elements]

However, I feel it would be more efficient to extend the xpath instead of iterating, but I am not sure how to construct it.

Thanks.

2014-11-01 i_trope

Yes, you can simplify it by selecting the @href of the a tag inside each td directly in the xpath:

return tree.xpath('//td[@class="foo"]/a/@href')
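To check the expression without a network request, here is a minimal sketch that runs it against the sample markup from the question. The SAMPLE string, the surrounding table/tr wrapper, and the function name get_hrefs_from_markup are assumptions added for illustration; they are not part of the original code.

```python
from lxml import html

# Sample markup from the question, wrapped in <table><tr> (an assumption)
# so that the td elements sit in a valid table context.
SAMPLE = """
<table><tr>
  <td class="foo"><a href="www.foobar1.com"></a></td>
  <td class="foo"><a href="www.foobar2.com"></a></td>
</tr></table>
"""

def get_hrefs_from_markup(markup):
    """Return the href of every <a> that is a direct child of td.foo."""
    tree = html.fromstring(markup)
    # The attribute step (@href) makes xpath() return strings, not elements,
    # so no Python-side loop over the td elements is needed.
    return tree.xpath('//td[@class="foo"]/a/@href')

hrefs = get_hrefs_from_markup(SAMPLE)
print(hrefs)
```

Note that @class="foo" is an exact string comparison, so a td with class="foo bar" would not match this expression.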

2014-11-01 18:25:40 alecxe

Does this return only the first child of each td? I ask because I am getting more hrefs than there are tds with this class. – 2014-11-01 18:47:13

dom_tree.xpath('//td[@class="link"]/a[1]//@href') seems to do the trick for this case! – 2014-11-01 18:54:59
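The difference discussed in these comments can be seen in a small sketch. The markup below is hypothetical (the second link, www.extra1.com, is invented for illustration) and assumes a td may hold more than one a child. It uses /@href rather than the //@href from the comment, since the latter would also pick up href attributes on any descendants of the selected a.

```python
from lxml import html

# Hypothetical markup: the first td contains two links, the second only one.
SAMPLE = """
<table><tr>
  <td class="foo">
    <a href="www.foobar1.com"></a>
    <a href="www.extra1.com"></a>
  </td>
  <td class="foo">
    <a href="www.foobar2.com"></a>
  </td>
</tr></table>
"""

tree = html.fromstring(SAMPLE)

# Every <a> child of each matching td: one href per link.
all_hrefs = tree.xpath('//td[@class="foo"]/a/@href')

# a[1] keeps only the first <a> child within each td: one href per td.
first_hrefs = tree.xpath('//td[@class="foo"]/a[1]/@href')

print(all_hrefs)
print(first_hrefs)
```

The positional predicate is evaluated per parent, so a[1] means "the first a under each td", not "the first a in the whole document".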
