Get attributes and text from an XPath query as a list

I want to query an HTML string and extract the href attribute and the text node of each hyperlink into a list (or some other structure, such as a dictionary).

Consider the following code:

from lxml import html

# Three sample links; adjacent string literals are concatenated.
html_str = ('<a href="href1"> Text1 </a>'
            '<a href="href2"> Text2 </a>'
            '<a href="href3"> Text3 </a>')
tree = html.fromstring(html_str)
items = tree.xpath('//a')

# Collect one (text, href) tuple per <a> element.
values = []
for item in items:
    text = item.text
    href = item.get('href')
    values.append((text, href))

for text, href in values:
    print(text, href)

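This works!

I would like to know whether I can drop the for item in items: loop and get the values list from the XPath query alone.

tree.xpath('//a/text()') and tree.xpath('//a/@href') each give me one of the two, but I want both values together in one list.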


2014-09-13 madflow

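You can build a compound XPath with |. The texts and the hrefs are then both returned in a single list, items. You can use the grouper recipe, zip(*[iter(items)]*2), to pair up every two items. (Note, however, that this relies on the hrefs and the text strings alternating in the result):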

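from lxml import html

html_str = ('<a href="href1"> Text1 </a>'
            '<a href="href2"> Text2 </a>'
            '<a href="href3"> Text3 </a>')
tree = html.fromstring(html_str)
# Results come back in document order: href, text, href, text, ...
items = tree.xpath('//a/text() | //a/@href')

# Group the flat list into consecutive (href, text) pairs.
for href, text in zip(*[iter(items)]*2):
    print(text, href)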

yields

Text1 href1 
Text2 href2 
Text3 href3
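
In case the zip(*[iter(items)]*2) idiom looks opaque, here is a small sketch of what it does. It uses a plain list as a stand-in for the XPath result (the values are made up for illustration). The same iterator is handed to zip twice, so each tuple consumes two consecutive items:

```python
# Stand-in for the list returned by the compound XPath query
# (hypothetical values, alternating href and text).
items = ['href1', 'Text1', 'href2', 'Text2', 'href3', 'Text3']

# One iterator, referenced twice: zip pulls from it alternately,
# so each tuple is built from two consecutive items.
it = iter(items)
pairs = list(zip(it, it))  # same result as list(zip(*[iter(items)]*2))

for href, text in pairs:
    print(text, href)
```

If one link has no href (or no text), the list stops alternating and every later pair is shifted by one, which is the caveat mentioned above.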


2014-09-13 18:42:48 unutbu

I love Python :) – madflow 2014-09-13 18:57:21

You can use zip:

a = [1, 2, 3]
b = ['a', 'b', 'c']
list(zip(a, b))  # [(1, 'a'), (2, 'b'), (3, 'c')]

So, applied to your XPath expressions:

texts = tree.xpath('//a/text()')
hrefs = tree.xpath('//a/@href')
# Pairs by position; list() is needed on Python 3, where zip returns an iterator.
values = list(zip(texts, hrefs))
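
Pairing two separate query results by position can drift out of step if some <a> has no href or no text. Below is a minimal sketch of a per-element list comprehension that keeps each pair tied to its own element. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml; the sample markup and the wrapping <div> are assumptions made for illustration. lxml elements offer the same .text and .get() accessors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the second link has no href on purpose.
markup = ('<div>'
          '<a href="href1">Text1</a>'
          '<a>Text2</a>'
          '<a href="href3">Text3</a>'
          '</div>')
root = ET.fromstring(markup)

# One tuple per <a>; a missing href becomes None instead of
# shifting the pairs that follow it.
values = [(a.text, a.get('href')) for a in root.iter('a')]
print(values)
```

This still iterates over the elements, just in a single expression, so it trades the pure-XPath goal for pairs that stay aligned.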


2014-09-13 18:37:31
