Retrieving attribute names and values with Python/lxml and XPath

I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is an example of the kind of code involved.

from lxml import etree 

xml = """ 
    <records> 
    <row id="1" height="160" weight="80" /> 
    <row id="2" weight="70" /> 
    <row id="3" height="140" /> 
    </records> 
""" 

parsed = etree.fromstring(xml) 
nodes = parsed.xpath('/records/row') 
for node in nodes: 
    print(node.xpath("@id|@height|@weight"))

When I run this script, the output is:

['1', '160', '80'] 
['2', '70'] 
['3', '140']

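As you can see from the result, when one of the attributes is missing, the positions of the other attributes shift, so I cannot tell whether the value in rows 2 and 3 is the height or the weight.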

Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I would like to see the result in this format:

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

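I recognise that I could solve this particular case using ElementTree and Python. However, I would like to solve it with XPath (and relatively simple XPath) rather than by processing the data in Python.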

Source

2017-02-23 Kevin Gill

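I was wrong to assert that I was not going to use Python. I found that the lxml/etree implementation is easy to extend, so I can keep using the XPath DSL with a small modification.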

I registered a function named "dictify" and changed the XPath expression to:

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree 

xml = """ 
<records> 
    <row id="1" height="160" weight="80" /> 
    <row id="2" weight="70" ><height>150</height></row> 
    <row id="3" height="140" /> 
</records> 
""" 

def dictify(context, names):
    # XPath extension function: lxml passes the evaluation context first.
    node = context.context_node
    rv = ['__dictify_start_marker__']
    for n in names.split('|'):
        if n.startswith('@'):
            # Attribute name: emit it only if the attribute is present.
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit one name/text pair per matching child.
            for child_node in node.findall(n):
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv

etree_functions = etree.FunctionNamespace(None) 
etree_functions['dictify'] = dictify 


parsed = etree.fromstring(xml) 
nodes = parsed.xpath('/records/row') 
for node in nodes: 
    print(node.xpath("dictify('@id|@height|@weight|weight|height')"))

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__'] 
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__'] 
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
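
The flat lists above still have to be folded back into name/value pairs on the Python side. A minimal sketch of that step follows; the helper name `groups_from_dictify` is hypothetical and not part of lxml or of the answer above, and it assumes the markers always appear in matched start/end pairs with an even number of items between them.

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'


def groups_from_dictify(flat):
    """Split a flat dictify() result into lists of (name, value) tuples.

    Hypothetical helper: one list is produced per start/end marker pair.
    """
    groups = []
    current = None  # items collected since the last start marker
    for item in flat:
        if item == START:
            current = []
        elif item == END:
            if current is None or len(current) % 2:
                raise ValueError("unbalanced dictify() result")
            # Even positions are names, odd positions are values.
            groups.append(list(zip(current[0::2], current[1::2])))
            current = None
        elif current is not None:
            current.append(item)
        else:
            raise ValueError("item found outside the markers")
    return groups


result = [START, '@id', '2', '@weight', '70', 'height', '150', END]
print(groups_from_dictify(result))
```

Splitting on the markers rather than simply slicing off the first and last items means the same helper would also cope with several dictify() results concatenated into one list.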

Source

2017-02-23 18:17:14

You should try the following:

for node in nodes: 
    print(node.attrib)

This returns all of the node's attributes as a dictionary-like mapping, for example {'id': '1', 'weight': '80', 'height': '160'}

If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

list_of_attributes = [] 
for node in nodes: 
    attrs = [] 
    for att in node.attrib: 
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)

Output:

[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]

Source

2017-02-23 10:57:11 Andersson

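Yes, that works, but it is Python. I want to use XPath to extract the data. Using XPath lets me allow users to define the access path. To implement that in Python I would have to write some form of XPath DSL, which is pointless, because XPath is already the DSL in this space. –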

Does '/records/row/@*/concat(name(), ", ", .)' do the trick? – Andersson

Unfortunately not. This gives an error. print parsed.xpath('/records/row/@*/concat(name(), ", ", .)') raises lxml.etree.XPathEvalError: Invalid expression –
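
A function call used as a path step, as in the expression from the comments, is XPath 2.0 syntax, while lxml evaluates XPath 1.0 through libxml2, which would explain the "Invalid expression" error. One alternative that keeps the user-supplied XPath unchanged is to rely on lxml's "smart string" results: string values returned by xpath() carry `is_attribute` and `attrname` properties. The sketch below is only an illustration; the helper name `named_attributes` is hypothetical, and it assumes the default `smart_strings=True` behaviour of `xpath()`.

```python
from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
</records>
"""


def named_attributes(node, expression):
    """Evaluate an attribute-selecting XPath and pair each value with its name.

    Hypothetical helper: relies on lxml smart strings (the xpath() default).
    """
    pairs = []
    for value in node.xpath(expression):
        # Only attribute results carry a meaningful attrname.
        if getattr(value, "is_attribute", False):
            pairs.append(("@" + value.attrname, str(value)))
    return pairs


parsed = etree.fromstring(xml)
rows = [named_attributes(row, "@id|@height|@weight")
        for row in parsed.xpath("/records/row")]
for row in rows:
    print(row)
```

With this approach the XPath expression stays exactly as the user wrote it, and the attribute names are recovered from the results rather than from a custom extension function.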
