XML vs. Python dictionaries for searching and extracting

I pulled a moderately sized XML file from a Monit API using the following Python code.

import urllib.request
import xmltodict

parsedXML = []
with urllib.request.urlopen(URL) as response:  # open the XML URL
    data = response.read()
parsedXML.append(xmltodict.parse(data))  # parse the downloaded XML into a dict

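I am using xmltodict to convert the XML into a dictionary, because I thought that would make it easier to search and extract from. xmltodict creates a nested ordered dictionary, which is great. However, I don't see an easy way to search every "layer" of the Python dictionary and pull out an entire node.

Is there an easy way to search for and pull out a dictionary node in Python for editing?

For example, see the XML below. Once it is in a dictionary, I need to extract every node that starts with "<service" (there will be several in the full XML file), run tests on that exact node, and possibly change values.
I also need to search all values in the dictionary, find a value, then get the name of that value's parent node and extract the entire node. Is that possible? Or should I skip the dictionary entirely and work with the XML directly? If so, is there a Python XML library that supports all of this?

Here is a sample of the XML data I am pulling: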

<monit> 
    <server> 
     <id>9d8b2a3d3618ccc38628f6d7b89ebfd8</id> 
     <incarnation>1427714713</incarnation> 
     <version>5.4</version> 
     <uptime>44395</uptime> 
     <poll>120</poll> 
     <startdelay>0</startdelay> 
     <localhostname>DMZ-Server</localhostname> 
     <controlfile>/etc/monit/monitrc</controlfile> 
     <httpd> 
      <address>192.168.1.100</address> 
      <port>2812</port> 
      <ssl>0</ssl> 
     </httpd> 
    </server> 
    <platform> 
     <name>Linux</name> 
     <release>2.6.32-34-pve</release> 
     <version>#1 SMP Sat Nov 8 09:38:26 CET 2014</version> 
     <machine>i686</machine> 
     <cpu>8</cpu> 
     <memory>3145728</memory> 
     <swap>1048576</swap> 
    </platform> 
    <service type="3"> 
     <name>mmonit</name> 
     <collected_sec>1427759050</collected_sec> 
     <collected_usec>180381</collected_usec> 
     <status>0</status> 
     <status_hint>0</status_hint> 
     <monitor>1</monitor> 
     <monitormode>0</monitormode> 
     <pendingaction>0</pendingaction> 
     <pid>11481</pid> 
     <ppid>1</ppid> 
     <uptime>692522</uptime> 
     <children>0</children>


2015-03-31 deranjer

Any tree traversal algorithm will do the trick.

http://rosettacode.org/wiki/Tree_traversal#Python
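If you do stay with the dictionary, a recursive traversal could look roughly like the sketch below. It assumes the shape xmltodict normally produces (nested dicts, a list for repeated elements, attributes stored under "@"-prefixed keys). The `find_nodes` helper, the hand-written `parsed` data, and the second service named "example" are made up for illustration.

```python
def find_nodes(tree, key):
    """Yield every value stored under `key`, at any depth of a nested dict/list."""
    if isinstance(tree, dict):
        for name, child in tree.items():
            if name == key:
                # Repeated elements arrive as a list of dicts; yield each one
                if isinstance(child, list):
                    yield from child
                else:
                    yield child
            # Keep descending, the key may also occur deeper in the tree
            yield from find_nodes(child, key)
    elif isinstance(tree, list):
        for item in tree:
            yield from find_nodes(item, key)


# Hand-written stand-in (assumption) for what xmltodict.parse() would return
parsed = {
    "monit": {
        "server": {"uptime": "44395", "httpd": {"port": "2812"}},
        "service": [
            {"@type": "3", "name": "mmonit", "status": "0"},
            {"@type": "5", "name": "example", "status": "1"},  # hypothetical
        ],
    }
}

services = list(find_nodes(parsed, "service"))
for service in services:
    # The yielded nodes are the original dict objects, so edits change `parsed`
    service["status"] = "0"
```

Because the nodes are yielded by reference rather than copied, any test or edit you run on them applies to the parsed dictionary itself.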

I would stick with XML and use lxml to parse and traverse the XML tree.

http://lxml.de/tutorial.html
http://lxml.de/tutorial.html#the-elementtree-class

I'm sure others here will suggest other XML libraries; feel free to use any of them. lxml is the only one I'm familiar with.
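As a rough illustration of the "find a value, then get its parent node" part of the question, here is a minimal sketch. Note that it swaps in the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages; ElementTree elements do not track their parent, so the sketch builds a child-to-parent map first (lxml elements offer getparent() for this instead). The shortened XML and the `nodes_containing` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample data (assumption: trimmed for brevity)
xml_source = """<monit>
    <server>
        <localhostname>DMZ-Server</localhostname>
        <httpd>
            <port>2812</port>
        </httpd>
    </server>
    <service type="3">
        <name>mmonit</name>
        <status>0</status>
    </service>
</monit>"""

root = ET.fromstring(xml_source)

# Map every child element to its parent element
parent_of = {child: parent for parent in root.iter() for child in parent}


def nodes_containing(value):
    """Return the parent element of every element whose text equals `value`."""
    return [
        parent_of[el]
        for el in root.iter()
        if el is not root and el.text is not None and el.text.strip() == value
    ]


matches = nodes_containing("mmonit")
for node in matches:
    print(node.tag)                               # name of the enclosing node
    print(ET.tostring(node, encoding="unicode"))  # the whole node as markup
```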


2015-03-31 00:06:21 JustinDanielson

Great, I'll keep the XML file and use lxml; it seems to have everything I need. – deranjer 2015-04-01 03:49:05

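For searching and extracting, I suggest skipping the dictionary and working on the XML directly. XPath is a proven, powerful way to traverse an XML document and retrieve specific parts of it. For example, to get the <service> elements anywhere in the XML document, you can simply use the XPath: //service

lxml, as mentioned in the other answer, is one possible Python library that supports XPath. For example: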

from lxml import etree 
xml_source = """<root> 
    <server> 
     <id>9d8b2a3d3618ccc38628f6d7b89ebfd8</id> 
     <incarnation>1427714713</incarnation> 
     <version>5.4</version> 
     <uptime>44395</uptime> 
     <poll>120</poll> 
     <startdelay>0</startdelay> 
     <localhostname>DMZ-Server</localhostname> 
     <controlfile>/etc/monit/monitrc</controlfile> 
     <httpd> 
      <address>192.168.1.100</address> 
      <port>2812</port> 
      <ssl>0</ssl> 
     </httpd> 
    </server> 
    <platform> 
     <name>Linux</name> 
     <release>2.6.32-34-pve</release> 
     <version>#1 SMP Sat Nov 8 09:38:26 CET 2014</version> 
     <machine>i686</machine> 
     <cpu>8</cpu> 
     <memory>3145728</memory> 
     <swap>1048576</swap> 
    </platform> 
    <service type="3"> 
     <name>mmonit</name> 
     <collected_sec>1427759050</collected_sec> 
     <collected_usec>180381</collected_usec> 
     <status>0</status> 
     <status_hint>0</status_hint> 
     <monitor>1</monitor> 
     <monitormode>0</monitormode> 
     <pendingaction>0</pendingaction> 
     <pid>11481</pid> 
     <ppid>1</ppid> 
     <uptime>692522</uptime> 
     <children>0</children> 
    </service> 
</root>""" 

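doc = etree.fromstring(xml_source)
# find() returns only the first match; use findall() or xpath('//service') to get all of them
service = doc.find('.//service')
# you can then operate on service as needed:
# parse it to a dictionary, or, as in this example, print the markup
print(etree.tostring(service, pretty_print=True, encoding='unicode'))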
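Since the full file will contain several <service> nodes, a sketch of testing and editing each one might look like the following. It uses the standard library's xml.etree.ElementTree, whose limited XPath subset accepts the same './/service' expression, so it runs without lxml installed; lxml's findall() should behave the same way for these paths. The second service, its values, and the "set monitor to 0 when status is non-zero" rule are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two services; the second one is hypothetical, added to show multiple matches
xml_source = """<monit>
    <service type="3">
        <name>mmonit</name>
        <status>0</status>
        <monitor>1</monitor>
    </service>
    <service type="5">
        <name>example</name>
        <status>512</status>
        <monitor>1</monitor>
    </service>
</monit>"""

root = ET.fromstring(xml_source)

# './/service' matches <service> elements at any depth below the root
services = root.findall(".//service")

failing = []
for service in services:
    # Run a test on this exact node ...
    if service.findtext("status") != "0":
        failing.append(service.findtext("name"))
        # ... and change a value in place
        service.find("monitor").text = "0"

# A predicate narrows the match, here by attribute value
type3 = root.findall(".//service[@type='3']")

# Serialize the edited tree back to XML text
updated_xml = ET.tostring(root, encoding="unicode")
```

The elements returned by findall() are live references into the tree, so the edited values show up when the document is serialized again.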


2015-03-31 01:57:02 har07
