XML parsing into a hash table

I have an XML file in the following format:

<doc> 
<id name="X"> 
    <type name="A"> 
    <min val="100" id="80"/> 
    <max val="200" id="90"/> 
    </type> 
    <type name="B"> 
    <min val="100" id="20"/> 
    <max val="20" id="90"/> 
    </type> 
</id> 

<type...> 
</type> 
</doc>

I want to parse this file and build a hash table like this:

{X: {"A": [(100,80), (200,90)], "B": [(100,20), (20,90)]}, Y: .....}

How would I do this in Python?


2009-12-15 user231536

This kind of question has been asked several times. The answers there may help you. http://stackoverflow.com/questions/191536/converting-xml-to-json-using-python http://stackoverflow.com/questions/471946/how-to-convert-xml-to-json-in-python – Thomas 2009-12-15 16:04:09

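As others have pointed out, minidom is the way to go here. You open (and parse) the file, and while walking the nodes you check whether each node is relevant and should be read. That way you also know whether you want to read its child nodes.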

Threw this together; it seems to do what you want. Attribute values are read by name, and there is no error handling. The print() at the end means it's Python 3.x.

I'll leave improving it as an exercise; I just wanted to post a snippet to get you started.

Happy hacking! :)

xml.txt

<doc>
<id name="X">
    <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
    </type>
    <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
    </type>
</id>
</doc>

parsexml.py

from xml.dom import minidom

data = {}
doc = minidom.parse("xml.txt")
# Walk the children of the root <doc> element
for n in doc.documentElement.childNodes:
    if n.localName == "id":
        id_name = n.getAttribute("name")
        data[id_name] = {}
        for j in n.childNodes:
            if j.localName == "type":
                type_name = j.getAttribute("name")
                # Slot 0 holds the <min> pair, slot 1 the <max> pair
                data[id_name][type_name] = [(), ()]
                for k in j.childNodes:
                    if k.localName == "min":
                        data[id_name][type_name][0] = (k.getAttribute("val"), k.getAttribute("id"))
                    if k.localName == "max":
                        data[id_name][type_name][1] = (k.getAttribute("val"), k.getAttribute("id"))
print(data)

Output:

{'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}
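The question asks for integers, while minidom returns attribute values as strings. A minimal post-processing sketch, assuming every val and id attribute holds a decimal integer; `to_ints` is a hypothetical helper name, not part of minidom:

```python
def to_ints(data):
    """Return a copy of data with every (val, id) string pair converted to ints."""
    return {
        id_name: {
            type_name: [tuple(int(v) for v in pair) for pair in pairs]
            for type_name, pairs in types.items()
        }
        for id_name, types in data.items()
    }

# The dict printed above, with string values as minidom returns them
data = {'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}
print(to_ints(data))
# {'X': {'A': [(100, 80), (200, 90)], 'B': [(100, 20), (20, 90)]}}
```

int() raises ValueError on a non-numeric attribute, so this only fits data that is known to be numeric.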


2009-12-15 16:38:07 Mizipzor

Sorry, wrong room. The full-code contest is down the hall. – 2009-12-15 21:31:51

I'd suggest using the minidom library.

The documentation is very good, so you should be up and running in no time.

Dan.
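A sketch of what a minidom solution might look like using getElementsByTagName and getAttribute instead of walking childNodes by hand. It assumes the layout from the question (each <id> holds <type> elements whose element children carry val and id attributes); `parse_with_minidom` and `SAMPLE` are illustrative names, not part of minidom:

```python
from xml.dom import minidom

def parse_with_minidom(xml_text):
    """Build {id_name: {type_name: [(val, id), ...]}} with minidom."""
    doc = minidom.parseString(xml_text)
    data = {}
    for id_el in doc.getElementsByTagName("id"):
        types = {}
        for type_el in id_el.getElementsByTagName("type"):
            # Keep element children only; whitespace between tags shows up as text nodes
            types[type_el.getAttribute("name")] = [
                (child.getAttribute("val"), child.getAttribute("id"))
                for child in type_el.childNodes
                if child.nodeType == child.ELEMENT_NODE
            ]
        data[id_el.getAttribute("name")] = types
    return data

SAMPLE = """<doc>
<id name="X">
    <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
    </type>
    <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
    </type>
</id>
</doc>"""

print(parse_with_minidom(SAMPLE))
# {'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}
```

getElementsByTagName searches all descendants rather than only direct children. That is fine for this flat layout, but it would also pick up nested <id> elements if the format allowed them.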


2009-12-15 16:00:45 freeasinbeer

Why not try something like the PyXML library? They have plenty of documentation and tutorials.


2009-12-15 16:02:45 Gordon

**Warning** Norwegian Blue Parrot syndrome: the latest release is 5 years old. No Windows installers for Python 2.5 and 2.6. – 2009-12-16 21:27:20

Another XML parsing library: http://www.crummy.com/software/BeautifulSoup/

For parsing XML documents, start here: http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20XML


2009-12-15 16:04:45 miku

I'm more familiar with BeautifulSoup and with parsing URLs than local XML files, so this is a good solution for me. – Flowpoke 2011-03-08 23:47:21

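I disagree with the suggestion in other answers to use minidom: it's a Python adaptation of a standard originally designed for other languages, and it works but is not a great fit. The recommended approach in modern Python is ElementTree.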

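The same interface is also implemented in the third-party module lxml, but unless you need blazing speed, the version included in the Python standard library is fine (and it is also faster than minidom). The key is to program to that interface; then you can always switch to a different implementation of the same interface later, if you want, with minimal changes to your own code.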
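One way to program to the interface, as a sketch: try the faster lxml.etree first, fall back to the standard-library module, and then use only calls both provide (fromstring, iterating over an element, .get). It assumes lxml may or may not be installed; `names_of_ids` is a hypothetical helper:

```python
# Prefer lxml's ElementTree-compatible implementation when it is installed,
# otherwise fall back to the standard library version.
try:
    from lxml import etree as et
except ImportError:
    from xml.etree import ElementTree as et

def names_of_ids(xml_text):
    """Return the name attribute of each child element of the root."""
    root = et.fromstring(xml_text)  # same call in both implementations
    return [child.get("name") for child in root]

print(names_of_ids('<doc><id name="X"/><id name="Y"/></doc>'))
# ['X', 'Y']
```

Only the import lines mention a specific implementation, so swapping one for the other touches nothing else.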

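For example, after the necessary imports &c, the following code is a minimal implementation for your example (it doesn't verify that the XML is correct, it just assumes so and extracts the data; adding various checks is pretty easy, of course):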

from xml.etree import ElementTree as et  # or import any other, faster version of ET

def xml2data(xmlfile):
    tree = et.parse(xmlfile)
    data = {}
    # Each child of the root is an <id> element
    for anid in tree.getroot():
        currdict = data[anid.get('name')] = {}
        # Each child of an <id> is a <type> element
        for atype in anid:
            currlist = currdict[atype.get('name')] = []
            # The <min> and <max> children, in document order
            for c in atype:
                currlist.append((c.get('val'), c.get('id')))
    return data

For your sample input, this produces the result you want.
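If the order of <min> and <max> in the file might vary, a variant can look them up by tag with find/findall instead of relying on child order. This is a sketch assuming each <type> has at most one <min> and one <max>, both with integer attributes. `xml2data_by_tag` is a hypothetical name, and it takes an XML string via et.fromstring rather than a file:

```python
from xml.etree import ElementTree as et

def xml2data_by_tag(xml_text):
    """Build {id_name: {type_name: [min (val, id), max (val, id)]}} with int values."""
    root = et.fromstring(xml_text)
    data = {}
    for anid in root.findall("id"):  # direct <id> children of the root only
        types = data[anid.get("name")] = {}
        for atype in anid.findall("type"):
            pairs = []
            # Look up <min> then <max> by tag, so their order in the file does not matter
            for tag in ("min", "max"):
                el = atype.find(tag)
                if el is not None:
                    pairs.append((int(el.get("val")), int(el.get("id"))))
            types[atype.get("name")] = pairs
    return data

SAMPLE = """<doc>
<id name="X">
    <type name="A"><max val="200" id="90"/><min val="100" id="80"/></type>
    <type name="B"><min val="100" id="20"/><max val="20" id="90"/></type>
</id>
</doc>"""

print(xml2data_by_tag(SAMPLE))
# {'X': {'A': [(100, 80), (200, 90)], 'B': [(100, 20), (20, 90)]}}
```

In SAMPLE, type A deliberately lists <max> before <min>. The output still puts the min pair first.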


2009-12-15 17:18:26

'for child in node.getchildren():' is unnecessary; use 'for child in node:' instead. – 2009-12-16 21:27:59

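*Warning*: the xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities. Just be careful. – igaurav 2015-01-27 05:39:56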

Don't reinvent the wheel. Use the Amara toolkit. Variable names are just keys in a dictionary anyway. http://www.xml3k.org/Amara


2009-12-15 21:25:13

Another link: http://www.xml.com/pub/a/2005/01/19/amara.html You'll end up with a variable doc, which has doc.id, which has doc.id.type[0], then doc.id.type[0].min, and so on. Super easy access! – 2009-12-15 21:33:02
