2011-12-19

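Prevent xml.etree.ElementTree.XML() from including the website name in element tags

I'm using Python and trying to take some XML and convert it into a dictionary. The code works fine, except that some strange text gets added to the element tags, and from there it ends up in the dictionary attribute names. The text appears to be the value of the "xmlns" attribute of "WebServiceGeocodeQueryResult".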

My code looks like this:

import xml.etree.ElementTree as ET
import xml_to_dictionary  # This is some code I found, it seems to work fine:
                          # http://code.activestate.com/recipes/410469-xml-as-dictionary/

def doSomeStuff():
    # The XML declaration must be the very first thing in the string,
    # so no newline or spaces before "<?xml".
    theXML = """<?xml version="1.0" encoding="utf-8"?>
    <WebServiceGeocodeQueryResult
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns="https://webgis.usc.edu/">

     <TransactionId>7307e84c-d0c8-4aa8-9b83-8ab4515db9cb</TransactionId>
     <Latitude>38.8092475915888</Latitude>
     <Longitude>-77.2378689948621</Longitude>
     ...
"""

    tree = ET.XML(theXML)  # this is where the element names get the added '{https://webgis.usc.edu/}'
    xmldict = xml_to_dictionary.XmlDictConfig(tree)

As you can see in the debugger, the element names in the object "tree" have the annoying prefix "{https://webgis.usc.edu/}".

And this prefix is carried over into the dictionary attribute names.

Answer


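The "weird text" is the element's namespace. ElementTree expands element names to universal names of the form {namespace-uri}localname, and because xmlns="https://webgis.usc.edu/" declares a default namespace on the root element, every element in the document inherits it.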
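A minimal sketch of that expansion, using a trimmed copy of the question's response (the sample string is an illustrative assumption; the real response contains more elements). It shows the default namespace appearing in the tag of the root and of its children:

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the question's geocoder response (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<WebServiceGeocodeQueryResult xmlns="https://webgis.usc.edu/">
  <Latitude>38.8092475915888</Latitude>
  <Longitude>-77.2378689948621</Longitude>
</WebServiceGeocodeQueryResult>"""

root = ET.XML(SAMPLE)

# The default namespace (xmlns="...") applies to every child element too,
# so each tag is reported as "{namespace-uri}localname".
print(root.tag)     # {https://webgis.usc.edu/}WebServiceGeocodeQueryResult
print(root[0].tag)  # {https://webgis.usc.edu/}Latitude
```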

You can preprocess your element names like this:

tree = ET.XML(theXML)
for elem in tree.iter():  # iter() also visits the root element itself; on Python older than 2.7 use getiterator()
    elem.tag = elem.tag.split('}', 1)[-1]  # keep only the local name after the '}'
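A self-contained sketch of the same loop, wrapped in a hypothetical helper strip_namespaces() and followed by a deliberately simplified flat dict conversion (an assumption for illustration, not the ActiveState recipe from the question) so the effect on the keys is easy to check:

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Hypothetical helper: remove '{uri}' prefixes from every tag, in place."""
    for elem in root.iter():  # iter() visits root itself plus all descendants
        if isinstance(elem.tag, str):  # comments/PIs (if kept) have non-str tags
            elem.tag = elem.tag.split('}', 1)[-1]
    return root

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<WebServiceGeocodeQueryResult xmlns="https://webgis.usc.edu/">
  <TransactionId>7307e84c-d0c8-4aa8-9b83-8ab4515db9cb</TransactionId>
  <Latitude>38.8092475915888</Latitude>
  <Longitude>-77.2378689948621</Longitude>
</WebServiceGeocodeQueryResult>"""

root = strip_namespaces(ET.XML(SAMPLE))

# Simplified conversion for a flat document: one key per child element.
result = {child.tag: child.text for child in root}
print(root.tag)  # WebServiceGeocodeQueryResult
print(result)
```

Note that this loop only rewrites tags; namespaced attribute names (for example xsi:nil, stored as '{http://www.w3.org/2001/XMLSchema-instance}nil') keep their prefix unless you also rewrite elem.attrib.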

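By the way, if cElementTree is available you should use it, because it is faster (import xml.etree.cElementTree as ET). On Python 3.3 and later, xml.etree.ElementTree already uses the C accelerator automatically, and cElementTree was removed entirely in Python 3.9.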
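One common way to follow that advice without breaking on newer interpreters is a guarded import. This is a sketch; the tiny namespaced document below is an illustrative assumption:

```python
# Prefer the C implementation where the old module still exists (Python 2 up to 3.8);
# fall back to the standard module, which is already C-accelerated on Python 3.3+.
try:
    import xml.etree.cElementTree as ET
except ImportError:  # cElementTree was removed in Python 3.9
    import xml.etree.ElementTree as ET

# Either module exposes the same API, including namespace expansion in tags.
root = ET.XML('<a xmlns="urn:example"><b/></a>')
print(root.tag)  # {urn:example}a
```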