2012-08-13 44 views
0

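XML parsing with ElementTree in Python

I am using Python and ElementTree to parse an XML file. I want to build a list of dictionaries that holds the information for every CD. Later I can use this list to gather information, for example to show the titles of the CDs that come from the USA. The code below works, but it breaks easily if the YEAR tag is not the last tag of a CD. How can I rewrite this code so that the tags can appear in any order?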

from xml.etree.ElementTree import ElementTree 

f = open("cd_catalog.xml") 
tree = ElementTree() 
tree.parse(f) 

catalog = [] 
cd = {} 
for node in tree.iter():
    if node.tag != "CD" and node.tag != "CATALOG":
        tagtext = (node.tag, node.text),
        cd.update(tagtext)
    if node.tag == "YEAR":
        catalog.append(cd)
        cd = {}

for cd in catalog:
    if cd["COUNTRY"] == "USA":
        print("The cd named {0} is from USA".format(cd["TITLE"]))

The XML file, with 2 entries:

<CATALOG> 
    <CD> 
     <TITLE>Empire Burlesque</TITLE> 
     <ARTIST>Bob Dylan</ARTIST> 
     <COUNTRY>USA</COUNTRY> 
     <COMPANY>Columbia</COMPANY> 
     <PRICE>10.90</PRICE> 
     <YEAR>1985</YEAR> 
    </CD> 
    <CD> 
     <TITLE>Hide your heart</TITLE> 
     <ARTIST>Bonnie Tyler</ARTIST> 
     <COUNTRY>UK</COUNTRY> 
     <COMPANY>CBS Records</COMPANY> 
     <PRICE>9.90</PRICE> 
     <YEAR>1988</YEAR> 
    </CD> 
</CATALOG> 

Answers

2

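One way to rewrite your XML parsing code is shown below. In this example I define a generator that loops over all CD elements under the root element (I do not check whether the root is a CATALOG element, although you could add that check). For each CD element, the generator yields its child elements as a dictionary. Because every value is looked up by tag name, the order of the tags inside a CD no longer matters.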

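Using a generator is more efficient than building a list of dictionaries for all CD elements, especially if your XML file is very large, because only one CD dictionary is held in memory at a time. Note that the parsed tree itself is still kept in memory.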

import xml.etree.ElementTree as etree 

def get_cd(element):
    """Yield one dictionary per CD element found under `element`."""
    try:
        for el in element.iter(tag='CD'):
            yield get_cd_info(el)
    except AttributeError:
        # Python < 2.7
        for el in element.getiterator(tag='CD'):
            yield get_cd_info(el)

def get_cd_info(element):
    # findtext() looks up each child by tag name, so tag order is irrelevant
    return {'title': element.findtext('TITLE'),
            'artist': element.findtext('ARTIST'),
            'country': element.findtext('COUNTRY'),
            'company': element.findtext('COMPANY'),
            'price': element.findtext('PRICE'),
            'year': element.findtext('YEAR')}

Here is the approach above in action:

s = '''<CATALOG> 
    <CD> 
     <TITLE>Empire Burlesque</TITLE> 
     <ARTIST>Bob Dylan</ARTIST> 
     <COUNTRY>USA</COUNTRY> 
     <COMPANY>Columbia</COMPANY> 
     <PRICE>10.90</PRICE> 
     <YEAR>1985</YEAR> 
    </CD> 
    <CD> 
     <TITLE>Hide your heart</TITLE> 
     <ARTIST>Bonnie Tyler</ARTIST> 
     <COUNTRY>UK</COUNTRY> 
     <COMPANY>CBS Records</COMPANY> 
     <PRICE>9.90</PRICE> 
     <YEAR>1988</YEAR> 
    </CD> 
</CATALOG> 
''' 

e = etree.fromstring(s) 

for cd in get_cd(e):
    if cd['country'] == 'USA':
        print('The cd "{0}" is from the USA.'.format(cd['title']))

# prints 'The cd "Empire Burlesque" is from the USA.' 
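
If the file is large enough that memory matters, a streaming variant is one option. The following is only a minimal sketch, assuming the standard library's `iterparse` is acceptable for your input; the function name `iter_cds` and the shortened inline sample are made up for illustration and are not part of the code above. It builds each dictionary from whatever child tags the CD has, so tag order does not matter here either.

```python
import io
import xml.etree.ElementTree as etree

def iter_cds(source):
    """Yield one dict per CD element, keyed by the child tag names."""
    # With events=("end",), an element is reported once its closing tag
    # has been read, so all of its children are available at that point.
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "CD":
            yield {child.tag: child.text for child in elem}
            elem.clear()  # discard the children that are no longer needed

# Shortened sample (an assumption for the demo); YEAR comes first in one CD.
sample = (b"<CATALOG>"
          b"<CD><YEAR>1985</YEAR><TITLE>Empire Burlesque</TITLE>"
          b"<COUNTRY>USA</COUNTRY></CD>"
          b"<CD><TITLE>Hide your heart</TITLE><COUNTRY>UK</COUNTRY>"
          b"<YEAR>1988</YEAR></CD>"
          b"</CATALOG>")

usa_titles = [cd["TITLE"] for cd in iter_cds(io.BytesIO(sample))
              if cd["COUNTRY"] == "USA"]
print(usa_titles)
```

Unlike `get_cd_info`, this keeps the original upper-case tag names as keys, and a tag that is missing from a CD is simply absent from its dictionary rather than mapped to `None`.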
1

Untested:

.... 
for cd in tree.findall('CD'):  # tag names are case-sensitive
    for node in cd:  # iterate over the direct children of this CD
        print(node.tag)  # TITLE, ARTIST, PRICE etc.

.....
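
Filled out, the same idea could look roughly like the sketch below. It assumes the structure from the question (CD elements directly under CATALOG); the names `xml_text` and `root` and the shortened sample are illustrative only. Each CD becomes a dictionary built from its own children, so the list no longer depends on YEAR being the last tag.

```python
import xml.etree.ElementTree as etree

# Shortened sample for the demo; note YEAR is the first tag in the first CD.
xml_text = """<CATALOG>
    <CD>
        <YEAR>1985</YEAR>
        <TITLE>Empire Burlesque</TITLE>
        <COUNTRY>USA</COUNTRY>
    </CD>
    <CD>
        <TITLE>Hide your heart</TITLE>
        <COUNTRY>UK</COUNTRY>
        <YEAR>1988</YEAR>
    </CD>
</CATALOG>"""

root = etree.fromstring(xml_text)

# One dict per CD, keyed by child tag name; iterating an element
# yields its direct children in document order.
catalog = [{child.tag: child.text for child in cd}
           for cd in root.findall("CD")]

for cd in catalog:
    if cd["COUNTRY"] == "USA":
        print("The cd named {0} is from USA".format(cd["TITLE"]))
```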