Creating an XML tree from a text file in Python

I need to avoid creating duplicate branches in an XML tree while parsing a text file. Say the text file looks like this (the order of the lines is random):

BRANCH1:branch11:message11
BRANCH1:branch12:message12
BRANCH2:branch21:message21
BRANCH2:branch22:message22

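So the resulting XML tree should have a root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is: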

import string 
fh = open ('xmlbasic.txt', 'r') 
allLines = fh.readlines() 
fh.close() 
import xml.etree.ElementTree as ET 
root = ET.Element('root') 

for line in allLines: 
    tempv = line.split(':') 
    branch1 = ET.SubElement(root, tempv[0]) 
    branch2 = ET.SubElement(branch1, tempv[1]) 
    branch2.text = tempv[2] 

tree = ET.ElementTree(root) 
tree.write('xmlbasictree.xml')

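The problem with this code is that a new branch is created in the XML tree for every line of the text file.

Any suggestions on how to avoid creating another branch in the XML tree when a branch with that name already exists?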

Source

2010-09-21 bitman

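import xml.etree.ElementTree as ET

# Read the file line by line, dropping newlines and skipping blank lines.
with open("xmlbasic.txt") as lines_file:
    lines = [line.strip() for line in lines_file if line.strip()]

root = ET.Element('root')

for line in lines:
    # maxsplit=2 keeps any further colons inside the message text.
    head, subhead, tail = line.split(":", 2)

    # find() returns None when there is no matching child. Test with
    # "is None": a childless Element can evaluate as false.
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)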

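The logic is simple, and you already stated it in your question! Before creating a branch, you just check whether a branch with that name already exists in the tree.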
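The check described above can also be wrapped in a small helper. This is only a minimal sketch: `get_or_create` is a hypothetical name (it is not part of ElementTree), and the sample lines are assumed data in the question's format. It compares the result of `find()` with `None` explicitly, because an `Element` without children has historically evaluated as false, and truth-testing elements is deprecated in recent Python versions.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the first direct child of parent named tag, creating it if absent.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    child = parent.find(tag)
    if child is None:  # explicit None check; do not rely on truthiness
        child = ET.SubElement(parent, tag)
    return child


# Assumed sample data in the question's format, in shuffled order.
sample_lines = [
    "BRANCH1:branch11:message11",
    "BRANCH2:branch21:message21",
    "BRANCH1:branch12:message12",
    "BRANCH2:branch22:message22",
]

root = ET.Element("root")
for line in sample_lines:
    head, subhead, tail = line.strip().split(":", 2)
    get_or_create(get_or_create(root, head), subhead).text = tail

print(ET.tostring(root, encoding="unicode"))
```

Each first-level tag is created the first time it is seen and reused afterwards, regardless of the order of the lines.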

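Note that this can be inefficient, because you search the tree again for every line: each find() call scans the children of an element. That is because ElementTree is not designed to keep its elements unique.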

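If you need the speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards.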

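import collections
import xml.etree.ElementTree as ET

# Read the file line by line, dropping newlines and skipping blank lines.
with open("xmlbasic.txt") as lines_file:
    lines = [line.strip() for line in lines_file if line.strip()]

# Group the messages first: root_dict[head][subhead] = message.
root_dict = collections.defaultdict(dict)
for line in lines:
    head, subhead, tail = line.split(":", 2)
    root_dict[head][subhead] = tail

# Build the tree once the structure is known, so no element is duplicated.
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)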
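To try the defaultdict approach without a file on disk, it can be wrapped in a function that accepts any iterable of lines. The following is a sketch under assumptions: `build_tree` is a hypothetical name, and the sample lines are made up, deliberately out of order and with one repeated head/subhead pair to show that the later message replaces the earlier one.

```python
import collections
import xml.etree.ElementTree as ET


def build_tree(lines):
    """Group 'head:subhead:message' lines in nested dicts, then build the XML.

    build_tree is a hypothetical name used for this sketch.
    """
    grouped = collections.defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, tail = line.split(":", 2)
        # A repeated head/subhead pair overwrites the earlier message.
        grouped[head][subhead] = tail

    root = ET.Element("root")
    for head, branch in grouped.items():
        head_element = ET.SubElement(root, head)
        for subhead, tail in branch.items():
            ET.SubElement(head_element, subhead).text = tail
    return root


# Assumed sample data: shuffled, with BRANCH1:branch11 appearing twice.
sample_lines = [
    "BRANCH2:branch21:message21\n",
    "BRANCH1:branch11:old message\n",
    "BRANCH1:branch12:message12\n",
    "BRANCH2:branch22:message22\n",
    "BRANCH1:branch11:message11\n",
]

root = build_tree(sample_lines)
print(ET.tostring(root, encoding="unicode"))
```

Because the function only iterates over its argument, an open file object can be passed in directly, which avoids building a separate list of lines for a large input.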

Source

2010-09-21 10:30:40 katrielalex

Thanks, this and the other answers are all good, but I will stick with the defaultdict, because the real text and XML files are quite large. – bitman 2010-09-21 11:54:26

Something along these lines? You keep the first-level branches in a dictionary and reuse them.

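# Continues from the question's code: ET, root and allLines are defined there.
b1map = {}  # first-level tag name -> its Element, so each is created only once

for line in allLines:
    tempv = line.strip().split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    # Second-level branches are still created for every line.
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]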
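The snippet above caches only the first level, so a repeated second-level name would still produce a duplicate element. The same idea can be extended to both levels by keying the dictionary on the path of tag names. This is a sketch, not a definitive implementation: `branches`, `branch_for` and the sample lines are hypothetical names and data introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data in the question's format.
all_lines = [
    "BRANCH1:branch11:message11\n",
    "BRANCH2:branch21:message21\n",
    "BRANCH1:branch12:message12\n",
    "BRANCH2:branch22:message22\n",
]

root = ET.Element("root")
branches = {}  # maps a tuple of tag names (the path below root) to its Element


def branch_for(path):
    """Return the Element at path, creating it and any missing ancestors once."""
    if not path:
        return root
    element = branches.get(path)
    if element is None:
        parent = branch_for(path[:-1])
        element = branches[path] = ET.SubElement(parent, path[-1])
    return element


for line in all_lines:
    head, subhead, message = line.strip().split(":", 2)
    branch_for((head, subhead)).text = message

print(ET.tostring(root, encoding="unicode"))
```

Each lookup is a dictionary access rather than a scan of the element's children, which matters mostly when a branch has many children.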

Source

2010-09-21 10:13:07 piro
