2013-02-14 88 views
11

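Merging XML files with nested elements without external libraries

I want to merge multiple XML files using Python, without any external libraries. The XML files have nested elements.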

Sample file 1:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
</root>

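Sample file 2:

<root>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>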

What I want:

<root>
    <element1>textA</element1>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

What I have tried:

The code from this answer:

from xml.etree import ElementTree as et


def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            # Appends every child of `data` to `first`, without merging
            first.extend(data)
    if first is not None:
        return et.tostring(first)

What I get:

<root>
    <element1>textA</element1>
    <elements>
        <nested1>text now</nested1>
    </elements>
    <element2>textB</element2>
    <elements>
        <nested1>text after</nested1>
        <nested2>new text</nested2>
    </elements>
</root>

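I hope you can see and understand my problem. I am looking for a proper solution, and any guidance would be great.

To clarify the problem: with my current solution, the nested elements are not merged.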
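The behaviour described above can be reproduced without any files. The following is a minimal sketch, not the asker's exact code: `FILE1` and `FILE2` are hypothetical in-memory stand-ins for the two sample files (the content of `FILE2` is an assumption inferred from the output shown above), and `naive_combine` is an illustrative name.

```python
from xml.etree import ElementTree as et

# Hypothetical in-memory stand-ins for the two sample files.
FILE1 = (
    "<root><element1>textA</element1>"
    "<elements><nested1>text now</nested1></elements></root>"
)
FILE2 = (
    "<root><element2>textB</element2>"
    "<elements><nested1>text after</nested1>"
    "<nested2>new text</nested2></elements></root>"
)


def naive_combine(xml_strings):
    """Append every child of the later roots to the first root (no merging)."""
    roots = [et.fromstring(s) for s in xml_strings]
    first = roots[0]
    for other in roots[1:]:
        # Element.extend() only appends children; it never looks at their tags.
        first.extend(other)
    return first


merged = naive_combine([FILE1, FILE2])
child_tags = [child.tag for child in merged]
nested1_texts = [el.text for el in merged.iter("nested1")]
print(child_tags)
print(nested1_texts)
```

The root ends up with two separate `elements` children, which is why the nested content is duplicated instead of merged.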

Answers

18

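The code you posted appends all the elements together, regardless of whether an element with the same tag already exists. Merging by tag is not a standard way of handling XML files, so you need to iterate over the elements yourself, check each one, and combine them however you see fit. I can't explain it better than the code does, so here it is, more or less commented: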

from xml.etree import ElementTree as et


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # Save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # Combine each root with the first one, updating the first in place
            self.combine_element(self.roots[0], r)
        # Return the string representation (encoding='unicode' gives str on Python 3)
        return et.tostring(self.roots[0], encoding='unicode')

    def combine_element(self, one, other):
        """
        Recursively merge `other` into `one`: a child of `other` whose tag
        already exists in `one` updates the text or the children of that
        element; a child whose tag is not found is appended to `one`.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    print(r)

Works perfectly, thank you. I had just started writing my own code. :) – 2013-02-14 16:32:14

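Very nice, thanks. We also needed to merge attributes. That can be done by adding `one.attrib.update(other.attrib)` at the start of `combine_element`, and `mapping[el.tag].attrib.update(el.attrib)` right after the element text is replaced. – 2013-11-04 18:38:55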

Oh, right, I forgot about attributes. Good catch. – jadkik94 2013-11-06 20:09:28
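The attribute handling suggested in the comments could look roughly like the sketch below. It is written as a standalone function rather than a method so that it runs on its own, and the sample XML strings are made up for illustration. Attributes from the later element override those of the earlier one, which is an assumption about the desired behaviour.

```python
from xml.etree import ElementTree as et


def combine_element(one, other):
    """Merge `other` into `one` in place, including attributes."""
    # Attributes of the later element win over those already on `one`.
    one.attrib.update(other.attrib)
    mapping = {el.tag: el for el in one}
    for el in other:
        match = mapping.get(el.tag)
        if match is None:
            # No element with this tag yet: append it unchanged.
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: replace the text, then merge the attributes.
            match.text = el.text
            match.attrib.update(el.attrib)
        else:
            # Nested element: recurse (attributes are merged at the top).
            combine_element(match, el)


# Made-up sample data, not taken from the thread.
a = et.fromstring('<root id="1"><item lang="en">old</item></root>')
b = et.fromstring('<root rev="2"><item lang="fr" note="x">new</item></root>')
combine_element(a, b)
item = a.find("item")
print(a.attrib, item.text, item.attrib)
```

The lookup uses `mapping.get(...)` with an explicit `is None` check because an `Element` without children is falsy, so a plain truthiness test would misfire on leaf elements.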

2

Thanks, but in my case the merge also had to take attributes into account. Here is the code after my patch:

import sys
from xml.etree import ElementTree as et


class hashabledict(dict):
    # A dict that can be used as (part of) a dictionary key
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # Save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # Combine each root with the first one, updating the first in place
            self.combine_element(self.roots[0], r)
        # Return the merged tree so the caller can print it or write it to a file
        return et.ElementTree(self.roots[0])

    def combine_element(self, one, other):
        """
        Recursively merge `other` into `one`: a child of `other` whose tag
        and attributes match an element in `one` updates the text or the
        children of that element; otherwise it is appended to `one`.
        """
        # Create a mapping from (tag, attributes) to element, as that's what we are filtering with
        mapping = {(el.tag, hashabledict(el.attrib)): el for el in one}
        for el in other:
            key = (el.tag, hashabledict(el.attrib))
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[key].text = el.text
                except KeyError:
                    # An element with this tag and these attributes is not in the mapping
                    mapping[key] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[key], el)
                except KeyError:
                    # Not in the mapping
                    mapping[key] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    # All arguments but the last are input files; the last one is the output file
    r = XMLCombiner(sys.argv[1:-1]).combine()
    print('-' * 20)
    print(et.tostring(r.getroot(), encoding='unicode'))
    r.write(sys.argv[-1], encoding="iso-8859-1", xml_declaration=True)
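As a possible variation on the patch above, the `hashabledict` subclass can be avoided by building the key from a `frozenset` of the attribute items. This is only a sketch under that assumption: `element_key` is a hypothetical helper name, the function is standalone instead of a method, and the sample strings are invented for illustration.

```python
from xml.etree import ElementTree as et


def element_key(el):
    """Hypothetical helper: identify an element by its tag and attributes."""
    return (el.tag, frozenset(el.attrib.items()))


def combine_element(one, other):
    """Merge `other` into `one`, matching children on tag and attributes."""
    mapping = {element_key(el): el for el in one}
    for el in other:
        key = element_key(el)
        match = mapping.get(key)
        if match is None:
            # Same tag with different attributes counts as a new element.
            mapping[key] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element with identical tag and attributes: update the text.
            match.text = el.text
        else:
            combine_element(match, el)


# Invented sample data, not taken from the thread.
a = et.fromstring('<root><item id="1">one</item></root>')
b = et.fromstring('<root><item id="1">uno</item><item id="2">two</item></root>')
combine_element(a, b)
texts = [(el.get("id"), el.text) for el in a]
print(texts)
```

With this keying, `<item id="1">` is updated in place while `<item id="2">` is appended as a separate element, which matches the intent of keying on `(el.tag, hashabledict(el.attrib))`.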