2015-03-25 71 views

Why does walking an xml.dom.minidom tree to remove empty text nodes behave unexpectedly?

The code (a generic depth-first walk):

import xml.dom.minidom as xdom

def _walk_n_apply(func, cond, parent):
    if parent.childNodes:
        for child in parent.childNodes:
            if cond(child):
                func(parent, child)
                continue
            _walk_n_apply(func, cond, child)

def remove_child(parent, child):
    node = parent.removeChild(child)
    print 'removed', node

def is_empty_text_node(node):
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ''


xmldom = xdom.parse('blah')

_walk_n_apply(remove_child, is_empty_text_node, xmldom)

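In IPython, after calling

_walk_n_apply(remove_child, is_empty_text_node, xmldom)

once, there is only a slight change in the output of:

print xmldom.toprettyxml()

But if I call it several times (how many depends on the nesting depth), it eventually produces nicely formatted pretty-printed XML.

How can I achieve this with a single call?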


Input file contents:

<grammar xmlns="http://www.w3.org/2001/06/grammar" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
          http://www.w3.org/TR/speech-grammar/grammar.xsd" 
     xml:lang="en" version="1.0" 
     root="command" 
     mode="voice" 
     tag-format="semantics/1.0"> 

<rule id="command"> 
    <one-of> 
     <item><ruleref uri="#announcement" /></item> 
     <item><ruleref uri="#hello" /></item> 
     <item><ruleref uri="#whereis" /></item> 
     <item><ruleref uri="#interrupt" /></item> 
     <item><ruleref uri="#message" /></item> 
     <item><ruleref uri="#logon" /></item> 
     <item><ruleref uri="#logoff" /></item> 
     <item><ruleref uri="#storecoverage" /></item> 
     <item><ruleref uri="#identify" /></item> 
     <item><ruleref uri="#near" /></item> 
     <item><ruleref uri="#time" /></item> 
     <item><ruleref uri="#playmessages" /></item> 
     <item><ruleref uri="#registerbackup" /></item> 
     <item><ruleref uri="#igotit" /></item> 
    </one-of> 
    <tag>out=rules.latest()</tag> 
</rule> 

<rule id="announcement"> 
<item> 
    <one-of> 
    <item>announcement today<tag>out="AnnouncementToday"</tag></item> 
    <item>announcement now<tag>out="AnnouncementNow"</tag></item> 
    <item>announcement hour<tag>out="AnnouncementHour"</tag></item> 
    </one-of> 
</item> 
</rule> 

<rule id="hello"> 
    <item repeat="0-1"> 
    <one-of> 
     <item>hello</item> 
     <item>hey</item> 
     <item>hi</item> 
    </one-of> 
    </item> 
    <item><ruleref uri="persons.grxml"/><tag>out="Hello,"+rules.latest()</tag></item> 
</rule> 

<rule id="whereis"> 
    <item> 
    <one-of> 
     <item>where is<ruleref uri="persons.grxml"/></item> 
     <item>locate<ruleref uri="persons.grxml"/></item> 
     <item>find<ruleref uri="persons.grxml"/></item> 
    </one-of> 
    <tag>out="Whereis,"+rules.latest()</tag> 
    </item> 
</rule> 

<rule id="interrupt"> 
    <item>interrupt<ruleref uri="persons.grxml"/><tag>out="Interrupt,"+rules.latest()</tag></item> 
</rule> 

<rule id="message"> 
<item>message</item> 
    <item repeat="0-1">for</item> 
    <item><ruleref uri="persons.grxml"/><tag>out="Message,"+rules.latest()</tag></item> 
</rule> 

<rule id="logon"> 
    <one-of> 
    <item>log on 
     <one-of> 
     <item><ruleref uri="persons.grxml"/><tag>out="Logon,"+rules.latest()</tag></item> 
     <item><ruleref uri="#id_numbers"/><tag>out="Logon,"+rules.latest()</tag></item> 
     </one-of> 
     </item> 
    </one-of> 
</rule> 

<rule id="logoff"> 
<item> 
    <one-of> 
     <item>log off<item repeat='0-1'>system</item></item> 
     <item>log out</item> 
    </one-of> 
    <tag>out="Logoff"</tag> 
</item> 
</rule> 

<rule id="storecoverage"> 
    <item repeat="0-1">store</item> 
    <item>coverage<tag>out="coverage"</tag></item> 
</rule> 

<rule id="identify"> 
    <item>identify<tag>out="identify"</tag></item> 
</rule> 

<rule id="near"> 
    <one-of> 
     <item>who is</item> 
     <item>anyone</item> 
    </one-of> 
    <item>near<ruleref uri="#locations"/><tag>out="near,"+rules.latest()</tag></item> 
</rule> 

<rule id="time"> 
<one-of> 
    <item>time<tag>out="time"</tag></item> 
    <item>what time is it<tag>out="time"</tag></item> 
</one-of> 
</rule> 

<rule id="playmessages"> 
    <item> 
    play 
    <one-of> 
     <item>messages<tag>out="PlayMessages"</tag></item> 
     <item>announcements<tag>out="PlayMessages"</tag></item> 
    </one-of> 
    </item> 
</rule> 

<rule id="registerbackup"> 
    <item repeat="0-1">cash</item> 
    <item>register backup<tag>out="register backup"</tag></item> 
</rule> 

<rule id="igotit"> 
<one-of> 
    <item> 
    <one-of> 
    <item>i got it<tag>out="i got it"</tag></item> 
    <item>i have it<tag>out="i got it"</tag></item> 
    </one-of> 
    </item> 
    <item> 
    <one-of> 
    <item>on the way<tag>out="i got it"</tag></item> 
    <item>on my way<tag>out="i got it"</tag></item> 
    </one-of> 
    </item> 
</one-of> 
</rule> 


<rule id="locations"> 
    <ruleref uri="locations.grxml"/> 
    <tag>out=rules.latest();</tag> 
</rule> 

Output if I call the function only once:

removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 

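Output if I call the function 10 times in a row:

Like this:

for i in range(10):
    _walk_n_apply(remove_child, is_empty_text_node, xmldom)

(The output was copied and pasted from a tmux session, so a few lines may be missing. The problem as I understand it: if my function is recursive and correct, it should remove all the empty text nodes in a single call. But calling it a second time removes some more empty text nodes, then a third time removes more, and so on, until there are no empty text nodes left.)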

removed <DOM Text node "u'\n\n'">                                      
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '">                                      
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '">                                      
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u' \n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '">                                      
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n'"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n  '"> 
removed <DOM Text node "u'\n '"> 

Please provide a sample input, along with the expected and actual results of running your code on that sample input. – 2015-03-25 15:17:34


@Robᵩ I have edited the question to include a sample input file and the output. – aniketd 2015-03-26 11:15:48

Answers

1

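You are modifying the list of children while you iterate over .childNodes, so the loop skips the node that follows each removed one. Iterate over a copy instead. Try this: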

for child in list(parent.childNodes): 
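
A minimal sketch of that fix applied to the walker from the question, for illustration only. It assumes Python 3 (the question uses Python 2) and a small made-up document; the names `walk_and_apply`, `SAMPLE`, and `count_empty_text_nodes` are hypothetical and not part of the original post. The point it shows is that `removeChild()` changes `parent.childNodes` in place, so the loop has to run over a snapshot made with `list(...)`.

```python
import xml.dom.minidom as xdom

def walk_and_apply(func, cond, parent):
    # Iterate over a snapshot. removeChild() mutates parent.childNodes in
    # place, and looping over the live list skips the sibling that follows
    # each removed node (so its whole subtree is never visited).
    for child in list(parent.childNodes):
        if cond(child):
            func(parent, child)
            continue
        walk_and_apply(func, cond, child)

def is_empty_text_node(node):
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ''

def remove_child(parent, child):
    parent.removeChild(child)

def count_empty_text_nodes(node):
    # Helper for checking the result; hypothetical, not from the question.
    total = 1 if is_empty_text_node(node) else 0
    for child in node.childNodes:
        total += count_empty_text_nodes(child)
    return total

# Hypothetical sample, far smaller than the grammar file in the question.
SAMPLE = """<rule id="hello">

    <one-of>
        <item>hello</item>
        <item>hi</item>
    </one-of>

</rule>"""

dom = xdom.parseString(SAMPLE)
before = count_empty_text_nodes(dom)
walk_and_apply(remove_child, is_empty_text_node, dom)  # a single call
after = count_empty_text_nodes(dom)
print(before, after)
```

With the snapshot in place, one call should leave no whitespace-only text nodes, so `toprettyxml()` no longer needs repeated passes.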

You're awesome! – aniketd 2015-03-26 16:21:15