Reading XML tags in Java, code optimization

I have written a recursive function that reads the tags in an XML document. Here is the code:

private void readTag(org.w3c.dom.Node item, String histoTags, String fileName, Hashtable<String, String> tagsInfos) { 
    if (item.getNodeType() == Node.ELEMENT_NODE) { 
     // Recurse into every child, extending the '|'-separated tag path 
     NodeList itemChilds = item.getChildNodes(); 

     for (int i=0; i < itemChilds.getLength(); i++) { 
      org.w3c.dom.Node itemChild = itemChilds.item(i); 
      readTag(itemChild, histoTags + "|" + item.getNodeName(), fileName, tagsInfos); 
     } 
    } 
    else if (item.getNodeType() == Node.TEXT_NODE) { 
     // Store the text under the path of its enclosing tags 
     tagsInfos.put(histoTags, item.getNodeValue()); 
    } 
}

The function takes some time to execute. The XML it reads looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<Document> 
    <Mouvement> 
     <Com> 
      <IdCom>32R01000000772669473</IdCom> 
      <RefCde>32R</RefCde> 
      <Edit>0</Edit> 
     </Com> 
    </Mouvement> 
</Document>

Is there any way to optimize this code in Java?

Source

2016-04-29 tabby

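You could use an XML-to-object mapping library (XStream, for example) to do this; it **might** be more efficient. Could you post an MCVE that reproduces the "slowness"? – 2016-04-29 04:48:41

@RC: Can you give an example? – tabby

See http://x-stream.github.io/tutorial.html – 2016-04-29 07:22:05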

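Two optimizations, though I don't know how much they will help:

Don't use getChildNodes(). Use getFirstChild() and getNextSibling().
Reuse a single StringBuilder instead of creating a new one for every element (which histoTags + "|" + item.getNodeName() does implicitly).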
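
To illustrate just those two points in isolation, here is a minimal sketch. The class SiblingWalk and its methods collectPaths and pathsOf are hypothetical names invented for this example, and it only collects element paths (no text handling), so treat it as an illustration rather than a drop-in replacement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class SiblingWalk {
    // Records the '|'-separated path of every element, sharing one StringBuilder.
    static void collectPaths(Element elem, StringBuilder path, List<String> out) {
        int mark = path.length();                    // remember where this element's part starts
        path.append('|').append(elem.getNodeName());
        out.add(path.toString());
        // Walk the sibling chain instead of indexing into getChildNodes()
        for (Node child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectPaths((Element) child, path, out);
            }
        }
        path.setLength(mark);                        // restore the caller's path
    }

    // Parses the XML string and returns all element paths in document order.
    static List<String> pathsOf(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> out = new ArrayList<>();
            collectPaths(root, new StringBuilder(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pathsOf(
                "<Document><Mouvement><Com><IdCom>1</IdCom><Edit>0</Edit></Com></Mouvement></Document>"));
    }
}
```

The setLength(mark) call is what makes the shared StringBuilder safe: each call undoes its own append before returning, so siblings never see each other's path segments.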

However, you should also be aware that the text content of an element node may appear as a combination of multiple TEXT and CDATA nodes.
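
A small sketch of what that means in practice, assuming the JDK's default DOM parser with coalescing left at its default (off). The class TextPieces and its methods are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class TextPieces {
    // Parses the XML string and returns its document element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts the TEXT and CDATA children directly under the root element.
    static int countTextLikeChildren(String xml) {
        int count = 0;
        for (Node n = parseRoot(xml).getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                count++;
            }
        }
        return count;
    }

    // Element.getTextContent() concatenates the text pieces and skips comments.
    static String fullText(String xml) {
        return parseRoot(xml).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<t>a<![CDATA[b]]>c</t>";
        System.out.println(countTextLikeChildren(xml) + " pieces, text = " + fullText(xml));
    }
}
```

Code that stores each TEXT node under the same key, as the function in the question does, would keep only the last piece, which is why the text has to be collected per element.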

Your code will also work better if it operates on elements rather than on nodes.

private static void readTag(Element elem, StringBuilder histoTags, String fileName, Hashtable<String, String> tagsInfos) { 
    int histoLen = histoTags.length(); 
    CharSequence textContent = null; 
    boolean hasChildElement = false; 
    for (Node child = elem.getFirstChild(); child != null; child = child.getNextSibling()) { 
     switch (child.getNodeType()) { 
      case Node.ELEMENT_NODE: 
       histoTags.append('|').append(child.getNodeName()); 
       readTag((Element)child, histoTags, fileName, tagsInfos); 
       histoTags.setLength(histoLen); 
       hasChildElement = true; 
       break; 
      case Node.TEXT_NODE: 
      case Node.CDATA_SECTION_NODE: 
       //uncomment to test: System.out.println(histoTags + ": \"" + child.getTextContent() + "\""); 
       if (textContent == null) 
        // Optimization: Don't copy to a StringBuilder if only one text node will be found 
        textContent = child.getTextContent(); 
       else if (textContent instanceof StringBuilder) 
        // Ok, now we need a StringBuilder to collect text from multiple nodes 
        ((StringBuilder)textContent).append(child.getTextContent()); 
       else 
        // And we keep collecting text from multiple nodes 
        textContent = new StringBuilder(textContent).append(child.getTextContent()); 
       break; 
      default: 
       // ignore all others 
     } 
    } 
    if (textContent != null) { 
     String text = textContent.toString(); 
     // Suppress pure whitespace content on elements with child elements, i.e. structural whitespace 
     if (! hasChildElement || ! text.trim().isEmpty()) 
      tagsInfos.put(histoTags.toString(), text); 
    } 
}

Test

String xml = "<root>\n" + 
      " <tag>hello <![CDATA[world]]> Foo <!-- comment --> Bar</tag>\n" + 
      "</root>\n"; 
Element docElem = DocumentBuilderFactory.newInstance() 
             .newDocumentBuilder() 
             .parse(new InputSource(new StringReader(xml))) 
             .getDocumentElement(); 
Hashtable<String, String> tagsInfos = new Hashtable<>(); 
readTag(docElem, new StringBuilder(docElem.getNodeName()), "fileName", tagsInfos); 
System.out.println(tagsInfos);

Output (with the print statement uncommented)

root: " 
    " 
root|tag: "hello " 
root|tag: "world" 
root|tag: " Foo " 
root|tag: " Bar" 
root: " 
" 
{root|tag=hello world Foo  Bar}

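Notice how the CDATA section and the comment inside the <tag> element split its text, so that the DOM element ends up with multiple TEXT/CDATA child nodes.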

Source

2016-04-29 05:29:53 Andreas
