2012-08-01 75 views
Parsing only CDATA

A quick question about SAXParser:

If I have an XML document like this:

<?xml version="1.0" encoding="utf-8"?> 
<cop xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cop.xsd"> 
    <auth> 
     <uid mioattributo="20">16<![CDATA[ 
function matchwo(a,b) ]]></uid> 
    </auth> 
</cop> 

So uid has two children, right? One of type Node.CDATA_SECTION_NODE and one of type Node.TEXT_NODE.

I implemented this quick class (extending the usual DefaultHandler):
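One way to check the two-children assumption is to load the document into a DOM and list the node types under uid. The following is only a sketch: the class name `UidChildren`, the method `describeUidChildren`, and the one-line sample XML (a trimmed version of the document above, without the XML declaration and schema attributes) are made up for illustration. It assumes the JDK's default, non-coalescing `DocumentBuilderFactory`, which keeps CDATA sections as separate nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class UidChildren {

    // Hypothetical helper: describes each child node of the first <uid> element.
    static List<String> describeUidChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // false is already the default; set explicitly so CDATA is not merged into text
            factory.setCoalescing(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getElementsByTagName("uid").item(0).getChildNodes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                String type;
                if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
                    type = "CDATA";
                } else if (node.getNodeType() == Node.TEXT_NODE) {
                    type = "TEXT";
                } else {
                    type = "OTHER";
                }
                result.add(type + ":" + node.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Parser configuration, I/O and SAX errors are wrapped to keep the sketch short
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cop><auth><uid mioattributo=\"20\">16"
                + "<![CDATA[ function matchwo(a,b) ]]></uid></auth></cop>";
        for (String child : describeUidChildren(xml)) {
            System.out.println(child);
        }
    }
}
```

Note that this only says something about the DOM view of the document; SAX does not expose node types through `characters()`.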

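import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {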
    /** 
    * Logger for this class 
    */ 
    private static final Log log = LogFactory.getLog(MyHandler.class); 
    private StringBuilder sb; 

    @Override 
    public void startElement(String uri, String localName, String qName, 
     Attributes attributes) throws SAXException { 
    System.out.println("STARTUri: " + uri); 
    System.out.println("STARTLocalName: " + localName); 
    System.out.println("STARTqName: " + qName); 
// for(int i=0;i<attributes.getLength();i++) { 
//  System.out.println("LocalName: "+attributes.getLocalName(i)); 
//  System.out.println("Type: "+attributes.getType(i)); 
//  System.out.println("qName: "+attributes.getQName(i)); 
//  System.out.println("URI: "+attributes.getURI(i)); 
//  System.out.println("Value: "+attributes.getValue(i)); 
// } 
    sb = new StringBuilder(); 
    //super.startElement(uri, localName, qName, attributes); 
    } 

    @Override 
    public void characters(char[] ch, int start, int length) 
      throws SAXException { 
    sb.append(ch, start, length); 
    System.out.println("TEMPORARY: " + sb.toString()); 
    System.out.println(); 
    } 

    @Override 
    public void endElement(String uri, String localName, String qName) 
      throws SAXException { 
    System.out.println("ENDUri: " + uri); 
    System.out.println("ENDLocalName: " + localName); 
    System.out.println("ENDqName: " + qName); 
    System.out.println("Content: " + sb.toString()); 
    // Clear the whole buffer; replace(0, sb.length()-1, "") would leave the last character behind
    sb.setLength(0);
    } 

} 

The output of the parse looks like this:

Is Validating: true 
STARTUri: 
STARTLocalName: cop 
STARTqName: cop 
TEMPORARY: 


STARTUri: 
STARTLocalName: auth 
STARTqName: auth 
TEMPORARY: 


STARTUri: 
STARTLocalName: uid 
STARTqName: uid 
TEMPORARY: 16 

TEMPORARY: 16 
function matchwo(a,b) 

ENDUri: 
ENDLocalName: uid 
ENDqName: uid 
Content: 16 
function matchwo(a,b) 
TEMPORARY: 


ENDUri: 
ENDLocalName: auth 
ENDqName: auth 
Content: 

TEMPORARY: 


ENDUri: 
ENDLocalName: cop 
ENDqName: cop 
Content:  

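As the output shows, characters() is called twice inside the uid node, once for each of the two children. Is there a way to tell which one is CDATA and which one is TEXT?

Answers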


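You should look at LexicalHandler, which tells you when a CDATA section starts and ends.

Note that a SAX parser is free to call your characters() method as many (or as few) times as it needs in order to build up the string for you (you only know the text is complete once endElement() is called), so you cannot rely on the number of calls to determine document structure.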
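A rough sketch of how this could look, not a definitive implementation. It relies on `org.xml.sax.ext.DefaultHandler2`, a class that extends `DefaultHandler` and implements `LexicalHandler`, and registers the handler through the standard SAX `lexical-handler` property. The names `CdataAwareHandler`, `chunks` and `classify` are hypothetical, and the sample XML is a one-line version of the document from the question. Text is buffered and only emitted at a boundary (element or CDATA start/end), so it does not matter how many times the parser calls `characters()`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class CdataAwareHandler extends DefaultHandler2 {

    // Collected pieces of character data, each prefixed with "TEXT:" or "CDATA:"
    final List<String> chunks = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inCdata = false;

    // Emits whatever has been buffered so far, labelled with the current mode
    private void flush() {
        if (buffer.length() > 0) {
            chunks.add((inCdata ? "CDATA:" : "TEXT:") + buffer);
            buffer.setLength(0);
        }
    }

    @Override
    public void startCDATA() {
        flush();          // pending plain text ends where the CDATA section begins
        inCdata = true;
    }

    @Override
    public void endCDATA() {
        flush();          // everything buffered since startCDATA() is CDATA content
        inCdata = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per node
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Hypothetical convenience method: parses the XML and returns the labelled chunks
    static List<String> classify(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CdataAwareHandler handler = new CdataAwareHandler();
            // Without this property the parser never calls startCDATA()/endCDATA()
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.chunks;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cop><auth><uid mioattributo=\"20\">16"
                + "<![CDATA[ function matchwo(a,b) ]]></uid></auth></cop>";
        System.out.println(classify(xml));
    }
}
```

Overriding only `startCDATA()` and `endCDATA()` is not enough on its own: the handler also has to be registered as the lexical handler, because passing it to `parse()` only installs it for content events.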


Hey Brian. So basically LexicalHandler extends DefaultHandler2 extends DefaultHandler, so I would have both the node's content and the node's type, right? – dierre 2012-08-01 16:32:46