2013-04-24

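Castor Marshaling :: Invalid XML character

I am using the Castor API to convert objects to XML.

I get the following exception:

Caused by: org.xml.sax.SAXException: The character '' is an invalid XML character.

I know the right approach is to correct the source data, but there are a lot of these invalid characters.

On another forum, someone suggested encoding the Java object contents (Base64) before marshalling and then decoding the output. That approach looks very cumbersome and does not suit this situation.

I need a way to skip these characters during marshalling, and the generated XML should contain the characters as-is.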


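After some digging, I found that the invalid character is just a backspace (ASCII code 8). It is strange that a backspace character got into the string at all. Any suggestions? – Taran 2013-04-24 14:20:35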
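To track down where such characters come from, it can help to list every code point that the XML 1.0 Char production rejects, together with its position. This is a minimal diagnostic sketch; the class name InvalidXmlCharFinder and the helpers isXml10Char and findInvalidXmlChars are hypothetical, and the allowed ranges are the same ones used in the answer's method.

```java
import java.util.ArrayList;
import java.util.List;

public class InvalidXmlCharFinder {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns "index:U+XXXX" for every code point that is not allowed in XML 1.0. */
    static List<String> findInvalidXmlChars(String s) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // walks surrogate pairs as one code point
            if (!isXml10Char(cp)) {
                hits.add(i + ":U+" + String.format("%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // "\b" is the backspace character (U+0008).
        System.out.println(findInvalidXmlChars("abc\bdef")); // prints [3:U+0008]
    }
}
```

Running this over the suspicious fields before marshalling shows which property carries the character, which is usually the quickest way to find where it was inserted.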


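Encode the Java object contents before marshalling them and decode them after unmarshalling. That seems to be the only way around this problem. marshal.setEncoding("BASE64"); use Base64 encoding and decoding. – constantlearner 2013-04-25 21:25:15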
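For comparison, here is a minimal sketch of what the Base64 suggestion would involve, assuming Java 8+ (java.util.Base64); the class and method names are hypothetical. As far as I know, Castor's Marshaller.setEncoding sets the character encoding of the output document (for example UTF-8) rather than transforming field values, so the Base64 step would have to be applied to each affected field before marshalling and reversed after unmarshalling.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64FieldRoundTrip {

    /** Encodes a field value before the object is handed to the marshaller. */
    static String encodeForXml(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes a field value after the object has been unmarshalled. */
    static String decodeFromXml(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "abc\bdef"; // contains a backspace (U+0008)
        String encoded = encodeForXml(original);
        // The encoded text uses only A-Z, a-z, 0-9, '+', '/' and '=', so it is always valid XML text.
        System.out.println(encoded);
        System.out.println(decodeFromXml(encoded).equals(original)); // prints true
    }
}
```

The drawback is visible in the output: the marshalled XML no longer contains readable text for those fields, and every consumer has to know which fields to decode.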

I don't think Base64 is appropriate, since this is not binary data. The answer below did help, though. Thanks. – Taran 2013-04-26 11:56:51

Answers

/**
 * This method ensures that the output String has only
 * valid XML unicode characters as specified by the
 * XML 1.0 standard. For reference, please see
 * <a href="http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char">the
 * standard</a>. This method will return an empty
 * String if the input is null or empty.
 *
 * @param in The String whose non-valid characters we want to remove.
 * @return The in String, stripped of non-valid characters.
 */
public static String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) {
        return ""; // vacancy test
    }
    StringBuilder out = new StringBuilder(in.length()); // holds the output
    int i = 0;
    while (i < in.length()) {
        // Iterate by code point, not by char: a Java char never exceeds 0xFFFF, so
        // supplementary characters (stored as surrogate pairs) can only be tested
        // against the 0x10000-0x10FFFF range as a whole code point.
        int current = in.codePointAt(i);
        if (current == 0x9
                || current == 0xA
                || current == 0xD
                || (current >= 0x20 && current <= 0xD7FF)
                || (current >= 0xE000 && current <= 0xFFFD)
                || (current >= 0x10000 && current <= 0x10FFFF)) {
            out.appendCodePoint(current);
        }
        // Unpaired surrogates fall outside every range above and are dropped.
        i += Character.charCount(current);
    }
    return out.toString();
}
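The same filtering can also be written as a single regular expression, which some people find easier to keep in sync with the spec. This is a sketch, not part of the answer: it assumes Java 7+ (for the \x{...} escape in java.util.regex), and the class name XmlCharRegexStripper is hypothetical. java.util.regex matches by code point, so surrogate pairs are treated as one character.

```java
import java.util.regex.Pattern;

public class XmlCharRegexStripper {

    // Matches any code point outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Removes every code point that XML 1.0 does not allow; null becomes "". */
    static String stripNonValidXMLCharacters(String in) {
        return in == null ? "" : INVALID_XML_CHARS.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripNonValidXMLCharacters("abc\bdef")); // prints abcdef
        // U+1F600 is a surrogate pair in Java and is kept as one valid code point.
        System.out.println(stripNonValidXMLCharacters("smile \uD83D\uDE00").length()); // prints 8
    }
}
```

Either version should be applied to the String properties of the objects before they are passed to Castor's Marshaller, since the exception is raised during serialization.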

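If you want the generated XML to contain this character as-is, then the XML 1.1 specification may help. Castor can be configured to marshal to XML 1.1 with custom org.exolab.castor.xml.XMLSerializerFactory and org.exolab.castor.xml.Serializer implementations: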

package com.foo.castor;

import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
// ......

import org.exolab.castor.xml.BaseXercesOutputFormat;
import org.exolab.castor.xml.Serializer;
import org.exolab.castor.xml.XMLSerializerFactory;
import org.xml.sax.DocumentHandler;

// JDK-internal Xerces classes; newer JDKs may not export this package.
// Xerces-J publishes equivalent classes under org.apache.xml.serialize.
import com.sun.org.apache.xml.internal.serialize.OutputFormat;
import com.sun.org.apache.xml.internal.serialize.XML11Serializer;

/**
 * Castor serializer factory that writes XML 1.1 through Xerces' XML11Serializer.
 * DocumentHandler is a deprecated SAX 1 interface, hence the annotation.
 */
@SuppressWarnings("deprecation")
public class CastorXml11SerializerFactory implements XMLSerializerFactory {

    /** Wraps a fresh Xerces OutputFormat so that Castor can configure it. */
    private static class CastorXml11OutputFormat extends BaseXercesOutputFormat {

        public CastorXml11OutputFormat() {
            super._outputFormat = new OutputFormat();
        }
    }

    /** Adapts Xerces' XML11Serializer to Castor's Serializer interface. */
    private static class CastorXml11Serializer implements Serializer {

        private final XML11Serializer serializer = new XML11Serializer();

        @Override
        public void setOutputCharStream(Writer out) {
            serializer.setOutputCharStream(out);
        }

        @Override
        public DocumentHandler asDocumentHandler() throws IOException {
            return serializer.asDocumentHandler();
        }

        @Override
        public void setOutputFormat(org.exolab.castor.xml.OutputFormat format) {
            // Castor's OutputFormat wraps the underlying Xerces OutputFormat.
            serializer.setOutputFormat((OutputFormat) format.getFormat());
        }

        @Override
        public void setOutputByteStream(OutputStream output) {
            serializer.setOutputByteStream(output);
        }
    }

    @Override
    public Serializer getSerializer() {
        return new CastorXml11Serializer();
    }

    @Override
    public org.exolab.castor.xml.OutputFormat getOutputFormat() {
        return new CastorXml11OutputFormat();
    }
}

Then set these two properties, either globally in the castor.properties file:

org.exolab.castor.xml.serializer.factory=com.foo.castor.CastorXml11SerializerFactory 
org.exolab.castor.xml.version=1.1 

or through the setCastorProperties method of your particular CastorMarshaller.
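For the programmatic route, the same two properties can be built as a Map. This is a sketch under the assumption that "CastorMarshaller" here means Spring OXM's org.springframework.oxm.castor.CastorMarshaller, whose setCastorProperties takes a Map<String, String>; the class and helper names below are hypothetical, and the Spring call is left as a comment so the snippet runs without Spring on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class CastorXml11Properties {

    /** The two castor.properties entries from the answer, as a Map. */
    static Map<String, String> xml11Properties() {
        Map<String, String> props = new HashMap<>();
        props.put("org.exolab.castor.xml.serializer.factory",
                "com.foo.castor.CastorXml11SerializerFactory");
        props.put("org.exolab.castor.xml.version", "1.1");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = xml11Properties();
        // Assumption: with Spring OXM this map would be handed to the marshaller, e.g.
        //     castorMarshaller.setCastorProperties(props);
        System.out.println(props);
    }
}
```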

Note, however, that XML 1.1 is not accepted by browsers, and not all XML parsers can parse XML 1.1 out of the box.
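As I read the XML 1.1 spec, control characters such as U+0008 are legal there but must not appear literally (they are "restricted" characters), so "as-is" in practice means a character reference like &#x8; in the output. The sketch below shows that round trip with the JDK's built-in DOM parser, which in my experience accepts XML 1.1, and that an XML 1.0 document with the same reference is rejected. The class name and the escapeControlChars helper are hypothetical and only illustrate the idea; they are not what XML11Serializer does internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class Xml11ControlCharDemo {

    /**
     * Hypothetical helper: writes U+0001..U+001F (except tab, LF and CR) as character
     * references. It does NOT escape &, < or >; a real serializer must do that too.
     */
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x1 && c <= 0x1F && c != 0x9 && c != 0xA && c != 0xD) {
                sb.append("&#x").append(Integer.toHexString(c)).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Parses an XML document and returns the root element's text content. */
    static String parseRootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String value = "abc\bdef";
        String body = "<a>" + escapeControlChars(value) + "</a>"; // <a>abc&#x8;def</a>
        String xml11 = "<?xml version=\"1.1\" encoding=\"UTF-8\"?>" + body;
        System.out.println(parseRootText(xml11).equals(value)); // prints true
        try {
            parseRootText("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body);
        } catch (SAXParseException e) {
            System.out.println("XML 1.0 parser rejects &#x8;: " + e.getMessage());
        }
    }
}
```

Any consumer of the marshalled document therefore needs an XML 1.1 capable parser to read the backspace back.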