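I am using JAXP to generate and parse an XML document into which some fields are loaded from a database. The goal is to produce valid XML with Java and UTF-8 encoding.
Code to serialize the XML: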
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("test");
root.setAttribute("version", text);
doc.appendChild(root);
DOMSource domSource = new DOMSource(doc);
TransformerFactory tFactory = TransformerFactory.newInstance();
FileWriter out = new FileWriter("test.xml");
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(domSource, new StreamResult(out));
Code to parse the XML:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("test.xml");
I get the following exception:
[Fatal Error] test.xml:1:4: Invalid byte 1 of 1-byte UTF-8 sequence.
Exception in thread "main" org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at com.test.Test.xml(Test.java:27)
at com.test.Test.main(Test.java:55)
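The string text contains a u-umlaut (ü) and an o-umlaut (ö), character codes 0xFC and 0xF6. These are the characters that cause the error. When I escape the string myself, writing ü and ö as character references (for example &#252; and &#246;), the problem goes away. Other entities are encoded automatically when I write out the XML.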
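To make the byte-level difference concrete, here is a small sketch that prints how ü and ö are encoded in ISO-8859-1 and in UTF-8. The class name UmlautBytes and the helper hex are made up for illustration. It assumes the platform default encoding is a single-byte Latin charset such as ISO-8859-1; the question does not say which encoding the machine actually uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class UmlautBytes {
    // Render a byte array as space-separated uppercase hex digits.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00FC\u00F6"; // u-umlaut and o-umlaut
        // One byte per character in ISO-8859-1.
        System.out.println("ISO-8859-1: " + hex(text.getBytes(StandardCharsets.ISO_8859_1)));
        // Two bytes per character in UTF-8.
        System.out.println("UTF-8:      " + hex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In ISO-8859-1 the two characters become the single bytes FC and F6, while UTF-8 encodes them as C3 BC and C3 B6. A lone FC byte is not a valid start of a UTF-8 sequence, which would be consistent with the parser error above if the file was written in a single-byte encoding but declared as UTF-8.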
How can I write and read the output correctly without replacing these characters?
(I have already read the following questions:
How to encode characters from Oracle to XML?
Repairing wrong encoding in XML files)
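One likely cause, which the replies below also point to, is FileWriter: it encodes characters with the platform default charset, regardless of the OutputKeys.ENCODING setting on the Transformer. The following is a minimal sketch of one possible fix, not necessarily the exact change the accepted answer proposed: hand the Transformer a byte stream so that it performs the UTF-8 encoding itself. The class name Utf8XmlRoundTrip and the methods write and readVersion are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class, for illustration only.
class Utf8XmlRoundTrip {
    // Serialize <test version="..."/> to the given file as UTF-8 bytes.
    public static void write(File file, String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("test");
        root.setAttribute("version", text);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        // With an OutputStream (rather than a Writer) the transformer encodes
        // the characters itself, so the bytes on disk match the declared encoding.
        try (OutputStream out = new FileOutputStream(file)) {
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    // Parse the file and return the "version" attribute of the root element.
    public static String readVersion(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(file);
        return doc.getDocumentElement().getAttribute("version");
    }

    public static void main(String[] args) throws Exception {
        File file = new File("test.xml");
        String text = "\u00FC\u00F6"; // u-umlaut and o-umlaut, unescaped
        write(file, text);
        System.out.println("Round trip preserved the text: " + text.equals(readVersion(file)));
    }
}
```

If a Writer is required, wrapping the stream as new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8) should have the same effect, because the charset is then stated explicitly instead of being inherited from the platform.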
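Nice and easy. I did think about switching to this, but dropped the idea because I did not see a way to specify the encoding in the constructor. It works great, thanks. – 2009-01-14 15:31:04
I shot myself in the foot with FileWriter once... +1 – 2009-01-14 16:03:47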