transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8") is NOT working

I have the following method to write an XML DOM to a stream:

public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception { 
    fDoc.setXmlStandalone(true); 
    DOMSource docSource = new DOMSource(fDoc); 
    Transformer transformer = TransformerFactory.newInstance().newTransformer(); 
    transformer.setOutputProperty(OutputKeys.METHOD, "xml"); 
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); 
    transformer.setOutputProperty(OutputKeys.INDENT, "no"); 
    transformer.transform(docSource, new StreamResult(out)); 
}

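I am testing some other XML functionality, and this is just the method I use to write to a file. My test program generates 33 test cases where files are written out. 28 of them have the following header: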

<?xml version="1.0" encoding="UTF-8"?>...

But for some reason, 1 of the test cases now produces:

<?xml version="1.0" encoding="ISO-8859-1"?>...

And four others produce:

<?xml version="1.0" encoding="Windows-1252"?>...

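As you can clearly see, I am setting the ENCODING output key to UTF-8. These tests used to work on an earlier version of Java. I have not run the tests in a while (more than a year), but running them today on "Java(TM) SE Runtime Environment (build 1.6.0_22-b04)" I get this funny behavior.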

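I have verified that the documents causing the problem were read from files that originally had those encodings. It seems that the newer versions of the libraries are trying to preserve the encoding of the source file that was read. But that is not what I want... I really do want the output to be in UTF-8.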
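One way to see what a parsed document remembers about its source is to ask the DOM directly. The following is only a minimal sketch: the class name `RememberedEncoding` and the sample XML are made up for illustration, and it shows what the `Document` reports, not what any particular transformer does with that information.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original test suite.
final class RememberedEncoding {
    // Parses raw bytes, letting the parser pick the encoding from the XML declaration.
    static Document parse(byte[] xmlBytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(xml);
        // Encoding named in the XML declaration of the source, or null if none.
        System.out.println("declared encoding: " + doc.getXmlEncoding());
        // Encoding the parser actually used to decode the input.
        System.out.println("input encoding:    " + doc.getInputEncoding());
    }
}
```

If the failing documents report something other than UTF-8 here while the passing ones report UTF-8 or null, that would at least be consistent with the theory that the source encoding is being carried through.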

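Does anyone know of any other factor that might cause the transformer to ignore the UTF-8 encoding setting? Is there something else that has to be set on the document to make it forget the encoding of the file it was originally read from?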

UPDATE：

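I checked out the same project on another machine, then built it and ran the tests there. On that machine all the tests pass! All the files have "UTF-8" in the header. That machine has "Java(TM) SE Runtime Environment (build 1.6.0_29-b11)". Both machines run Windows 7. On the new machine, which works correctly, jdk1.5.0_11 is used for the build, but on the old machine jdk1.6.0_26 is used for the build. The libraries used for both builds are exactly the same. Could it be a JDK 1.6 incompatibility at build time?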

UPDATE：

After 4.5 years the Java library is still broken, but thanks to the suggestion from Vyrx below, I finally have a proper solution!

public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception { 
    fDoc.setXmlStandalone(true); 
    DOMSource docSource = new DOMSource(fDoc); 
    Transformer transformer = TransformerFactory.newInstance().newTransformer(); 
    transformer.setOutputProperty(OutputKeys.METHOD, "xml"); 
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); 
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); 
    transformer.setOutputProperty(OutputKeys.INDENT, "no"); 
    out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>".getBytes("UTF-8")); 
    transformer.transform(docSource, new StreamResult(out)); 
}

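The solution is to suppress the writing of the header, and to write the correct header yourself just before serializing the XML to the output stream. Lame, but it produces the correct result. Tests that broke more than 4 years ago are now running again!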
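For anyone who wants to try the workaround in isolation, here is a small self-contained sketch. The class name `HeaderWorkaroundCheck`, the helper names, and the sample document are all invented for illustration. It feeds in a document read as ISO-8859-1, writes it with the header suppressed and a fixed UTF-8 header in front, and then re-parses the result as a sanity check that the body really matches the declared encoding on the runtime being tested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical test harness for the workaround described above.
final class HeaderWorkaroundCheck {
    static final String UTF8_HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    // Same approach as writeToOutputStream: suppress the generated
    // declaration and write a fixed UTF-8 one first.
    static void writeUtf8(Document doc, OutputStream out) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        out.write(UTF8_HEADER.getBytes(StandardCharsets.UTF_8));
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    static Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    // Serializes a document that was read as ISO-8859-1 and returns the bytes.
    static byte[] roundTrip() throws Exception {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(parse(latin1), out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] written = roundTrip();
        String text = new String(written, StandardCharsets.UTF_8);
        System.out.println("header ok: " + text.startsWith(UTF8_HEADER));
        // Re-parse to see whether the body really matches the declared encoding.
        String content = parse(written).getDocumentElement().getTextContent();
        System.out.println("content survives: " + "caf\u00e9".equals(content));
    }
}
```

The re-parse step matters because the header is now written by hand: if a runtime were to ignore the ENCODING property for the body as well, the header alone would not reveal it.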

2013-03-23 AgilePro

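This really does look like some bug or incompatibility issue. Without a reproducible test case, it is unlikely that anyone can help. Can you provide an [SSCCE](http://sscce.org/), and list all versions of the tools/libraries involved? – sleske 2013-05-19 08:17:21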

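There are several places to check your locale. Your local machine has a locale, your IDE may have a locale setting, and your JVM process has a locale. I have seen problems like this before when my locale changed. How are you running the tests? java.exe, maven, IDE? – 2013-06-10 11:50:50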

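Since I have specified UTF-8 directly, the locale should not matter, but to answer your question directly: the test code is run from the command line by invoking java.exe, on a Windows system located on the Pacific coast of the USA and configured for US English and the Pacific time zone. – AgilePro 2013-06-14 01:30:01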
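To compare the JVM-level settings between a machine where the tests pass and one where they fail, something like the sketch below can be run on both. The class name `JvmLocaleReport` is made up; the calls are standard JDK ones. It only reports what the JVM process sees and says nothing about IDE or operating system settings.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical diagnostic: prints the locale and default charset of this JVM process.
final class JvmLocaleReport {
    static String report() {
        return "default locale:  " + Locale.getDefault() + "\n"
             + "default charset: " + Charset.defaultCharset() + "\n"
             + "file.encoding:   " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```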

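I ran into the same problem on Android when serializing emoji characters. With UTF-8 encoding set on the transformer, the output contained HTML character entities (UTF-16 surrogate pairs), which subsequently broke other parsers that read the data.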

This is how I ended up solving it:

StringWriter sw = new StringWriter(); 
sw.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"); 
Transformer t = TransformerFactory.newInstance().newTransformer(); 

// this will work because we are creating a Java string, not writing to an output 
t.setOutputProperty(OutputKeys.ENCODING, "UTF-16"); 
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); 
t.transform(new DOMSource(elementNode), new StreamResult(sw)); 

return IOUtils.toInputStream(sw.toString(), Charset.forName("UTF-8"));
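The snippet above relies on `IOUtils` from Apache Commons IO. As a rough sketch of the same idea using only the JDK (the class name `StringWriterSerialize` and method name `toUtf8Stream` are invented here), the string can be turned into UTF-8 bytes directly:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

// Hypothetical JDK-only variant of the StringWriter approach.
final class StringWriterSerialize {
    static final String UTF8_HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    static InputStream toUtf8Stream(Node node) throws Exception {
        StringWriter sw = new StringWriter();
        sw.write(UTF8_HEADER);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // The target is a Java String, so no byte encoding happens at this stage.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(node), new StreamResult(sw));
        // The only byte-encoding step: String to UTF-8 bytes.
        return new ByteArrayInputStream(sw.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

This keeps the whole serialized document in memory as a String and again as a byte array, so it is better suited to small documents.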

2017-12-06 21:31:31 Vyrx

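Yes, that looks like it works. I don't like converting my entire XML tree to a string in memory (especially since StringWriter is not that efficient). I really want to stick to outputting straight to the output stream. One possible solution: rather than adding the header after serialization, write the header to the output stream first and then serialize the XML, without a header, to the same output stream. I will see if that works. – AgilePro 2017-12-09 17:53:00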

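I have rewritten the idea to use streams properly and given your answer the credit. (Thanks!) As you have written it, you will have three copies of the document in memory at the same time. Not a problem for small XML, but in general having three copies of a significant data file in memory is not efficient. A better approach is simply to write the header before serializing the XML to the writer. I rewrote your answer so that it has only 2 copies of the XML in memory. – AgilePro 2017-12-09 18:16:12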

-1

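I'm taking a shot in the dark here, but you mention that you are reading files for your test data. Can you make sure that you are reading the files with the correct encoding, so that when you write to the OutputStream you already have the data in the correct encoding?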

So use something like new InputStreamReader(new FileInputStream(fileDir), "UTF8").

Don't forget that the single-argument constructors of FileReader always use the platform default encoding: "The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate."
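A small sketch of what reading with an explicit charset looks like, and what goes wrong when the charset is a mismatch. The class name `ExplicitCharsetRead` and the in-memory sample bytes are illustrative only; with a real file, the `ByteArrayInputStream` would be a `FileInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a byte stream with an explicitly chosen charset.
final class ExplicitCharsetRead {
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in recovers the text.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));
        // Decoding the same bytes as ISO-8859-1 yields two wrong characters for the accented letter.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1));
    }
}
```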

2013-09-11 13:41:38 Carlos

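I never use FileReader. --- A DOM "Document" holds String values, which means they have already been converted from the original form. I am using the Java DOM utilities to read the file directly from a byte stream. The stream is expected to be interpreted according to the XML header, which specifies the encoding. That is how XML works. --- The file appears to be read correctly, and it is written out in a specified encoding, just not the encoding I asked it to be written in. – AgilePro 2013-09-24 23:16:28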
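As an illustration of the point about byte streams, the sketch below hands the parser raw bytes in two different encodings, each with a matching declaration, and gets the same String content back both times. The class name `DeclaredEncodingParse` and the sample documents are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: the parser decodes bytes according to the XML declaration.
final class DeclaredEncodingParse {
    // Encodes the document text with the given charset and lets the parser
    // work out the encoding from the declaration in the raw bytes.
    static String textOf(String xml, Charset cs) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(cs)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String latin1 = textOf(
                "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>",
                StandardCharsets.ISO_8859_1);
        String utf8 = textOf(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00e9</a>",
                StandardCharsets.UTF_8);
        System.out.println("same content: " + latin1.equals(utf8));
    }
}
```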

To answer the question, the following code works for me. It takes an input encoding and converts the data to an output encoding.

// Parse the input string using an explicit input encoding.
ByteArrayInputStream inStreamXMLElement = new ByteArrayInputStream(strXMLElement.getBytes(input_encoding));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document docRepeat = db.parse(new InputSource(new InputStreamReader(inStreamXMLElement, input_encoding)));
Node elementNode = docRepeat.getElementsByTagName(strRepeat).item(0);

// Serialize the selected element, without an XML declaration, in the output encoding.
DOMSource domSourceRepeat = new DOMSource(elementNode);
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, output_encoding);

ByteArrayOutputStream bos = new ByteArrayOutputStream();
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, output_encoding));

transformer.transform(domSourceRepeat, sr);
byte[] outputBytes = bos.toByteArray();
strRepeatString = new String(outputBytes, output_encoding);

2014-04-16 18:03:49

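I only get this error in some versions of Java. I have not had time to do a full investigation of exactly which environment causes the problem, or even time to post test code here, but it is very similar to what you posted. What failed were automated tests that had been running for years. The code you included looks like a good example of how to test for the problem. I don't know whether I will be able to go back to the original environment where it failed and re-run the test there. All in the fullness of time... – AgilePro 2014-04-16 19:12:01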

-1

Try setting the encoding on your StreamResult specifically:

StreamResult result = new StreamResult(new OutputStreamWriter(out, "UTF-8"));

This way, it should only be able to write out in UTF-8.

2014-11-04 04:39:00

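The problem is that the 'header' is incorrect. If the header says the document is ISO-8859-1, then I don't want it actually encoded some other way. I need the header and the actual encoding of the stream to match. This is why, with these libraries, I always use input/output streams and never readers/writers... because the standard says that you have to read the header in order to find out what the encoding is. – AgilePro 2014-11-04 21:48:44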
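A test can check that requirement mechanically. The sketch below is one possible way, with invented names (`DeclaredEncodingCheck`, `declaredEncoding`, `bodyMatchesHeader`): it pulls the encoding name out of the declaration and then strictly decodes the whole byte array with it. It assumes an ASCII-compatible encoding and no byte order mark, so that the declaration itself can be read as ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check that the bytes of a document agree with its declared encoding.
final class DeclaredEncodingCheck {
    private static final Pattern ENCODING =
            Pattern.compile("^<\\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']");

    // Returns the encoding named in the XML declaration, or null if there is none.
    static String declaredEncoding(byte[] xml) {
        int len = Math.min(xml.length, 200);
        String head = new String(xml, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // True if the bytes decode without error under the declared encoding.
    static boolean bodyMatchesHeader(byte[] xml) {
        String name = declaredEncoding(xml);
        if (name == null) {
            return false;
        }
        try {
            Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(xml));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

The check is one-directional: a UTF-8 header over ISO-8859-1 bytes is caught as soon as a non-ASCII character appears, but UTF-8 bytes under an ISO-8859-1 header still decode without error (to the wrong characters), so that case needs a content comparison instead.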

How about this?:

public static String documentToString(Document doc) throws Exception {
    return documentToString(doc, "UTF-8");
}

public static String documentToString(Document doc, String encoding) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = null;

    // validateNullString is a helper that is not shown in this answer.
    if ("".equals(validateNullString(encoding))) encoding = "UTF-8";
    try {
        transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
    } catch (javax.xml.transform.TransformerConfigurationException error) {
        return null;
    }

    Source source = new DOMSource(doc);
    StringWriter writer = new StringWriter();
    Result result = new StreamResult(writer);

    try {
        transformer.transform(source, result);
    } catch (javax.xml.transform.TransformerException error) {
        return null;
    }
    return writer.toString();
}

2014-11-27 11:29:47

I spent a significant amount of time debugging this problem, because it worked fine on my machine (Ubuntu 14 + Java 1.8.0_45) but did not work properly in production (Alpine Linux + Java 1.7).

Contrary to my expectation, the following, from the answer mentioned above, did not help:

ByteArrayOutputStream bos = new ByteArrayOutputStream(); 
StreamResult sr = new StreamResult(new OutputStreamWriter(bos, "UTF-8"));

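But this one worked as expected:

val out = new StringWriter() 
val result = new StreamResult(out)

2015-10-23 13:40:22 expert

I was able to work around the problem by wrapping the Document object passed to the DOMSource constructor. The getXmlEncoding method of my wrapper always returns null; all other methods delegate to the wrapped Document object.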
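Writing such a wrapper by hand means delegating every method of the Document interface. One shorter way to sketch the same idea is a dynamic proxy, shown below. This is an assumption-laden illustration, not the code from this answer: the class name `EncodingForgettingDocument` and method `wrap` are made up, and whether a given transformer implementation accepts a proxied Document (or consults getXmlEncoding at all) is not something the sketch verifies.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import org.w3c.dom.Document;

// Hypothetical wrapper built with java.lang.reflect.Proxy.
final class EncodingForgettingDocument {
    // Wraps a Document so getXmlEncoding() reports null; everything else is delegated.
    static Document wrap(Document target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("getXmlEncoding".equals(method.getName())) {
                return null;
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the DOM's own exception
            }
        };
        return (Document) Proxy.newProxyInstance(
                EncodingForgettingDocument.class.getClassLoader(),
                new Class<?>[] {Document.class},
                handler);
    }
}
```

Usage would be along the lines of `new DOMSource(EncodingForgettingDocument.wrap(doc))`. Note that child nodes obtained through the proxy are the real nodes, so their getOwnerDocument() still returns the unwrapped document.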

2016-07-06 20:12:46
