How to convert a string with Unicode escapes to a normal string?

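The server sends me a string with the value "%u0419%u043E".

I tried to convert it to a normal string, but I get Chinese characters. That is wrong, because the incoming letters are Cyrillic.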

Code:

// String test = "%u0419%u043E"; <--- this is Йо (Cyrillic)
// the same text as raw ASCII bytes: "%u0419" followed by "%u043E"
byte[] test = { (byte) 0x25, (byte) 0x75, (byte) 0x30, (byte) 0x34, (byte) 0x31, (byte) 0x39,
                (byte) 0x25, (byte) 0x75, (byte) 0x30, (byte) 0x34, (byte) 0x33, (byte) 0x45 };
String aaa = new String(test, "UTF-16");
aaa = new String(test, "UTF-8");
aaa = new String(test, "ISO-8859-5");
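The incoming bytes are only the ASCII text of the escape sequences, so no charset can turn them into Cyrillic. The minimal sketch below (the class name `CharsetDemo` is hypothetical) shows what the decodes in the question actually produce. ASCII-compatible charsets such as UTF-8 or ISO-8859-5 return the literal "%u0419%u043E". UTF-16, read as big-endian because there is no byte-order mark, fuses each pair of bytes into one char (0x25 0x75 becomes U+2575, 0x30 0x34 becomes U+3034, and so on), which yields the East Asian-looking characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class CharsetDemo {
    // Formats every UTF-16 code unit of s as U+XXXX, separated by spaces
    static String codeUnits(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        for (char c : s.toCharArray()) {
            joiner.add(String.format("U+%04X", (int) c));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Same bytes as in the question: the ASCII text "%u0419%u043E"
        byte[] test = "%u0419%u043E".getBytes(StandardCharsets.US_ASCII);

        // An ASCII-compatible charset returns the escape text unchanged
        System.out.println(new String(test, StandardCharsets.UTF_8));
        // prints %u0419%u043E

        // UTF-16 with no BOM is decoded big-endian, two bytes per char
        System.out.println(codeUnits(new String(test, StandardCharsets.UTF_16)));
        // prints U+2575 U+3034 U+3139 U+2575 U+3034 U+3345
    }
}
```

None of these code units is in the Cyrillic block (U+0400 to U+04FF), so the escapes have to be parsed, not decoded with a charset.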

An image explaining what I am doing:


2016-03-04 DQuade

As far as I know, this is not a standard encoding, at least not one of the UTF-* or ISO-* ones.

You need to decode it yourself, for example:

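import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String decode(String encoded) { 
    // "%u" followed by 4 hex digits, capture the digits 
    Pattern p = Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE); 

    Matcher m = p.matcher(encoded); 
    StringBuffer decoded = new StringBuffer(encoded.length()); 

    // replace every occurrence (and copy the text between matches) 
    while (m.find()) { 
        // convert the 4 hex digits to a char; quoteReplacement stops a decoded
        // '$' or '\' from being treated as a group reference or escape
        String ch = Character.toString((char) Integer.parseInt(m.group(1), 16));
        m.appendReplacement(decoded, Matcher.quoteReplacement(ch)); 
    } 

    // copy whatever follows the last match
    m.appendTail(decoded); 
    return decoded.toString(); 
}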

This prints:

System.out.println(decode("%u0419%u043E")); 
Йо
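On Java 9 and later, the same find/append loop can be expressed with `Matcher.replaceAll(Function<MatchResult, String>)`. The sketch below assumes Java 9+ and uses a hypothetical class name. The string returned by the function is still parsed for `$` group references and `\` escapes, so the decoded char goes through `Matcher.quoteReplacement`; without it, an input such as "%u0024" (a dollar sign) would throw an `IllegalArgumentException`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentUDecoder {
    // "%u" followed by exactly 4 hex digits; group 1 holds the digits
    private static final Pattern PERCENT_U =
            Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);

    static String decode(String encoded) {
        // replaceAll(Function) needs Java 9+; quoteReplacement keeps a decoded
        // dollar sign or backslash from being interpreted by the matcher
        return PERCENT_U.matcher(encoded).replaceAll(m ->
                Matcher.quoteReplacement(
                        String.valueOf((char) Integer.parseInt(m.group(1), 16))));
    }

    public static void main(String[] args) {
        System.out.println(decode("%u0419%u043E")); // Йо
        System.out.println(decode("100%u0024"));    // 100$
    }
}
```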


2016-03-04 10:33:07 bwt

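How can I also handle the case where the incoming string contains %20? Example: "%u0419%u043E%20" – DQuade

'%XX' looks like standard URL encoding, so you can use 'java.net.URLDecoder.decode(someString, "UTF-8")'. If the string contains both '%uXXXX' and '%XX', you have to run the custom decoding first (it leaves the URL-encoded characters unchanged) – bwt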
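A minimal sketch of that order of operations, assuming the input mixes `%uXXXX` and `%XX` escapes (class and method names are hypothetical). The custom pass has to come first because `URLDecoder.decode` treats the two characters after every `%` as hex and rejects "%u0" with an `IllegalArgumentException`. `URLDecoder` copies non-ASCII characters such as the already decoded Cyrillic letters through unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MixedDecoder {
    private static final Pattern PERCENT_U =
            Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);

    // Step 1: replace every %uXXXX with its UTF-16 code unit, leave %XX untouched
    static String decodePercentU(String encoded) {
        Matcher m = PERCENT_U.matcher(encoded);
        StringBuffer sb = new StringBuffer(encoded.length());
        while (m.find()) {
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(sb, Matcher.quoteReplacement(ch));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 2: let URLDecoder handle the remaining %XX escapes (it also maps '+' to ' ')
    static String decodeMixed(String encoded) {
        try {
            return URLDecoder.decode(decodePercentU(encoded), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + decodeMixed("%u0419%u043E%20") + "]"); // [Йо ]
    }
}
```

Because the output of step 1 is fed to `URLDecoder`, a '%' or '+' produced by "%u0025" or "%u002B" would be interpreted a second time; if such input is possible, the two steps need a single combined pass instead.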

Yes, I used the same approach as above: after the first decoding pass, I apply a new pattern '%([0-f]{2})' with a new matcher. Problem solved – DQuade
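The two-pattern approach described in that last comment might look like the sketch below; the class and helper names are hypothetical. It spells the hex class as `[0-9a-fA-F]` because the range `[0-f]` also covers characters such as ':', 'G' and '_', which would match and then make `Integer.parseInt(..., 16)` throw. Each `%XX` becomes the single char with that value. That is right for ASCII escapes like %20, but unlike `URLDecoder` it does not combine multi-byte UTF-8 escapes such as "%D0%99" into one character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassDecoder {
    private static final Pattern PERCENT_U =
            Pattern.compile("%u([0-9a-f]{4})", Pattern.CASE_INSENSITIVE);
    private static final Pattern PERCENT_XX = Pattern.compile("%([0-9a-fA-F]{2})");

    // Replaces every match of p with the char whose code is the hex in group 1
    private static String replaceHex(String input, Pattern p) {
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer(input.length());
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    static String decode(String encoded) {
        // First pass handles %uXXXX; the second pass sees only the remaining %XX
        return replaceHex(replaceHex(encoded, PERCENT_U), PERCENT_XX);
    }

    public static void main(String[] args) {
        System.out.println("[" + decode("%u0419%u043E%20") + "]"); // [Йо ]
        System.out.println(decode("%GG"));                         // %GG (not hex, left as-is)
    }
}
```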
