Java charset decoding problem

I am trying to decode the character · with the GB2312 charset in Java.

This character is included in GB2312, at code position a1a4 (check here).

Code:

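import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// The class name is a placeholder; the original post showed only the two methods.
public class CharsetDecodeTest {
    private static final int bufferSize = 0x20000;

    public static void main(String[] args) throws Exception {
        String str = "a1a4:· a5f6:ヶ a8c5:ㄅ";
        // getBytes() with no argument encodes with the platform default charset.
        ByteBuffer bf = readToByteBuffer(new ByteArrayInputStream(str.getBytes()));
        System.out.println(Charset.forName("GB2312").decode(bf).toString());
    }

    // Reads the whole stream into memory and wraps the bytes in a ByteBuffer.
    static ByteBuffer readToByteBuffer(InputStream inStream) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream outStream = new ByteArrayOutputStream(bufferSize);
        int read;
        while ((read = inStream.read(buffer)) != -1) {
            outStream.write(buffer, 0, read);
        }
        return ByteBuffer.wrap(outStream.toByteArray());
    }
}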

The code above prints:

a1a4:? a5f6:ヶ a8c5:ㄅ

I don't understand why a1a4 cannot be decoded.

Source

2012-02-29 Koerr

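I assume 'IO.string2InputStream(d)' also writes using the GB2312 charset. Have you checked whether the bytes in the buffer are correct? – 2012-02-29 00:31:17

@RussellZahniser Sorry, I have edited my question. – Koerr 2012-02-29 00:38:43

You might want to call 'str.getBytes("GB2312")': you are using the default charset, which is probably UTF-8. But I think seh is right that this is a character problem rather than an encoding problem. – 2012-02-29 00:58:04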
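A minimal sketch of the suggestion in the comment above, assuming the goal is to make the bytes independent of the platform default: pass the same charset explicitly when encoding and when decoding. The class and method names (ExplicitCharsetRoundTrip, roundTrip, toHex) are hypothetical, and the sample text is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class ExplicitCharsetRoundTrip {

    // Encodes the text with the named charset, then decodes the bytes with the same charset.
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        byte[] encoded = text.getBytes(charset); // does not depend on the default charset
        return charset.decode(ByteBuffer.wrap(encoded)).toString();
    }

    // Formats bytes as lowercase hex so they can be compared against a code chart.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two common Chinese characters that GB2312 contains
        System.out.println(toHex(text.getBytes(Charset.forName("GB2312"))));
        System.out.println(roundTrip(text, "GB2312").equals(text));
    }
}
```

Dumping the encoded bytes as hex is a quick way to see whether the buffer holds what the code chart says it should before the decoder is blamed.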

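In my browser, the fifth character of string d is encoded as 0xB7, which is MIDDLE DOT, not KATAKANA MIDDLE DOT. However, according to the same database you mentioned, that code point is not available in the GB2312 character set. Likewise, you can see that neither MIDDLE DOT nor the encoding 0xB7 is listed as part of GB2312.

I think the problem here lies in the character in the input string, not in the CharsetDecoder provided by your JRE.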
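One way to check which character a string really contains is to print its code points and ask the charset's encoder whether it can represent them. The sketch below is an illustration under assumptions, not a definitive diagnosis: the class and method names (CodePointInspector, describe, canEncode) are hypothetical, and what it prints for GB2312 depends on the mapping tables shipped with the JRE, so no expected output is shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class, for illustration only.
class CodePointInspector {

    // Lists each code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Reports whether the named charset's encoder can represent the character.
    static boolean canEncode(String charsetName, char c) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        char middleDot = '\u00B7';         // MIDDLE DOT
        char katakanaMiddleDot = '\u30FB'; // KATAKANA MIDDLE DOT
        System.out.println(describe("a1a4:" + middleDot));
        System.out.println("GB2312 can encode U+00B7: " + canEncode("GB2312", middleDot));
        System.out.println("GB2312 can encode U+30FB: " + canEncode("GB2312", katakanaMiddleDot));
    }
}
```

If the encoder reports that it cannot encode the character found in the string, the input contains a different character from the one the code chart lists, which is the situation this answer describes.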

Source

2012-02-29 00:39:48 seh
