Convert a Chinese ASCII string to a Chinese-language string

I tried using the sys module to set the default encoding and convert the string, but it does not work.

The string is:

`\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf`

It means 益民核心增长混合 in Chinese. But how can I convert it to a Chinese string?

I tried this:

>>> string = '\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'
>>> print string.decode("gbk")
益民核心增长混合  # as you can see, this gives the right answer
>>> new_str = string.decode("gbk")
>>> new_str
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'  # this looks like another encoding
>>> another = u"益民核心增长混合"
>>> another
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'  # same as new_str

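So I am confused by this: why does print string.decode("gbk") show the Chinese text, while evaluating new_str in my Python console returns what looks like another encoding?

My OS is Windows 10 and my Python version is 2.7. Thank you very much!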

Source

2016-03-03 Alexander Yau

You are doing it correctly.

In this case, new_str is actually a unicode string, as the u prefix indicates.

>>> new_str
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'  # this looks like another encoding

When you decode a GBK-encoded string, you get a unicode string. Each character of that string is a Unicode code point, for example:

>>> u'\u76ca' 
u'\u76ca' 
>>> print u'\u76ca' 
益 
>>> import unicodedata 
>>> unicodedata.name(u'\u76ca') 
'CJK UNIFIED IDEOGRAPH-76CA' 
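
To make the code-point view concrete, here is a minimal sketch that decodes the bytes from the question and lists every character's code point and Unicode name. The variable names (raw, text, code_points, names) are illustrative only, and the bytes literal is written with a b prefix so that the snippet should behave the same on Python 2.7 and Python 3.

```python
import unicodedata

# The GBK-encoded bytes from the question.
raw = b'\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'

# Decoding turns bytes into a unicode string (text).
text = raw.decode("gbk")

# GBK stores each of these characters in two bytes,
# so 16 bytes decode to 8 characters.
byte_count = len(raw)
char_count = len(text)

# Each character is one Unicode code point with an official name.
code_points = ["U+%04X" % ord(ch) for ch in text]
names = [unicodedata.name(ch) for ch in text]

for cp, name in zip(code_points, names):
    print("%s  %s" % (cp, name))
```

The first line printed is the same U+76CA entry that unicodedata.name reports in the session above.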

>>> print new_str
益民核心增长混合
>>> print repr(new_str)
u'\u76ca\u6c11\u6838\u5fc3\u589e\u957f\u6df7\u5408'

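This is how Python displays a unicode string in the interpreter: it uses repr to show it. When you print the string, however, Python encodes it to your terminal's encoding (sys.stdout.encoding), which is why the string is displayed as you expect.

So this is not a different encoding of the string; it is just the way Python displays the string in the interpreter.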
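
The two views can be approximated outside the interactive prompt. The sketch below is an illustration, not what the interpreter does internally: it uses the unicode_escape codec as a stand-in for the escaped form that repr shows on Python 2, and it encodes the text with the terminal's encoding roughly the way print would. The fallback to UTF-8 and the "replace" error handler are assumptions added so the snippet also runs when sys.stdout.encoding is unset or cannot represent Chinese.

```python
import sys

raw = b'\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'
text = raw.decode("gbk")

# ASCII-only escaped view of the characters, similar to what the
# Python 2 prompt echoes (without the u'...' wrapper).
escaped = text.encode("unicode_escape")

# print encodes the text for the terminal. The encoding can be None
# (for example when output is redirected), so fall back to UTF-8 here.
terminal_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

# "replace" substitutes '?' for characters the terminal cannot show
# instead of raising UnicodeEncodeError.
terminal_bytes = text.encode(terminal_encoding, "replace")

print(escaped)
print(terminal_encoding)
```

On a Chinese-locale Windows console the terminal encoding is typically GBK-compatible, in which case terminal_bytes would match the original bytes from the question.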
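
A small sketch of that point, under the assumption that comparing GBK with UTF-8 is a fair illustration: the same unicode string can be written out as different byte sequences, and decoding either one gives back identical text. The names from_gbk, from_utf8, and the two flags are hypothetical.

```python
gbk_bytes = b'\xd2\xe6\xc3\xf1\xba\xcb\xd0\xc4\xd4\xf6\xb3\xa4\xbb\xec\xba\xcf'

# Bytes -> text using the encoding the bytes were written in.
from_gbk = gbk_bytes.decode("gbk")

# Text -> bytes in a different encoding, then back to text.
utf8_bytes = from_gbk.encode("utf-8")
from_utf8 = utf8_bytes.decode("utf-8")

# The byte sequences differ (2 bytes per character in GBK,
# 3 bytes per character in UTF-8 for these ideographs)...
different_bytes = gbk_bytes != utf8_bytes

# ...but the decoded text is the same sequence of code points.
same_text = from_gbk == from_utf8

print(len(gbk_bytes), len(utf8_bytes), same_text)
```

Encodings only matter at the boundary between bytes and text; the unicode string itself has none.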

Source

2016-03-03 05:03:37 mhawke
