This is just UTF-8 data. Use .decode to convert it to unicode:
>>> 'D\xc3\xa9cor'.decode('utf-8')
u'D\xe9cor'
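The session above is Python 2. Here is a minimal sketch of the same step in Python 3, where UTF-8 data arrives as bytes and .decode is a method of bytes. The helper name to_text is hypothetical:

```python
def to_text(data: bytes) -> str:
    """Decode raw UTF-8 bytes into text (Python 3 str, i.e. unicode)."""
    return data.decode('utf-8')


raw = b'D\xc3\xa9cor'  # bytes C3 A9 are the UTF-8 encoding of U+00E9 (e-acute)
print(ascii(to_text(raw)))  # prints 'D\xe9cor'
```

ascii() is used only so the printed form matches the Python 2 u'D\xe9cor' style, whatever the terminal encoding is.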
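For the 'D\\xc3\\xa9cor' case (literal backslash escapes), you can add an extra string-escape decode first. As the first line below shows, this does no harm to data that is already raw UTF-8: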
>>> 'D\xc3\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
>>> 'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
>>> u'D\\xc3\\xa9cor'.decode('string-escape').decode('utf-8')
u'D\xe9cor'
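Python 3 has no 'string-escape' codec for this. One possible substitute, assuming the input is bytes, is to decode with 'unicode_escape' and re-encode with 'latin-1'. This turns each literal \xNN back into a single byte. The name unescape_utf8 is hypothetical:

```python
def unescape_utf8(data: bytes) -> str:
    # 'unicode_escape' turns each literal \xNN into the code point U+00NN and
    # reads every other byte as Latin-1; 'latin-1' then maps those code points
    # back to single bytes, restoring the raw UTF-8 byte string.
    raw = data.decode('unicode_escape').encode('latin-1')
    return raw.decode('utf-8')


print(ascii(unescape_utf8(b'D\\xc3\\xa9cor')))  # escaped form  -> 'D\xe9cor'
print(ascii(unescape_utf8(b'D\xc3\xa9cor')))    # raw UTF-8     -> 'D\xe9cor'
```

Like string-escape, this interprets every backslash sequence. Use it only when the input is known to contain escapes of this form.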
To handle the second case, you need to detect whether the input is unicode and convert it to str first:
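>>> def conv(s):
...     # unicode input: map each code point < 256 back to the byte with the same value
...     if isinstance(s, unicode):
...         s = s.encode('iso-8859-1')
...     # undo any literal \xNN escapes, then decode the UTF-8 bytes
...     return s.decode('string-escape').decode('utf-8')
...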
>>> map(conv, [u'D\\xc3\\xa9cor', u'D\xc3\xa9cor', 'D\\xc3\\xa9cor', 'D\xc3\xa9cor'])
[u'D\xe9cor', u'D\xe9cor', u'D\xe9cor', u'D\xe9cor']
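Here is a rough Python 3 port of conv. It assumes str takes the role of unicode and bytes takes the role of str. It also swaps in the 'unicode_escape' plus 'latin-1' round trip in place of string-escape, which is not part of the original answer:

```python
from typing import Union


def conv(s: Union[str, bytes]) -> str:
    # Python 3 analogue of the Python 2 conv() (assumption: str ~ unicode,
    # bytes ~ str).
    if isinstance(s, str):
        # Assumes every code point is < 256; latin-1 maps it to the same byte.
        s = s.encode('latin-1')
    # Undo literal \xNN escapes, then decode the recovered UTF-8 bytes.
    return s.decode('unicode_escape').encode('latin-1').decode('utf-8')


inputs = ['D\\xc3\\xa9cor', 'D\xc3\xa9cor', b'D\\xc3\\xa9cor', b'D\xc3\xa9cor']
print(ascii([conv(s) for s in inputs]))
# ['D\xe9cor', 'D\xe9cor', 'D\xe9cor', 'D\xe9cor']
```

As in the Python 2 version, a str containing a code point above 255 raises UnicodeEncodeError at the latin-1 step.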
There are no weird encodings, only weird programmers. – 2010-06-07 10:42:42