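You should probably put `# -*- coding: utf-8 -*-` at the top of your file and run your editor and everything else in UTF-8 mode anyway to avoid these problems. But if you want to find out which encodings fit your current input, you can try this script (replace `'some string'` with something more localized):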
encodings = ['ascii', 'cp037', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig']
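def test(s):
    # Python 2: try every candidate codec and print each decoding that succeeds.
    for enc in encodings:
        try:
            u = unicode(s, enc)
            print u, enc
        except (UnicodeError, LookupError):
            # The codec rejects these bytes or is not available in this build.
            pass

test('some string')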
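The script above relies on Python 2's `unicode()` and `print` statement. As a rough sketch of the same idea for Python 3 (an assumption, not part of the original answer), the bytes can be decoded with `bytes.decode`. The helper name `decodings`, the sample bytes and the shortened codec list are made up for illustration:

```python
def decodings(data, encodings):
    """Return (encoding, text) pairs for every codec that can decode `data`."""
    results = []
    for enc in encodings:
        try:
            results.append((enc, data.decode(enc)))
        except (UnicodeError, LookupError):
            # The codec rejects these bytes or is not available in this build.
            pass
    return results


# Made-up input: the UTF-8 bytes of a localized word.
sample = 'Šiven'.encode('utf-8')
results = decodings(sample, ['ascii', 'latin_1', 'cp1250', 'utf_8'])
for enc, text in results:
    print(enc, repr(text))
```

Several codecs will usually "succeed" while producing garbage, so the output still has to be checked by eye for the decoding that reads correctly.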
That said, UTF-8 is your friend; use it. :)
`m` comes from something like `M = file.readlines()` ... `for m in M:` ... How can I say here: `m = u'Šiven'`? – Kristian 2012-03-03 14:06:29
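You don't need `readlines` there; you can iterate over the file directly (which reduces memory requirements). You should really consult the [other](http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python) [questions](http://stackoverflow.com/questions/147741/), or, if these questions (and the ones you find by searching) and their top answers don't solve your problem, ask a new question of your own. In short, use [`codecs.open`](http://docs.python.org/library/codecs.html#codecs.open). – phihag 2012-03-03 14:12:58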
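For the loop described in these comments, a minimal sketch might look like the following. It assumes the file is UTF-8 encoded; the helper name `matching_lines` and the demo file contents are hypothetical. `codecs.open` decodes while reading, so each line is already a Unicode string and can be compared with `u'Šiven'` directly. On Python 3 the built-in `open(path, encoding='utf-8')` gives the same result.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def matching_lines(path, wanted, encoding='utf-8'):
    """Return the lines of `path` that equal `wanted` (hypothetical helper)."""
    found = []
    # codecs.open decodes while reading, so each `line` is a Unicode string.
    with codecs.open(path, 'r', encoding=encoding) as f:
        for line in f:  # iterate over the file itself; no readlines() needed
            text = line.rstrip(u'\r\n')
            if text == wanted:
                found.append(text)
    return found


# Demo on a throwaway UTF-8 file (contents made up for illustration).
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(u'Šiven\nother\nŠiven\n'.encode('utf-8'))

result = matching_lines(demo_path, u'Šiven')
os.remove(demo_path)
print(len(result))
```

The comparison only works because both sides are Unicode; comparing undecoded bytes from the file against `u'Šiven'` is what typically triggers the original problem.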