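Handling French letters in Python

I am reading data from a file that contains words with both French and English letters. I am trying to build a list of all possible English and French letters (stored as strings). I use the code below to do this: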

    # encoding: utf-8
    import urllib2

    def trackLetter(letters, line):
        # append every character of line that is not already in letters
        for a in line:
            found = False
            for b in letters:
                if b == a:
                    found = True
            if not found:
                letters += a

    cur_letters = []  # for storing possible letters
    data = urllib2.urlopen('https://duolinguist.wordpress.com/2015/01/06/top-5000-words-in-french-wordlist/', 'utf-8')
    for line in data:
        trackLetter(cur_letters, line)
        # works if I print here
    print cur_letters

This code prints the following:

    ['T', 'H', 'E', '0', 'F', ..., 'c', 'p', 'g', 'k', 'x', 'j', 'z', 'q', '\xc3', '\xa0', '\xaa', '\xa9', '\xa8', '\xb4', '\xae', '-', '\xe2', '\xa7', '\xbb', '\xaf']

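Clearly the French letters have been lost in some kind of conversion to ASCII, even though I specified UTF encoding! The strange thing is that when I print the line directly (shown as a comment), the French characters display perfectly!

What should I do to preserve these characters (é, è, ê, etc.) or convert them back to their original form?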
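
For illustration, here is a minimal sketch of one way this might be approached. It is written in Python 3 syntax rather than the Python 2 of the code above, and the helper name `track_letters` and the sample text are made up for the example. The idea: escapes such as `'\xc3', '\xa0'` in the output look like the two UTF-8 bytes of a single accented letter, so the characters are probably not lost, only split. Printing a whole line works because the terminal reassembles the bytes, while printing a list shows each element separately. Decoding each line to text before iterating keeps every letter in one piece.

```python
# -*- coding: utf-8 -*-
# Sketch only (Python 3). In Python 2 the equivalent step would be
# line.decode('utf-8') before looping over the characters.

def track_letters(letters, line, encoding="utf-8"):
    """Hypothetical helper: add each new character of `line` to `letters`."""
    if isinstance(line, bytes):
        # bytes -> text, so an accented letter is one item instead of two bytes
        line = line.decode(encoding)
    for ch in line:
        if ch not in letters:
            letters.append(ch)
    return letters

# Made-up sample standing in for one raw line read from the network.
raw_line = "été à la forêt\n".encode("utf-8")

# Iterating the raw bytes splits each accented letter into two single bytes.
as_bytes = [raw_line[i:i + 1] for i in range(len(raw_line))]

# Decoding first keeps the letters whole.
letters = track_letters([], raw_line)
print(letters)

# A byte pair from the printed list can be turned back into its letter.
recovered = b"\xc3\xa0".decode("utf-8")
print(recovered)
```

With real data, the same function could be called on each line of the response; whether the page is actually served as UTF-8 is an assumption that should be checked against the response headers.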
Possible duplicate of [Unicode (UTF-8) reading and writing to files in Python](http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python) – mx0
No, reading the file is not the problem - see the OP's "works if I print here" comment – Greg