2010-06-20 42 views
1

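Handling multilingual directories (Python)

I'm trying to open a file and just realized that Python has trouble with my username (which is in Russian). Any suggestions on how to properly decode/encode this to keep IDLE happy?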

I'm using Python 2.6.5.

xmlfile = open(u"D:\\Users\\Эрик\\Downloads\\temp.xml", "r") 

Traceback (most recent call last): 
    File "<pyshell#23>", line 1, in <module> 
    xmlfile = open(str(u"D:\\Users\\Эрик\\Downloads\\temp.xml"), "r") 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128) 

os.sys.getfilesystemencoding() returns 'mbcs'

xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r")

Traceback (most recent call last): 
File "", line 1, in xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r") 
IOError: [Errno 22] invalid mode ('r') or filename: 'D:\Users\Y?ee\Downloads\temp.xml'

Answers

0

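The first problem is that the parser tries to interpret the backslashes in the string unless you use the r"raw quote" prefix. In 2.6.5 you don't need to treat your Unicode strings specially, but you may need a file encoding declaration in your source code, such as: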

# -*- coding: utf-8 -*- 

as defined in PEP 263. Here is an example of it working interactively:

$ python 
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] on linux2 
>>> f = r"D:\Users\Эрик\Downloads\temp.xml" 
>>> f 
'D:\\Users\\\xd0\xad\xd1\x80\xd0\xb8\xd0\xba\\Downloads\\temp.xml' 
>>> x = open(f, 'w') 
>>> x.close() 
>>> 
$ ls D* 
D:\Users\Эрик\Downloads\temp.xml 

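Yes, this is on a Unix system, so the \ has no special meaning as a path separator, and my terminal encoding is UTF-8, but it works. When you read the file, you may have to give the parser an encoding hint so it knows how to decode the contents.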
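As a minimal sketch of what such an encoding hint could look like, the snippet below writes a small UTF-8 file and reads it back through io.open with an explicit encoding. The sample XML content and the temporary file are assumptions made for illustration; the real temp.xml from the question is not shown, and its actual encoding may differ.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample content; the real temp.xml is not part of the question.
sample = u'<?xml version="1.0" encoding="utf-8"?><user>Эрик</user>'

# Write the sample as UTF-8 bytes to a throwaway file.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as out:
    out.write(sample.encode("utf-8"))

# io.open accepts an explicit encoding and returns unicode text
# on Python 2.6+ as well as on Python 3.
with io.open(path, "r", encoding="utf-8") as xmlfile:
    text = xmlfile.read()

os.remove(path)
print(text == sample)
```

If the file were saved in a different encoding (for example cp1251), the encoding argument would have to name that codec instead.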

0

First problem:

xmlfile = open(u"D:\\Users\\Эрик\\Downloads\\temp.xml", "r") 
### The above line should be OK, provided that you have the correct coding line 
### For example # coding: cp1251 

Traceback (most recent call last): 
    File "<pyshell#23>", line 1, in <module> 
    xmlfile = open(str(u"D:\\Users\\Эрик\\Downloads\\temp.xml"), "r") 
### HOWEVER the above traceback line shows you actually using str() 
### which is DIRECTLY causing the error because it is attempting 
### to encode your filename using the default ASCII codec -- DON'T DO THAT. 
### Please copy/paste; don't type from memory. 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128) 
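To see why str() is the culprit, here is a small sketch that performs the same ASCII encoding step explicitly. On Python 2, str() applied to a unicode object encodes it with the default ASCII codec; calling encode("ascii") directly reproduces the failure on any Python version. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
path = u"D:\\Users\\Эрик\\Downloads\\temp.xml"

# Explicit equivalent of what str(path) does implicitly on Python 2.
try:
    path.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is the first offending index, exc.end is one past the last.
    failed_at = (exc.start, exc.end)

print(failed_at)
```

The failing range covers the four Cyrillic characters of the username, which matches the "position 9-12" reported in the traceback (the message prints the last index inclusively).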

Second problem:

os.sys.getfilesystemencoding() yields 'mbcs'

xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r") 
### (a) \t is interpreted as a TAB character, hence the file name is invalid. 
### (b) encoding with mbcs seems not to be useful; it messes up your name ("Y?ee"). 

Traceback (most recent call last): 
File "", line 1, in xmlfile = open(u"D:\Users\Эрик\Downloads\temp.xml".encode("mbcs"), "r") 
IOError: [Errno 22] invalid mode ('r') or filename: 'D:\Users\Y?ee\Downloads\temp.xml' 
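The 'mbcs' codec exists only on Windows and maps to the system's active ANSI code page, so it cannot be shown portably. As a rough stand-in, and assuming the machine's code page was not a Cyrillic one (the "Y?ee" in the traceback suggests as much), the sketch below uses a hypothetical helper, survives, to check whether a name round-trips through a given code page. cp1251 and cp1252 are used here only as examples of a Cyrillic and a Western European code page.

```python
# -*- coding: utf-8 -*-
def survives(name, codec):
    """Return True if name encodes to codec and decodes back unchanged."""
    try:
        return name.encode(codec).decode(codec) == name
    except UnicodeError:
        return False

name = u"Эрик"
cyrillic_ok = survives(name, "cp1251")       # Cyrillic code page
western_ok = survives(name, "cp1252")        # Western European code page
lossy = name.encode("cp1252", "replace")     # unencodable characters become "?"

print(cyrillic_ok)
print(western_ok)
print(lossy)
```

Passing the unicode path straight to open(), without encoding it first, avoids this lossy step altogether.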

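General advice on hard-coded file names on Windows, in descending order of preference:

(1) Don't.
(2) Use forward slashes, e.g. "c:/temp.xml"
(3) Use raw strings with backslashes: r"c:\temp.xml"
(4) Use doubled backslashes: "c:\\temp.xml"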
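The difference between these spellings can be checked with a few string comparisons. This is only a sketch; the variable names are made up for illustration, and no file is actually opened.

```python
# -*- coding: utf-8 -*-
broken = "c:\temp.xml"     # "\t" is parsed as a TAB, so the name is wrong
forward = "c:/temp.xml"    # option (2): forward slashes
raw = r"c:\temp.xml"       # option (3): raw string keeps the backslash
doubled = "c:\\temp.xml"   # option (4): escaped backslash

has_tab = "\t" in broken
same_path = raw == doubled

print(has_tab)
print(same_path)
print(len(broken))
print(len(raw))
```

The raw and doubled forms produce the same string, while the unescaped form is one character shorter because the backslash and the "t" collapse into a single TAB.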