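Python - Python 3.1 can't seem to handle UTF-16 encoded files?

I'm trying to run some code that simply walks through a bunch of files and writes those that happen to be .txt files into a single file, removing all the whitespace. Here is some simple code that should do the trick: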

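import os

# rootdir and output_file are defined elsewhere in the script (not shown in the post)
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if '.txt' in file:
            # no encoding is given here, so the platform default (UTF-8 on this machine) is used
            f = open(subdir+'/'+file, 'r')
            line = f.readline()
            while line:
                line2 = line.split()
                if line2:
                    output_file.write(" ".join(line2)+'\n')
                line = f.readline()
            f.close()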

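But instead, I get the following error:

File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: unexpected code byte

It turns out these .txt files are all in UTF-16 (according to Firefox, anyway). I thought Python 3.x was supposed to be able to handle any kind of character encoding?

Best, Georgina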

2011-04-13 Georgina

A suggestion: could you tell Python that these files are UTF-16? – Gabe 2011-04-13 05:35:11

How do I do that? – Georgina 2011-04-13 05:37:04

OK, a one-liner: output_file.write(input_file.read().decode('utf-16').replace(u" ", u"").encode('desired encoding')) – janislaw 2011-04-13 10:38:54

Use open(bla, 'r', encoding="utf-16").
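
To show how that suggestion might fit into the loop from the question, here is a minimal sketch. The function name merge_txt_files, its parameters, and the choice of UTF-8 for the output file are assumptions made for illustration; they are not from the original post. It also assumes every .txt file under the root directory really is UTF-16 with a BOM.

```python
import os


def merge_txt_files(rootdir, output_path, encoding="utf-16"):
    """Write every non-blank line of each .txt file under rootdir to output_path,
    with runs of whitespace collapsed to single spaces."""
    # Assumption: the merged file is written as UTF-8.
    with open(output_path, "w", encoding="utf-8") as output_file:
        for subdir, dirs, files in os.walk(rootdir):
            for name in files:
                if not name.endswith(".txt"):
                    continue
                path = os.path.join(subdir, name)
                # Skip the output file itself if it lives under rootdir.
                if os.path.abspath(path) == os.path.abspath(output_path):
                    continue
                # Passing the encoding explicitly is what avoids the
                # UnicodeDecodeError raised by the default codec.
                with open(path, "r", encoding=encoding) as f:
                    for line in f:
                        words = line.split()
                        if words:
                            output_file.write(" ".join(words) + "\n")
```

It could be called as merge_txt_files(rootdir, 'merged.txt'). Files that are not valid UTF-16 would still raise a UnicodeDecodeError, so mixed directories need extra handling.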

2011-04-13 05:37:40 filmor

Whoops, thanks! Just as you posted this, I found this great post: http://stackoverflow.com/questions/3140010/converting-from-utf-16-to-utf-8-in-python-3 – Georgina 2011-04-13 05:42:19

Perfect answer. Thanks. – 2013-10-24 00:58:15

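There are several UTF-16 encodings.

utf-16-be   big endian, no BOM
utf-16-le   little endian, no BOM
utf-16      with BOM (little endian when encoding on the machine below; when decoding, the BOM selects the byte order)

Example: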

Python 3.2 (r32:88452, Feb 20 2011, 11:12:31) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> a = 'a'.encode('utf-16') 
>>> a 
b'\xff\xfea\x00' 
>>> a.decode('utf-16') 
'a' 
>>> a = 'a'.encode('utf-16-le') 
>>> a 
b'a\x00' 
>>> a.decode('utf-16-le') 
'a' 
>>> a = 'a'.encode('utf-16-be') 
>>> a 
b'\x00a' 
>>> a.decode('utf-16-be') 
'a'

You can use any of these encodings with @filmor's answer.
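
If it is not known in advance which variant a file uses, one option is to look at its first two bytes. The sketch below is only an illustration: the helper name guess_utf16_codec and its default of 'utf-16-le' for BOM-less files are assumptions, not something from this thread. The 0xfe byte at position 0 in the question's traceback would be consistent with a big-endian BOM (FE FF).

```python
import codecs


def guess_utf16_codec(path, default="utf-16-le"):
    """Return 'utf-16' if the file starts with a UTF-16 BOM, otherwise `default`."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The plain 'utf-16' codec reads the BOM, picks the byte order and strips it.
        return "utf-16"
    # No BOM: the byte order cannot be read from the file, so fall back to an assumption.
    return default
```

The result can then be passed along, for example open(path, 'r', encoding=guess_utf16_codec(path)). This only distinguishes the UTF-16 variants from each other; it does not check whether the file is UTF-16 at all.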

2011-04-13 06:03:37 kevpie
