Python 3: readlines() indexing problem?

 
Python 3.1.2 (r312:79147, Nov 9 2010, 09:41:54) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6] 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module>
    File "/usr/local/lib/python3.1/codecs.py", line 300, in decode 
    (result, consumed) = self._buffer_decode(data, self.errors, final) 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2230: unexpected code byte

But...

 
Python 2.4.3 (#1, Sep 8 2010, 11:37:47) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6] 
'2010-06-14 21:14:43 613 xxx.xxx.xxx.xxx 200 TCP_NC_MISS 4198 635 GET http www.thelegendssportscomplex.com 80 /thumbnails/t/sponsors/145x138/007.gif - - - DIRECT www.thelegendssportscomplex.com image/gif http://www.thelegendssportscomplex.com/ "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8)" OBSERVED "Sports/Recreation" - xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx\r\n'

Does anyone have any idea why .readlines()[6] fails in Python 3 but works in 2.4?

Also... I think 0xAE is ®
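
A quick way to check that guess, as a minimal sketch: in ISO-8859-1 (Latin-1) the byte 0xAE is the registered sign ®, while a lone 0xAE is not valid UTF-8, which matches the traceback above. That the file is actually Latin-1 is an assumption; another single-byte encoding such as cp1252 would give the same result for this byte.

```python
raw = bytes([0xAE])

# In Latin-1 every byte maps to the code point with the same number.
as_latin1 = raw.decode("iso8859-1")
print(as_latin1 == "\u00ae")  # U+00AE is the registered sign

# A lone 0xAE is a UTF-8 continuation byte, so strict UTF-8 decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```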

2010-11-09 MadSc13ntist

This is an encoding error. The indexing has nothing to do with it (it never even gets evaluated). – delnan 2010-11-09 19:24:19

Use open(.., 'rb') in Python 3 to emulate the Python 2.x behavior. – jfs 2010-11-09 22:34:36
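
A rough sketch of what that comment suggests. The temporary file and its contents are hypothetical stand-ins for the log file in the question; only the 'rb' mode comes from the comment.

```python
import os
import tempfile

# Hypothetical stand-in for the log file: the second line holds a raw 0xAE byte.
data = b"first line\r\nname\xae here\r\nlast line\r\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# Binary mode skips decoding entirely, so readlines() returns bytes objects,
# much like the byte strings Python 2 returned.
with open(path, "rb") as f:
    lines = f.readlines()
os.remove(path)

print(lines[1])  # a bytes object, line ending preserved
```

The trade-off is that every line is now bytes, so it has to be decoded explicitly before it can be mixed with str values.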

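From the Python wiki:

The UnicodeDecodeError normally happens when decoding an str string from a certain coding. Since codings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the coding-specific decode() to fail.

It looks like the file uses a different encoding than you think it does.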

2010-11-09 19:23:00 Woot4Moo

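Good to know, and thanks for the reply, but it still doesn't answer the underlying question: why is this a problem only in 3.x and not in 2.x? – MadSc13ntist 2010-11-09 20:26:28

My guess is that Python 3 uses a different encoding scheme than Python 2, since that is a major difference between the two. – Woot4Moo 2010-11-09 20:39:15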

From the open() function docs (always specify the encoding):

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

Read the file:

open("/home/madsc13ntist/test_file.txt", "r",encoding='iso8859-1').readlines()[6]

Want to ignore decoding errors? Set errors='ignore'. The default value of errors is None, which has the same effect as 'strict'.
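
A minimal sketch comparing these options on a small temporary file. The file contents are hypothetical, and errors='replace' is an extra handler not mentioned above, included only to show an alternative that keeps the loss visible.

```python
import os
import tempfile

# Hypothetical file contents containing one byte (0xAE) that is invalid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Acme\xae Corp\n")
    path = tmp.name

# Explicit single-byte encoding: every byte decodes, 0xAE becomes U+00AE.
with open(path, "r", encoding="iso8859-1") as f:
    decoded = f.readlines()[0]

# errors='ignore' silently drops the undecodable byte.
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    ignored = f.readlines()[0]

# errors='replace' substitutes U+FFFD instead of dropping the byte.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    replaced = f.readlines()[0]

os.remove(path)
print(repr(decoded), repr(ignored), repr(replaced))
```

Note that 'ignore' loses data without any trace, so naming the correct encoding is usually the safer choice when it is known.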

2013-01-07 07:09:12 imxylz

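It has been about two years since this question was asked, so you probably know the reason by now. Basically, Python 3 strings are Unicode strings. For that abstraction to work, you need to tell Python which encoding the file uses.

Python 2 strings are actually byte sequences, and Python is happy to read whatever bytes are in the file. Some characters are interpreted (newlines, tabs, ...), but the rest are left unchanged.

Python 3 open() is similar to Python 2 codecs.open().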
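
The difference can be sketched in Python 3 without touching a file. The sample bytes are hypothetical, and ISO-8859-1 is assumed purely for illustration.

```python
raw_line = b"caf\xe9 \xae\r\n"  # hypothetical bytes as stored on disk

# Python 2's open() handed back something like raw_line itself.
# Python 3's text mode additionally decodes, which requires an encoding.
text_line = raw_line.decode("iso8859-1")

print(type(raw_line).__name__, type(text_line).__name__)

# Encoding with the same codec recovers the original bytes.
round_trip = text_line.encode("iso8859-1")
print(round_trip == raw_line)
```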

... and now it is time ... to accept one of the answers and close the question.

2013-01-07 07:25:56 pepr
