Python UTF-8 encoding problem

I'm working on a Python application and have run into some problems handling strings.

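I have this string "She’s Out of My League" (without the quotes). I store it in a variable and try to insert it into a sqlite3 database. However, I get this error: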

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

So I tried to convert the string to unicode. I tried these two:

new_str = unicode(old_str) 
new_str = old_str.encode("utf8")

But this gave me another error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 49: unexpected code byte

I'm stuck here. What exactly am I doing wrong?

2011-05-24 Shrihari

Try '.decode' instead of '.encode'. – 2011-05-24 19:23:57

You want 'old_str.decode(encoding)'. You don't need to (in fact, you can't) encode it back to a bytestring for sqlite; sqlite needs unicode. – 2011-05-24 20:13:12
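The comment above can be sketched roughly as follows: decode the bytes into text once, then hand that text to sqlite3 without encoding it again. This is only a sketch, written for Python 3 (where the question's bytestring is a `bytes` literal); the cp1252 encoding, the in-memory database, and the `titles` table are assumptions for illustration, not details from the question.

```python
import sqlite3

# Raw bytes as they might arrive from a cp1252-encoded source (assumption).
old_str = b"She\x92s Out of My League"

# Decode bytes -> text. Do not call .encode() on the result before inserting.
new_str = old_str.decode("cp1252")

conn = sqlite3.connect(":memory:")               # throwaway in-memory database
conn.execute("CREATE TABLE titles (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO titles (name) VALUES (?)", (new_str,))

# Read the value back to confirm it round-trips as text.
stored = conn.execute("SELECT name FROM titles").fetchone()[0]
conn.close()
```

Using a `?` placeholder rather than string formatting lets sqlite3 handle the text value itself.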

Simple. You're assuming that it's UTF-8.

>>> print 'She\x92s Out of My League'.decode('cp1252') 
She’s Out of My League
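To see why the UTF-8 assumption fails, here is a small sketch (Python 3 syntax, so the bytestring is written as a `bytes` literal; the variable names are mine). Byte 0x92 is not a valid start byte in UTF-8, but in cp1252 it maps to the right single quotation mark U+2019.

```python
raw = b"She\x92s Out of My League"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    # exc.start and exc.end delimit the bytes the codec rejected.
    bad_byte = raw[exc.start:exc.end]

# The same bytes decode cleanly once the actual encoding is used.
text = raw.decode("cp1252")
```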

2011-05-24 18:58:01

So, will cp1252 work with everything? I'm dealing with file names here. File names on Windows and Unix. – Shrihari 2011-05-24 19:01:41

CP1252 will work with text encoded in CP1252. – 2011-05-24 19:02:36

Yeah, I get that. I want something that handles all the characters allowed in file names. Which one do I choose? – Shrihari 2011-05-24 19:03:58
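No single encoding is guaranteed to fit file names gathered from different systems, so one possible approach is a fallback chain: try UTF-8 first (arbitrary legacy bytes rarely happen to be valid UTF-8), then cp1252, and only then substitute a replacement character. The sketch below is a heuristic for Python 3, not a definitive answer; `decode_filename` and its default encoding list are hypothetical names and choices made up for illustration.

```python
def decode_filename(raw, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode raw bytes, trying each encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    # Last resort: decode with the first encoding and mark bad bytes as U+FFFD.
    return raw.decode(encodings[0], errors="replace")


windows_name = decode_filename(b"She\x92s Out of My League")
unix_name = decode_filename("café".encode("utf-8"))
```

The order matters: cp1252 accepts almost any byte sequence, so putting it first would silently mis-decode real UTF-8 names.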
