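Using the unicode() and encode() functions in Python

I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, which didn't help. Then I used the unicode() function, which gives me type unicode.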

print type(path)     # <type 'unicode'> 
path = path.replace("one", "two") # <type 'str'> 
path = path.encode("utf-8")  # <type 'str'> strange 
path = unicode(path)    # <type 'unicode'>

In the end I got the unicode type, but I still get the same error that occurred when the path variable was of type str:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

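Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I struggle with this often.

Edit:

This execute() statement raised the error: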

cur.execute("update docs set path = :fullFilePath where path = :path", locals())

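I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but now I'm quite confused. Should I use only unicode(), only encode("utf-8"), or both?

I can't use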

fullFilePath = unicode(fullFilePath.encode("utf-8"))

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

The Python version is 2.7.2.


2012-04-23 xralf

Where is the code that raises the error? – newtover 2012-04-23 20:48:04

Your exact question has already been answered: http://stackoverflow.com/questions/2392732/sqlite-python-unicode-and-non-utf-data – garnertb 2012-04-23 20:51:25

@newtover I edited the question. – xralf 2012-04-23 20:55:58

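You are using encode("utf-8") incorrectly. Python byte strings (the str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a Python byte string with uni.encode(encoding), and you can convert a byte string to a Unicode string with s.decode(encoding) (or, equivalently, unicode(s, encoding)).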
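As a rough illustration of the two directions described above, here is a minimal sketch. The sample path is made up, and the snippet is written to run on both Python 2.7 and Python 3. It also hints at why calling unicode() on a UTF-8 byte string without naming an encoding fails: Python 2 falls back to its default encoding, normally ASCII, which is emulated here with decode("ascii").

```python
# Hypothetical sample path; u"\u0161" is "s with caron", UTF-8 bytes 0xc5 0xa1.
text = u"/docs/\u0161koda.txt"    # Unicode string: characters, no encoding
raw = text.encode("utf-8")        # byte string: the same text encoded as UTF-8

# Decoding with the matching encoding restores the original text.
round_trip = raw.decode("utf-8")

# Decoding the same bytes as ASCII fails on the first byte above 0x7f.
# In Python 2, unicode(raw) without an encoding argument does this implicitly
# when the default encoding is ASCII.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(round_trip == text)   # True
print(ascii_failed)         # True
```

In Python 2, calling encode() on a value that is already a str triggers the same implicit ASCII decode first, which would explain a UnicodeDecodeError coming from a call that looks like it only encodes.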

If fullFilePath and path are currently of type str, you should figure out how they are encoded. For example, if the current encoding is UTF-8, you would use:

path = path.decode('utf-8') 
fullFilePath = fullFilePath.decode('utf-8')

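If this still doesn't fix the problem, the actual issue may be that you are not passing a Unicode string in your execute() call. Try changing it to the following: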

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
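To put the pieces together, here is a small self-contained sketch that uses an in-memory database. The table layout and the sample paths are assumptions made for illustration, not the asker's real schema. It decodes the byte string once at the boundary and passes only Unicode strings to execute(); it should run on both Python 2.7 and Python 3.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(u"create table docs (path text)")

# Pretend this byte string came from the file system, encoded as UTF-8.
raw_path = b"/docs/\xc5\xa1koda-one.txt"
old_path = raw_path.decode("utf-8")          # decode once, at the boundary
cur.execute(u"insert into docs (path) values (:path)", {"path": old_path})

# From here on, work only with Unicode strings.
new_path = old_path.replace(u"one", u"two")
cur.execute(
    u"update docs set path = :new_path where path = :old_path",
    {"new_path": new_path, "old_path": old_path},
)
conn.commit()

stored = cur.execute(u"select path from docs").fetchone()[0]
conn.close()
print(stored == new_path)   # True
```

An explicit dictionary is used here instead of locals() only to make it obvious which values are bound to the named parameters. locals() works too, as long as every referenced variable is a Unicode string.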


2012-04-23 21:15:32

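This statement 'fullFilePath = fullFilePath.decode("utf-8")' still raises the error 'UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128)'. fullFilePath is a combination of a value of type *str* and a string taken from a *text* column of the db table, which should be UTF-8 encoded. – xralf 2012-04-23 21:25:40

But according to [this](http://www.sqlite.org/datatype3.html) it can be UTF-8, UTF-16BE or UTF-16LE. Can I find out somehow which one it is? – xralf 2012-04-23 21:31:27

@xralf, if you are combining different 'str' objects, you may be mixing encodings. Can you show the result of 'print repr(fullFilePath)'? – 2012-04-23 21:34:40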

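str is a representation of text in bytes; unicode is a representation of text in characters.

You decode text from bytes to unicode, and you encode unicode into bytes using some encoding.

That is: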

>>> 'abc'.decode('utf-8') # str to unicode 
u'abc' 
>>> u'abc'.encode('utf-8') # unicode to str 
'abc'
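With pure ASCII text such as 'abc' the two forms look identical, so the difference is easy to miss. A minimal sketch with one non-ASCII character (chosen arbitrarily for illustration) shows that one form counts characters and the other counts bytes; it runs on both Python 2.7 and Python 3.

```python
text = u"\u0161"              # one character: LATIN SMALL LETTER S WITH CARON
raw = text.encode("utf-8")    # its UTF-8 form: the two bytes 0xc5 0xa1

char_count = len(text)        # length in characters
byte_count = len(raw)         # length in bytes

print(char_count)   # 1
print(byte_count)   # 2
```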


2012-04-23 21:08:53 newtover

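Thanks. Very useful information. – Rupam 2016-06-13 15:05:08

Very clean answer, very smart, thank you – 2017-07-18 14:41:47

Very good answer, straight to the point. I would add that 'unicode' stands for letters or symbols, or more generally: **runes**, whereas 'str' represents a byte string in a certain encoding, which you must 'decode' (with the right encoding, obviously) to get the specific runes – arainone 2017-08-28 14:44:05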

Make sure the locale settings are set correctly before running the script from the shell, e.g.

$ locale -a | grep "^en_.\+UTF-8" 
en_GB.UTF-8 
en_US.UTF-8 
$ export LC_ALL=en_GB.UTF-8 
$ export LANG=en_GB.UTF-8

Documentation: man locale, man setlocale.
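To check from inside Python what the interpreter actually picked up from those settings, something like the following sketch may help. It only reads and prints values; the variable names are illustrative, and it should run on both Python 2.7 and Python 3.

```python
import locale
import sys

# Adopt the locale from the environment (LC_ALL, LANG, ...).
# If the named locale is not installed, keep the current one.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass

preferred = locale.getpreferredencoding()
# sys.stdout.encoding can be None when output is piped or redirected.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("preferred encoding: %s" % preferred)
print("stdout encoding: %s" % stdout_encoding)
```

If the first value reports an ASCII-only encoding on your system, printing non-ASCII Unicode strings is likely to fail, which is the situation the exports above are meant to avoid.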


2017-09-26 11:56:15 kenorb
