Sending a non-ASCII POST request in Python?

I want to send a POST request to a web application. I'm using the mechanize module (itself a wrapper around urllib2). When I try to send the POST request, I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). I tried wrapping the string as unicode(string), unicode(string, encoding="utf-8"), unicode(string).encode(), etc., and nothing worked: I get either the error above or TypeError: decoding Unicode is not supported.

I looked at other SO answers to similar questions, but they didn't help.

Thanks in advance!

EDIT: the code that produces the error:

prda = "šđćč" #valid UTF-8 characters 
prda # typing in python shell 
'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d' 
print prda # in shell 
šđćč 
prda.encode("utf-8") #in shell 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128) 
unicode(prda) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

2012-01-07 Bo Milanovich

If you show a small self-contained example that produces the error, I'll help. – ekhumoro 2012-01-07 23:46:01

@ekhumoro Added an example, hope that clears it up. – 2012-01-08 00:37:42

I'm assuming you're using Python 2.x.

Given a unicode object:

myUnicode = u'\u4f60\u597d'

Encode it using UTF-8:

mystr = myUnicode.encode('utf-8')

Note that you need to specify the encoding explicitly. By default it will (usually) use ascii.
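
As a minimal sketch of why the explicit encoding matters (the variable names are illustrative, not from the question): encoding the same unicode object with the ascii codec fails, while UTF-8 succeeds. The snippet only uses syntax that Python 3 also accepts, so it should behave the same there.

```python
# A unicode object holding two non-ASCII characters.
my_unicode = u'\u4f60\u597d'

# Explicit UTF-8 encoding produces a byte string.
utf8_bytes = my_unicode.encode('utf-8')

# The ascii codec cannot represent these characters, so encoding fails.
try:
    my_unicode.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Decoding utf8_bytes with 'utf-8' gives back the original unicode object.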

2012-01-07 23:52:57

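Thanks for the reply. If I have a string variable (rather than a string literal), how would I convert it to a unicode object? It's buried too deep in the code to simply add the 'u' prefix before the string variable is assigned. – 2012-01-08 00:30:10

You don't need to wrap your characters in a unicode call, because they're already encoded :) If anything, you need to DE-code it to get a unicode object: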

>>> s = '\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d' # your string 
>>> s.decode('utf-8') 
u'\u0161\u0111\u0107\u010d' 
>>> type(s.decode('utf-8')) 
<type 'unicode'>

I don't know mechanize, so I can't say whether it handles this correctly, I'm afraid.

What I would do with a regular urllib2 POST call is use urlencode:

>>> import urllib2
>>> from urllib import urlencode
>>> postData = urlencode({'test': s }) # note I'm NOT decoding it
>>> postData
'test=%C5%A1%C4%91%C4%87%C4%8D'
>>> urllib2.urlopen(url, postData) # url is your target address; etc etc etc
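
If the value is already a unicode object rather than a byte string, one option is to encode it to UTF-8 before passing it to urlencode. The following is a sketch under that assumption; the try/except import is only there so the same snippet also runs on Python 3, and no request is actually sent.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

# The byte string from the question, decoded into a unicode object.
s = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
text = s.decode('utf-8')

# Encode back to UTF-8 bytes before urlencode percent-escapes them.
post_data = urlencode({'test': text.encode('utf-8')})
```

The resulting post_data is plain ASCII, so it can be passed as the request body. On Python 3, urlopen expects that body as bytes, for example post_data.encode('ascii').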

2012-01-08 01:23:16

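In your example, you use a non-unicode string literal that contains non-ASCII characters, which makes prda a byte string.

To do this, Python uses sys.stdin.encoding to automatically encode the string. In your case, this means the string is encoded as "utf-8".

To convert prda to a unicode object, you need to decode it using the appropriate encoding: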

>>> print prda.decode('utf-8') 
šđćč

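Note that in a script or module, you can't rely on Python to guess the encoding automatically - you need to explicitly declare the encoding at the top of the file, like this: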

# -*- coding: utf-8 -*-

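Whenever you hit unicode errors in Python 2, it's very often because your code is mixing byte strings with unicode strings. So you should always use type(string) to check the type of the string that causes the error.

If the string object is <type 'str'> but you need unicode, decode it using the appropriate encoding. If the string object is <type 'unicode'> but you need bytes, encode it using the appropriate encoding.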
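
The rule above can be sketched as a pair of small helpers. The names to_text and to_bytes are hypothetical, and UTF-8 is assumed as the default encoding; substitute whatever encoding your data actually uses.

```python
# unicode on Python 2, str on Python 3.
text_type = type(u'')

def to_text(value, encoding='utf-8'):
    """Return value as a unicode object, decoding byte strings."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding unicode objects."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value

prda = b'\xc5\xa1\xc4\x91\xc4\x87\xc4\x8d'
as_text = to_text(prda)
round_trip = to_bytes(as_text)
```

Checking the type first means values that are already in the right form pass through unchanged, which avoids the implicit ascii decode that raises UnicodeDecodeError in Python 2.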

2012-01-08 01:24:09 ekhumoro
