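This is interesting: I am trying to read geo lookup data from OpenStreetMap and have run into what looks like a Unicode decoding problem. The code that performs the query looks like this: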
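import json
import time
import urllib

# Python 2 code. full_address is a list of UTF-8 encoded address parts.
params = urllib.urlencode({'q': ",".join(full_address), 'format': "json", "addressdetails": "1"})
query = "http://nominatim.openstreetmap.org/search?%s" % params
print query
time.sleep(5)  # pause before sending the request
# Decode the raw bytes as UTF-8, then parse the JSON.
response = json.loads(unicode(urllib.urlopen(query).read(), "UTF-8"), encoding="UTF-8")
print response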
The query for Zürich is correctly URL-encoded from UTF-8 data. No surprises here:
http://nominatim.openstreetmap.org/search?q=Z%C3%BCrich%2CSWITZERLAND&addressdetails=1&format=json
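For readers who want to reproduce the encoding step, here is a minimal sketch. It assumes Python 3 (`urllib.parse.urlencode`) rather than the Python 2 `urllib.urlencode` used above, and it hard-codes the address as a literal instead of building it from `full_address`. It only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Python 3 sketch: str values are percent-encoded as UTF-8 by default.
params = urlencode({"q": "Zürich,SWITZERLAND", "format": "json", "addressdetails": "1"})
query = "http://nominatim.openstreetmap.org/search?%s" % params
print(query)
```

The `ü` should come out as the two percent-encoded bytes `%C3%BC`, matching the URL above (the parameter order may differ).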
When I print the response, the u with umlaut shows up encoded as Latin-1 (0xFC):
[{u'display_name': u'Z\xfcrich, Bezirk Z\xfcrich, Z\xfcrich, Schweiz, Europe', u'place_id': 588094, u'lon': 8.540443
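A small sketch that may help narrow down where the `\xfc` comes from. It assumes Python 3 and uses a made-up one-key payload in place of the real response. It decodes UTF-8 bytes, parses the JSON, and then inspects the resulting string: in a decoded text string, `\xfc` in the escaped form is the code point U+00FC, not a byte.

```python
import json

# Hypothetical payload: the UTF-8 bytes a server might send for a tiny JSON object.
raw = '{"display_name": "Zürich"}'.encode("utf-8")
name = json.loads(raw.decode("utf-8"))["display_name"]

print(len(name))          # number of characters in the decoded string
print(hex(ord(name[1])))  # code point of the second character
print(ascii(name))        # escaped form, comparable to Python 2's repr of a unicode string
```

If the string has six characters and the second one is code point 0xfc, the data was decoded correctly and only the printed representation is escaped.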
But this makes no sense, because OpenStreetMap responds in UTF-8:
Connecting to nominatim.openstreetmap.org (nominatim.openstreetmap.org)|128.40.168.106|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 26 Jan 2011 13:48:33 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Location: search.php
Vary: negotiate
TCN: choice
X-Powered-By: PHP/5.3.2-1ubuntu4.7
Access-Control-Allow-Origin: *
Content-Length: 3342
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/json; charset=UTF-8
Length: 3342 (3.3K) [application/json]
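The response is JSON, and the file contents also confirm that it is UTF-8. On top of that, I explicitly specify UTF-8 both when reading the response and when parsing the JSON.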
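As an aside, the charset can be taken from the `Content-Type` header instead of being hard-coded. The following is a sketch under assumptions: it uses Python 3, parses the header value with the standard library's `email.message.Message`, and uses a made-up `body` in place of the real response bytes.

```python
from email.message import Message

# Header value copied from the response headers shown above.
header = "application/json; charset=UTF-8"

msg = Message()
msg["Content-Type"] = header
charset = msg.get_content_charset()  # lower-cased charset parameter, or None if absent

# Hypothetical body bytes standing in for the real response.
body = b'["Z\xc3\xbcrich"]'
text = body.decode(charset or "utf-8")
print(charset, text)
```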
What is going on here?
Edit: apparently it is json.loads that somehow messes this up.
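@etarion But the table says `UTF-8 (hex) 0xC3 0xBC`. Shouldn't it show up like that in UTF-8 content? If I'm not mistaken, using `0xFC` as a character in a UTF-8 string would make it an invalid character. – 2011-01-26 13:57:03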
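The byte-level point in this comment can be checked with a short sketch (Python 3 assumed; the variable names are illustrative). It compares the UTF-8 encoding of `ü` with the single byte `0xFC`, and separately looks at the character's code point.

```python
# UTF-8 encodes the character as two bytes.
encoded = "ü".encode("utf-8")
print(encoded)

# The single byte 0xFC is "ü" in Latin-1 ...
latin1_char = b"\xfc".decode("latin-1")
print(latin1_char)

# ... but on its own it is not a valid UTF-8 sequence.
try:
    b"\xfc".decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# The code point of the character is a number, independent of any byte encoding.
print(hex(ord("ü")))
```

So `0xC3 0xBC` describes the encoded bytes, while `0xFC` is also the numeric code point of the character once the bytes have been decoded into text.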