2012-08-05
# -*- coding: utf-8 -*- 
# Python3 
import urllib 
import urllib.request as url_req 
opener = url_req.build_opener() 
url='http://zh.wikipedia.org/wiki/'+"毛澤東" 
opener.open(url).read() 
# opener.open(url.encode("utf-8")).read()  # doesn't work either

How do I handle Unicode strings in a URL in Python 3? When I run the code above, it complains:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)
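The positions 10-12 in that message are a useful clue. CPython's http.client builds the request line as "METHOD path HTTP/1.1" and encodes it as ASCII before sending. For the path /wiki/毛澤東, the three Chinese characters sit at exactly indices 10 to 12 of "GET /wiki/毛澤東 HTTP/1.1". The sketch below reproduces the error offline under that assumption. `build_request_line` is a hypothetical helper for illustration, not part of urllib.

```python
# Reproduce the question's UnicodeEncodeError offline (no network access).
# Assumption: like CPython's http.client, the client builds the request line
# "METHOD path HTTP/1.1" and encodes it as ASCII; the helper name is hypothetical.


def build_request_line(method: str, path: str) -> bytes:
    """Encode an HTTP/1.1 request line as ASCII, as an ASCII-only client would."""
    return f"{method} {path} HTTP/1.1".encode("ascii")


try:
    build_request_line("GET", "/wiki/毛澤東")
except UnicodeEncodeError as exc:
    # exc.end is exclusive, which is why the message says "position 10-12"
    print(exc)
    print(exc.start, exc.end - 1)
```

So the problem is not the Python string itself but the fact that an HTTP request line may only contain ASCII, which means the non-ASCII part of the URL has to be percent-encoded first.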

But I can't use .encode() either, because then it complains:

Traceback (most recent call last):
  File "t.py", line 8, in <module>
    opener.open(url.encode("utf-8")).read()
  File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/urllib/request.py", line 360, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'

Does anyone know how to handle this?
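The AttributeError has a different cause. As the `req.timeout = timeout` line in the traceback suggests, `OpenerDirector.open()` wraps a `str` argument in a `Request` and otherwise assumes it already is a `Request` object, so it tries to set `timeout` on the `bytes` value. What the URL actually needs is an ASCII `str` in which the UTF-8 bytes are percent-encoded. The sketch below does that by hand to show the mechanism; `percent_encode_utf8` is a hypothetical helper, and unlike `urllib.parse.quote()` it also escapes "/". For the page title it produces the same result as `quote()`.

```python
from urllib.parse import quote

# Bytes that never need escaping in a URL (the RFC 3986 "unreserved" set).
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def percent_encode_utf8(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then %XX-escape every non-unreserved byte."""
    return "".join(
        chr(b) if b in UNRESERVED else f"%{b:02X}"
        for b in text.encode("utf-8")
    )


title = "毛澤東"
encoded = percent_encode_utf8(title)
print(encoded)  # an ASCII-only str, which opener.open() accepts as a URL
# quote() applies the same UTF-8 percent-encoding to non-ASCII characters
assert encoded == quote(title)
```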


You should use urllib.quote() to properly quote URL parameters (in Python 3 it lives at urllib.parse.quote()). – 2012-08-05 17:08:45
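In Python 3 the quoting functions live in `urllib.parse`, and which one fits depends on where the text goes. `quote()` suits path segments, while query-string values are usually encoded with `quote_plus()` or, for a whole set of parameters, `urlencode()`. The parameter names and the example.com endpoint below are purely illustrative assumptions, not Wikipedia's API.

```python
from urllib.parse import quote, quote_plus, urlencode

text = "毛 澤東/x"
# Path segments: quote() escapes a space as %20 and keeps "/" by default.
path_part = quote(text)
# Query values: quote_plus() escapes a space as "+" and also escapes "/".
query_value = quote_plus(text)

# urlencode() builds a whole query string, applying quote_plus() to keys and values.
# The parameter names and example.com endpoint are illustrative only.
query = urlencode({"title": "毛澤東", "lang": "zh"})
url = "http://example.com/search?" + query
print(path_part, query_value, url, sep="\n")
```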

Answers


You can use urllib.parse.quote() to encode the path part of the URL.

#!/usr/bin/env python3 
from urllib.parse import quote 
from urllib.request import urlopen 

# quote() UTF-8-encodes the title and percent-escapes the bytes, giving an ASCII-only URL
url = 'http://zh.wikipedia.org/wiki/' + quote("毛澤東")
content = urlopen(url).read() 
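In the snippet above, `quote()` is applied only to the page title. If you start from a complete URL that already contains non-ASCII text, one possible approach is to split it with `urllib.parse.urlsplit()`, quote the path, query and fragment, and reassemble the result. This is a sketch rather than a standard-library function, and `quote_url` is a hypothetical name. It treats every "%" as an existing escape and leaves the host untouched, so a non-ASCII domain name would additionally need IDNA encoding, which this sketch does not handle.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Delimiters that keep their meaning inside path/query/fragment (RFC 3986
# "sub-delims" plus ":@/?"), and "%" so existing escapes survive untouched.
SAFE = "/?:@!$&'()*+,;=%"


def quote_url(url: str) -> str:
    """Hypothetical helper: percent-encode non-ASCII text in a full URL.

    Only the path, query and fragment are quoted; the scheme and host are
    left alone, so a non-ASCII domain name would still need IDNA encoding.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(
        path=quote(parts.path, safe=SAFE),
        query=quote(parts.query, safe=SAFE),
        fragment=quote(parts.fragment, safe=SAFE),
    ))


print(quote_url("http://zh.wikipedia.org/wiki/毛澤東"))
```

Because "%" is in the safe set, running an already-quoted URL through `quote_url()` leaves it unchanged, so the helper can be applied to URLs of unknown origin without double-encoding them.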

The wonderful requests library does this for you out of the box:

>>> url='http://zh.wikipedia.org/wiki/'+"毛澤東" 
>>> import requests 
>>> r = requests.get(url) 
>>> len(r.content) 
818747 