2017-04-21 71 views
0

Unicode error when importing HTML code

I want to import the source code of a web page, but one of the characters involved is î, and it causes an error.

Here is my code:

import urllib.request 
htmlfile = urllib.request.urlopen("url...") 
htmltext=htmlfile.read() 
print(htmltext) 

Here is the error:

Traceback (most recent call last): 
File "/Users/****/Documents/Scraping.py", line 3, in <module> 
htmlfile = urllib.request.urlopen("http://*****") 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen 
return opener.open(url, data, timeout) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open 
response = self._open(req, data) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open 
'_open', req) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain 
result = func(*args) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1346, in http_open 
return self.do_open(http.client.HTTPConnection, req) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open 
encode_chunked=req.has_header('Transfer-encoding')) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request 
self._send_request(method, url, body, headers, encode_chunked) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1250, in _send_request 
self.putrequest(method, url, **skips) 
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1117, in putrequest 
self._output(request.encode('ascii')) 
UnicodeEncodeError: 'ascii' codec can't encode character '\xee' in position 32: ordinal not in range(128) 

And this is what happens when I encode the URL by hand:

"http://...url...".encode('ascii') 
Traceback (most recent call last): 
File "<pyshell#11>", line 1, in <module> 
"http://www....url...".encode('ascii') 
UnicodeEncodeError: 'ascii' codec can't encode character '\xee' in position 50: ordinal not in range(128) 
+1

Can you post the full stack trace? Which line raises the error? – tdelaney

+0

Updated in the post –

+0

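This looks like a problem with sending the GET request. Is there anything unusual about the URL itself? You could try encoding your URL manually, `"url...".encode('ascii')`, and see whether it blows up. – tdelaney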
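The traceback ends in `request.encode('ascii')` inside `http.client`, so the failure happens while the request line is being built, before any HTML comes back. One common way around it is to percent-encode the non-ASCII parts of the URL first. The following is a minimal sketch, not a definitive fix: the helper name `percent_encode_url` and the `example.com` URL are made up for illustration, the host name is assumed to be ASCII already, and UTF-8 is assumed to be the encoding the server expects (some servers expect Latin-1 instead, in which case `encoding="latin-1"` would have to be passed to `quote`).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Return url with non-ASCII characters in path, query and fragment
    percent-encoded as UTF-8. Hypothetical helper; assumes an ASCII host."""
    parts = urlsplit(url)
    # safe= lists characters left untouched. "%" is kept so that escapes
    # already present in the URL are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Placeholder URL, not the one from the question.
url = "http://example.com/île/page?q=été"
ascii_url = percent_encode_url(url)
print(ascii_url)

# This no longer raises UnicodeEncodeError.
ascii_url.encode("ascii")
```

The resulting string can then be passed to `urllib.request.urlopen()` in place of the original URL.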

Answers

0

Have you tried adding a decode to the urlopen call? Add (...).urlopen(URL).decode('utf-8')

+0

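There is no need to do that in the example given. The OP is just printing a bytes object. It probably isn't needed in general either: an XML parser should work out the encoding itself. – tdelaney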

+0

Yes, the same error still occurs –

+0

But if I understand correctly, you need to decode the string itself. So maybe urlopen().decode('utf-8') or read().decode('utf-8') –
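For reference, the object returned by `urlopen()` has no `decode()` method; it is `read()` that returns bytes, and those bytes can be decoded. Decoding does not address the `UnicodeEncodeError` in the question, which is raised before any response arrives, but it is the step that turns the downloaded bytes into text. A rough sketch follows, in which `decode_body` and `fetch_text` are illustrative names and UTF-8 is only an assumed fallback for pages that declare no charset:

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode response bytes to str, using the declared charset if any.
    The UTF-8 fallback is an assumption, not something the server promises."""
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Fetch url and return the body as str. The URL must already be
    ASCII-only, otherwise urlopen() fails as shown in the question."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None when Content-Type has no charset.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Offline illustration using bytes like those read() would return.
raw = "<p>île</p>".encode("utf-8")
print(type(raw))         # the undecoded body is a bytes object
print(decode_body(raw))  # the decoded body is a str
```

`fetch_text` is defined but not called here, since the URL in the question is elided.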