2014-12-04 82 views

Unable to extract data from a website using urllib2

I was following a tutorial on pythonforbeginners.com and ran into some code that does not run correctly on my OS X machine.

from bs4 import BeautifulSoup 
import urllib2 
url = "http://www.pythonforbeginners.com" 
content = urllib2.urlopen(url).read() 
soup = BeautifulSoup(content) 
print soup.prettify() 

This gives me the following error:

Traceback (most recent call last):
  File "/Users/dhruvmullick/CS/Python/Extracting Data/test.py", line 8, in <module>
    content = urllib2.urlopen(url).read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Answers

A 403 error means the server is refusing your connection.

...a request from a client for a web page or resource to indicate that the server can be reached and understood the request, but refuses to take any further action.

Try a different domain and you will find that the code works as expected.

As a workaround, you can add a custom user-agent.
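A minimal sketch of that workaround, assuming the server only rejects the default urllib2 user-agent string (this is not confirmed for this site). The `USER_AGENT` value and the helper names `build_request` and `fetch` are made up for illustration; only `Request` and `urlopen` come from the library.

```python
# Works on Python 2 (urllib2) and Python 3 (urllib.request).
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent

# Hypothetical browser-like value; the server may accept other strings too.
USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the page body, sending the custom User-Agent header."""
    response = urllib2.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()


# Example use (performs a real network request):
#   content = fetch("http://www.pythonforbeginners.com")
#   soup = BeautifulSoup(content)
```

If the server still answers with 403, it is probably filtering on something other than the user-agent, and this change alone will not help.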

Is there a reason why this domain blocks my connection while others do not? – 2014-12-04 13:53:04

The server may block any request that comes without a user agent. See the link at the bottom for the steps to add a user agent. – philshem 2014-12-04 14:16:41