2012-04-23 83 views

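Cannot access a URL via wget or a Python script

Hello! I want to access some web pages from a Python script. The URL is: http://www.idealo.de/preisvergleich/Shop/27039.html

It works fine when I open it in a web browser. However, when I try to access it with urllib2: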

a = urllib2.urlopen("http://www.idealo.de/preisvergleich/Shop/27039.html") 

it gives me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

I also tried to access it with wget:

wget http://www.idealo.de/preisvergleich/Shop/27039.html 

The error is:

--2012-04-23 12:42:03-- http://www.idealo.de/preisvergleich/Shop/27039.html 
Resolving www.idealo.de (www.idealo.de)... 62.146.49.133 
Connecting to www.idealo.de (www.idealo.de)|62.146.49.133|:80... connected. 
HTTP request sent, awaiting response... 403 Forbidden 
2012-04-23 12:42:03 ERROR 403: Forbidden. 

Can anyone explain why this happens? How can I access the page with Python?

Answers


They are blocking some user agents. If you try the following:

wget -U "Mozilla/5.0" http://www.idealo.de/preisvergleich/Shop/27039.html 

it works. So you have to find a way to fake the user agent in your Python code to make it work.

Try this:

import urllib2 
opener = urllib2.build_opener() 
opener.addheaders = [('User-agent', 'Mozilla/5.0')] 
a = opener.open("http://www.idealo.de/preisvergleich/Shop/27039.html") 
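For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same idea might look roughly like the sketch below. By default urllib identifies itself as "Python-urllib/x.y", which is presumably what the server rejects. The helper names build_request and fetch, the 10-second timeout, and the choice to return the status code are illustrative assumptions, not part of the original answer, and whether the site still filters on the user agent today is not something this sketch can confirm.

```python
import urllib.error
import urllib.request

URL = "http://www.idealo.de/preisvergleich/Shop/27039.html"


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that sends a browser-like User-Agent header.

    Hypothetical helper: the name and default value are assumptions.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch url and return (status, body).

    If the server answers with an HTTP error (for example 403),
    return (error_code, None) instead of raising.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # The server replied, but with an error status such as 403 Forbidden.
        return error.code, None
```

Calling fetch(URL) returns the status code together with the page body, or the error code and None if the request is still refused, which makes it easy to check whether changing the user agent was enough.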

The last example in the urllib2 documentation should help: http://docs.python.org/library/urllib2.html – 2012-04-23 10:55:47