2017-04-02 204 views

I'm learning BeautifulSoup and trying to write a small script that looks for houses on a Dutch real-estate website. As soon as I try to fetch the site's contents, I get an HTTP 405 error (fetching the site with urllib results in HTTP 405):

Traceback (most recent call last):
  File "funda.py", line 2, in <module>
    html = urlopen("http://www.funda.nl")
  File "<folders>request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "<folders>request.py", line 532, in open
    response = meth(req, response)
  File "<folders>request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "<folders>request.py", line 570, in error
    return self._call_chain(*args)
  File "<folders>request.py", line 504, in _call_chain
    result = func(*args)
  File "<folders>request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Not Allowed
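For reference, the HTTPError that urlopen raises at the bottom of this traceback is itself a response-like object carrying the status code and reason. A minimal sketch that constructs one locally (no network access needed) to show the attributes an exception handler would see:

```python
from urllib.error import HTTPError

# Construct the same error the traceback shows; in real code this
# object comes from urlopen() raising, not from manual construction.
err = HTTPError("http://www.funda.nl", 405, "Not Allowed", None, None)
print(err.code)    # 405
print(err.reason)  # Not Allowed
```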

What I'm trying to run:

from urllib.request import urlopen 
html = urlopen("http://www.funda.nl") 

Any idea why this causes an HTTP 405? I'm just doing a GET request, right? I also tried:

import urllib 
html = urllib.urlopen("http://www.funda.nl") 

leovp's comment makes sense:


It's definitely a GET request, but you are being detected as a bot, and this particular server sends a 405 status code in that case. Try setting your headers to those of a normal browser. – leovp


Related - https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit?noredirect=1&lq=1 –

Answers


Possible duplicate of HTTPError: HTTP Error 403: Forbidden. You need to pretend to be a regular visitor. This is commonly done by sending a common/regular User-Agent HTTP header (which one works varies from site to site).

>>> url = "http://www.funda.nl" 
>>> import urllib.request 
>>> req = urllib.request.Request(
...  url, 
...  data=None, 
...  headers={ 
...   'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' 
...  } 
...) 
>>> f = urllib.request.urlopen(req) 
>>> f.status, f.msg 
(200, 'OK') 
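If you're going to fetch several pages, the header dict can be factored into a small helper. This is a sketch of my own (the function name is not from the answer above), and it can be checked locally before making any network call:

```python
import urllib.request

# Browser-like User-Agent string, same as in the answer above.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/35.0.1916.47 Safari/537.36')

def make_browser_request(url):
    """Hypothetical helper: build a Request that carries a browser-like
    User-Agent so the server treats the client as a normal visitor."""
    return urllib.request.Request(
        url, data=None, headers={'User-Agent': BROWSER_UA})

# Note that Request normalizes header names to 'Xxxx-xxxx' form,
# so the stored key is 'User-agent'.
req = make_browser_request("http://www.funda.nl")
print(req.get_header('User-agent').startswith('Mozilla/5.0'))  # True
print(req.get_method())  # GET
```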

Or, using the requests library:

>>> import requests 
>>> response = requests.get(
...  url, 
...  headers={ 
...   'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36' 
...  } 
...) 
>>> response.status_code 
200 

It also works with urllib2 if you aren't using requests.
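For context: urllib2 is Python 2 only; in Python 3 its functionality lives in urllib.request. A minimal sketch of the equivalent opener pattern, with a browser-like User-Agent attached as a default header (the actual fetch is commented out, so no network access happens here):

```python
import urllib.request

# Sketch of urllib2's opener pattern as it exists in Python 3: the
# opener attaches its default headers to every request it sends.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/35.0.1916.47 Safari/537.36')]
# html = opener.open("http://www.funda.nl").read()  # actual fetch
print(opener.addheaders[0][0])  # User-Agent
```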