
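HTTP basic authentication doesn't seem to work with urllib2 in python

I'm trying to use urllib2 to download a page protected by HTTP Basic authentication. I'm using Python 2.7, but I also tried it on another computer with Python 2.5 and got exactly the same behavior. I followed this guide as closely as I could, and here is the code I produced: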

import urllib2 

passman = urllib2.HTTPPasswordMgrWithDefaultRealm() 
passman.add_password(None, "http://authenticationsite.com/", "protected", "password")
authhandler = urllib2.HTTPBasicAuthHandler(passman) 
opener = urllib2.build_opener(authhandler) 

f = opener.open("http://authenticationsite.com/content.html") 
print f.read() 
f.close() 

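Unfortunately the server isn't mine, so I can't share its details; I've replaced them with placeholders above and below. When I run it, I get the following traceback: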

  File "/usr/lib/python2.7/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Authorization Required

Now, the interesting part: this is what I see when I monitor the TCP traffic on my computer with ngrep:

ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and (host 74.125.224.49 )
####
T 192.168.1.74:34366 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..
  Accept-Encoding: identity..
  Host: authenticationsite.com..
  Connection: close..
  User-Agent: Python-urllib/2.7....
##
T 74.125.224.49:80 -> 192.168.1.74:34366 [AP]
  HTTP/1.1 401 Authorization Required..
  Date: Sun, 27 Feb 2011 03:39:31 GMT..
  Server: Apache/2.2.3 (Red Hat)..
  WWW-Authenticate: Digest realm="protected", nonce="6NSgTzudBAA=ac585d1f7ae0632c4b90324aff5e39e0f1fc2505", algorithm=MD5, qop="auth"..
  Content-Length: 486..
  Connection: close..
  Content-Type: text/html; charset=iso-8859-1....
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.
  <html><head>.
  <title>401 Authorization Required</title>.
  </head><body>.
  <h1>Authorization Required</h1>.
  <p>This server could not verify that you.
  are authorized to access the document.
  requested. Either you supplied the wrong.
  credentials (e.g., bad password), or your.
  browser doesn't understand how to supply.
  the credentials required.</p>.
  <hr>.
  <address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.
  </body></html>.
####

It looks as if urllib2 throws that exception after receiving the initial 401 without even attempting to supply the credentials.
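One way to see why no retry happened is to look at the WWW-Authenticate header of the 401 response, which names the scheme the server expects. Below is a rough Python 3 sketch (on Python 3, urllib2's HTTPError lives in urllib.error); the helper names parse_challenge and challenge_from_error are hypothetical, and the parser is deliberately simple: it does not handle several challenges in one header or escaped quotes inside values.

```python
import re
import urllib.error

# name=value pairs where the value is either "quoted" or a bare token
_PARAM_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^\s,]+))')


def parse_challenge(header_value):
    """Split a WWW-Authenticate value into (scheme, params), simplified."""
    scheme, _, rest = header_value.strip().partition(" ")
    params = {name: quoted or bare for name, quoted, bare in _PARAM_RE.findall(rest)}
    return scheme.lower(), params


def challenge_from_error(err):
    """Return the parsed challenge carried by an HTTPError, or None if absent."""
    value = err.headers.get("WWW-Authenticate")
    return parse_challenge(value) if value else None


# The challenge from the capture above (placeholder host and realm)
header = ('Digest realm="protected", '
          'nonce="6NSgTzudBAA=ac585d1f7ae0632c4b90324aff5e39e0f1fc2505", '
          'algorithm=MD5, qop="auth"')
scheme, params = parse_challenge(header)
print(scheme, params["realm"], params["qop"])  # digest protected auth
```

In practice you would wrap opener.open(...) in try/except urllib.error.HTTPError and pass the caught error to challenge_from_error; a result of "digest" rather than "basic" means a Basic handler has nothing to answer.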

For good measure, here is the ngrep output when I authenticate in a web browser instead:

ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and (host 74.125.224.49 )
####
T 192.168.1.74:36102 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..
  Host: authenticationsite.com..
  User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..
  Accept-Language: en-us,en;q=0.5..
  Accept-Encoding: gzip,deflate..
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..
  Keep-Alive: 115..
  Connection: keep-alive....
##
T 74.125.224.49:80 -> 192.168.1.74:36102 [AP]
  HTTP/1.1 401 Authorization Required..
  Date: Sun, 27 Feb 2011 03:43:42 GMT..
  Server: Apache/2.2.3 (Red Hat)..
  WWW-Authenticate: Digest realm="protected", nonce="rKCfXjudBAA=0c1111321169e30f689520321dbcce37a1876bbe", algorithm=MD5, qop="auth"..
  Content-Length: 486..
  Connection: close..
  Content-Type: text/html; charset=iso-8859-1....
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.
  <html><head>.
  <title>401 Authorization Required</title>.
  </head><body>.
  <h1>Authorization Required</h1>.
  <p>This server could not verify that you.
  are authorized to access the document.
  requested. Either you supplied the wrong.
  credentials (e.g., bad password), or your.
  browser doesn't understand how to supply.
  the credentials required.</p>.
  <hr>.
  <address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.
  </body></html>.
#########
T 192.168.1.74:36103 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..
  Host: authenticationsite.com..
  User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..
  Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..
  Accept-Language: en-us,en;q=0.5..
  Accept-Encoding: gzip,deflate..
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..
  Keep-Alive: 115..
  Connection: keep-alive..
  Authorization: Digest username="protected", realm="protected", nonce="rKCfXjudBAA=0c1111199162342689520550dbcce37a1876bbe", uri="/content.html", algorithm=MD5, response="3b65dadaa00e1d6a1892ffff49f9f325", qop=auth, nc=00000001, cnonce="7636125b7fde3d1b"....
##

After that, the server responds with the site's content.
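The browser's second request answers the challenge with an Authorization: Digest header and never sends the password itself. A minimal Python 3 sketch of how the response field is derived for algorithm=MD5 with qop="auth", following RFC 2617; the function names are illustrative, and the example uses the worked values from RFC 2617 rather than the anonymized values in the capture:

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Digest 'response' value for algorithm=MD5, qop=auth (RFC 2617)."""
    ha1 = md5_hex(":".join([user, realm, password]))  # hashes the secret; password never sent
    ha2 = md5_hex(":".join([method, uri]))            # hashes what is being requested
    return md5_hex(":".join([ha1, nonce, nc, cnonce, qop, ha2]))


# Worked example from RFC 2617, section 3.5
print(digest_response("Mufasa", "testrealm@host.com", "Circle Of Life",
                      "GET", "/dir/index.html",
                      "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                      "00000001", "0a4f113b"))
# prints 6629fae49393a05397450978507c4ef1
```

Because the server's nonce feeds into the hash, a client has to see the 401 challenge first and compute this per request, which a Basic-only handler never does.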

I've been playing with this for a while and can't figure out what I'm doing wrong. If anyone can help, I'd really appreciate it!

Answers


I think this is caused by:

WWW-Authenticate: Digest 

It appears the resource is protected by Digest authentication rather than Basic, which means you should use urllib2.HTTPDigestAuthHandler instead.

The code could be:

import urllib2 

passman = urllib2.HTTPPasswordMgrWithDefaultRealm() 
passman.add_password(None, "http://authenticationsite.com/", "protected", "password") 

# use HTTPDigestAuthHandler instead here 
authhandler = urllib2.HTTPDigestAuthHandler(passman) 
opener = urllib2.build_opener(authhandler) 

res = opener.open("http://authenticationsite.com/content.html") 
print res.read() 
res.close() 
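On Python 3, urllib2's handlers moved to urllib.request, so the same approach translates roughly as below. This sketch reuses the placeholder host and credentials from the question, and build_digest_opener is a hypothetical helper name; it only builds the opener and does not make a request.

```python
import urllib.request


def build_digest_opener(top_level_url, user, password):
    """Return an opener that answers Digest challenges for URLs under top_level_url."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials whatever realm the server names
    passman.add_password(None, top_level_url, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(passman)
    return urllib.request.build_opener(handler)


opener = build_digest_opener("http://authenticationsite.com/", "protected", "password")
```

You would then call opener.open("http://authenticationsite.com/content.html") as in the answer; note that on Python 3 read() returns bytes, so decode it before printing as text.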

Thanks, you're absolutely right! I really appreciate your help! – foob 2011-02-27 05:12:05


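I'm having trouble fetching URLs from my Python script, which extracts all the sites that contain PDFs. I'm working behind a proxy server that asks for a username and password when I first open the browser. I can view the sites and download the PDFs in the browser, but I can't do it from Python code. The error I face is "urllib.error.HTTPError: HTTP Error 401: Authorization Required", and I also get "AbstractDigestAuthHandler does not support the following scheme: 'Negotiate'". Am I missing something? – Bonson 2015-10-06 03:52:00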

import urllib2 
# Create an OpenerDirector with support for Basic HTTP Authentication... 
auth_handler = urllib2.HTTPBasicAuthHandler() 
auth_handler.add_password(realm='PDQ Application', 
          uri='https://mahler:8092/site-updates.py', 
          user='klem', 
          passwd='kadidd!ehopper') 
opener = urllib2.build_opener(auth_handler) 
# ...and install it globally so it can be used with urlopen. 
urllib2.install_opener(opener) 
urllib2.urlopen('http://www.example.com/login.html') 

- http://docs.python.org/library/urllib2.html#examples
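For contrast with the Digest exchange in the question: after a Basic challenge, HTTPBasicAuthHandler simply retries with the username and password base64-encoded in an Authorization header. A small Python 3 sketch of that header (basic_authorization_header is an illustrative name; the check uses the well-known "Aladdin" example rather than the credentials above):

```python
import base64


def basic_authorization_header(user, passwd):
    """Build the value of an 'Authorization: Basic ...' header."""
    raw = "{}:{}".format(user, passwd).encode("utf-8")
    # Base64 is an encoding, not encryption: anyone who sees it can decode it
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_authorization_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Since the server in the question only ever sends a Digest challenge, a Basic handler like the one above never gets a chance to send this header.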


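This is actually what I'm already doing. As Victor Lin pointed out in the other answer, the problem is that the server actually uses Digest authentication rather than Basic. – foob 2011-02-27 05:14:48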
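If you don't know in advance whether a server uses Basic or Digest, one option is to register both handlers with a shared password manager; the Python 3 urllib.request documentation states that when both are added, Digest authentication is always tried first, and that the Digest handler raises ValueError for schemes other than Digest or Basic (which matches the 'Negotiate' error in the comment above). A Python 3 sketch with a hypothetical helper name and placeholder URL; it does not cover schemes the standard library doesn't implement, such as NTLM or Negotiate.

```python
import urllib.request


def build_auth_opener(top_level_url, user, password):
    """Opener that can answer either a Basic or a Digest challenge."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, top_level_url, user, password)
    # Both handlers share one password manager; handler_order makes Digest run first
    return urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passman),
        urllib.request.HTTPBasicAuthHandler(passman),
    )


opener = build_auth_opener("http://example.com/", "user", "secret")
```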


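You have to use the Python NTLM module:

from ntlm import HTTPNtlmAuthHandler
import urllib2

# Placeholders: replace with your own credentials and URLs
user = "Your_username"
password = "your_Passwrd"

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://Your_home_location/", user, password)

# NTLM handler from the ntlm package, using the same password manager
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)

url = "http://Your_home_location/sub_locations"

response = urllib2.urlopen(url)

headers = response.info()
print("headers: {}".format(headers))

body = response.read()
print("response: " + body)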