2010-07-02 66 views
15

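I have this program that checks a website, and I want to know how I can do the check through a proxy in Python. How can I open a website through a proxy using urllib in Python?

This is the code, just as an example: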

import time
import urllib

# Python 2 code: keep retrying until the site can be opened
while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)

See also, for urllib2: http://stackoverflow.com/questions/1450132/proxy-with-urllib2 – 2015-12-28 11:16:43

Answers

29

By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234' 
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy 
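
To check which proxy settings urllib would pick up from the environment, something like the following Python 3 sketch may help. It uses urllib.request.getproxies(); the proxy URL is the same placeholder as above, and setting os.environ here only stands in for the shell export.

```python
import os
import urllib.request

# Stand-in for `export http_proxy=...` in the shell (placeholder proxy URL).
os.environ["http_proxy"] = "http://myproxy.example.com:1234"

# getproxies() returns a dict that maps a URL scheme to its proxy URL.
detected = urllib.request.getproxies()
print("HTTP proxy from environment: %s" % detected.get("http"))
```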

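If you instead want to specify the proxy inside your application, you can pass a proxies argument to urlopen (this keyword exists in Python 2's urllib.urlopen):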

proxies = {'http': 'http://myproxy.example.com:1234'} 
print "Using HTTP proxy %s" % proxies['http'] 
urllib.urlopen("http://www.google.com", proxies=proxies) 

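Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?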

import time
import urllib

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print "Trying HTTP proxy %s" % proxy
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print "Got URL using proxy %s" % proxy
        break
    except IOError:
        # urllib.urlopen signals connection failures with IOError
        print "Trying next proxy in 5 seconds"
        time.sleep(5)
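
For Python 3, where urlopen has no proxies argument, the same try-each-proxy idea could be sketched roughly as follows. The function names build_proxy_opener and fetch_with_fallback are made up for this illustration, and the list of candidates can be as long as needed (10 proxies or more).

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_with_fallback(url, candidate_proxies, make_opener=build_proxy_opener, timeout=10):
    """Try each proxy in order and return (proxy, body) for the first one that works."""
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        try:
            with make_opener(proxy).open(url, timeout=timeout) as response:
                return proxy, response.read()
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are subclasses of OSError.
            print("Proxy %s failed: %s" % (proxy, exc))
    raise RuntimeError("no candidate proxy could fetch %s" % url)
```

A call might look like fetch_with_fallback("http://www.google.com", candidate_proxies); the make_opener parameter exists only so the opener can be swapped out, for example in tests.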

Using your example, how can I print which proxy is being used at the time urlopen runs? – Shady 2010-07-02 18:36:38


@Shady: Just add a 'print' statement that prints the value of 'proxies['http']'. Take a look at my updated example to see how it's done. – 2010-07-02 18:40:50


OK, thanks, but what if I want more proxies, lots of them, for example 10 proxies, one after the other? – Shady 2010-07-02 18:48:26

0

The following sample code shows how to use urllib to connect through a proxy server:

import urllib.request

# handler for HTTP basic authentication (no credentials are registered here)
authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"}) 

# build a new opener that adds authentication and caching FTP handlers 
opener = urllib.request.build_opener(proxy_support, authinfo, 
            urllib.request.CacheFTPHandler) 

# install it 
urllib.request.install_opener(opener) 

f = urllib.request.urlopen('http://www.google.com/')
15

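Python 3 is slightly different here. It will try to auto-detect proxy settings, but if you need specific or manual proxy settings, consider code like this: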

#!/usr/bin/env python3 
import urllib.request 

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support) 
urllib.request.install_opener(opener) 

with urllib.request.urlopen(url) as response:
    # ... implement things such as reading the response body
    html = response.read()

See also the relevant section in the Python 3 docs.
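
As an alternative to embedding the credentials in the proxy URL, the standard library also offers ProxyBasicAuthHandler together with a password manager. The sketch below is a minimal illustration under assumptions: the proxy address, user name, and password are hypothetical placeholders, and it only covers proxies that use HTTP basic authentication.

```python
import urllib.request

# Hypothetical values, for illustration only.
PROXY_URL = "http://proxy.example.com:3128"
PROXY_USER = "username"
PROXY_PASSWORD = "password"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# A realm of None means "use these credentials for any realm at this URI".
password_mgr.add_password(None, PROXY_URL, PROXY_USER, PROXY_PASSWORD)

proxy_support = urllib.request.ProxyHandler({"http": PROXY_URL})
proxy_auth = urllib.request.ProxyBasicAuthHandler(password_mgr)

# The opener routes http requests via PROXY_URL and can answer a
# 407 Proxy Authentication Required challenge with the stored credentials.
opener = urllib.request.build_opener(proxy_support, proxy_auth)
```

Calling opener.open(...) uses these settings for that call only; urllib.request.install_opener(opener) would make them the default for urlopen.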

0

For HTTP and HTTPS, use:

proxies = {'http':'http://proxy-source-ip:proxy-port', 
      'https':'https://proxy-source-ip:proxy-port'} 

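Proxies for several schemes can be added similarly. The dictionary holds one proxy per scheme: if the same key (for example 'http') is listed twice, Python silently keeps only the last value.

proxies = {'http':'http://proxy1-source-ip:proxy-port',
      'https':'https://proxy2-source-ip:proxy-port',
      # ... one entry per URL scheme
      }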

Usage:

filehandle = urllib.urlopen(external_url , proxies=proxies) 

To use no proxy at all (for example, for links within the local network):

filehandle = urllib.urlopen(external_url, proxies={}) 
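
In Python 3, urllib.request.urlopen no longer takes a proxies argument. A rough equivalent of proxies={} is an opener built around an empty ProxyHandler, as in this sketch; internal_url in the comment is a hypothetical placeholder.

```python
import urllib.request

# An empty dict tells ProxyHandler to ignore auto-detected proxy settings.
no_proxy_handler = urllib.request.ProxyHandler({})
opener = urllib.request.build_opener(no_proxy_handler)

# opener.open(internal_url) would then connect directly, without a proxy.
```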

To authenticate with the proxy server using a username and password:

proxies = {'http':'http://username:password@proxy-source-ip:proxy-port',
      'https':'https://username:password@proxy-source-ip:proxy-port'}

Note: avoid special characters such as : and @ in the username and password.
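
If such characters cannot be avoided, one possible workaround is to percent-encode the credentials before building the proxy URL. The sketch below uses urllib.parse.quote from Python 3; the helper name make_proxy_url and the host are made up for illustration, and it assumes the consumer of the URL decodes the credentials again (to my understanding Python 3's ProxyHandler does, but this is worth verifying for your setup).

```python
from urllib.parse import quote


def make_proxy_url(username, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including : and @.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return "%s://%s:%s@%s:%d" % (scheme, user, secret, host, port)


proxy_url = make_proxy_url("alice", "p@ss:word", "proxy.example.com", 3128)
proxies = {"http": proxy_url}
```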
