2012-02-18

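I have built a script (with help from resources on the internet) that fetches a list of available proxies from a particular website and then checks them one by one to find a working proxy. Once it finds one, it builds and installs an opener that uses that proxy. Here is my code for using the proxies: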

import urllib2
import urllib
import cookielib
import socket
import time
import re  # used by getProxiesList(); missing from the original imports
import logging

# log() is not defined in the snippet as posted; this is a minimal stand-in.
logging.basicConfig(level=logging.INFO)
log = logging.info

def getOpener(pip=None):
    if pip:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1')]
    # install_opener() also makes this the global opener used by urllib2.urlopen()
    urllib2.install_opener(opener)
    return opener

def getContent(opnr, url):
    req = urllib2.Request(url)
    sock = opnr.open(req)  # the response is never closed explicitly
    return sock.read()

def is_bad_proxy(pip):
    # Returns False for a working proxy; an HTTP status code or True otherwise.
    try:
        opnr = getOpener(pip)
        data = getContent(opnr, 'http://www.google.com')
    except urllib2.HTTPError as e:
        return e.code
    except Exception as detail:
        return True
    return False

def getProxiesList():
    proxies = []
    opnr = getOpener()
    content = getContent(opnr, 'http://somesite.com/')
    urls = re.findall(r"<a href='([^']+)'[^>]*>.*?HTTP Proxies.*?</a>", content)
    for eachURL in urls:
        content = getContent(opnr, eachURL)
        proxies.extend(re.findall(r'\d{,3}\.\d{,3}\.\d{,3}\.\d{,3}:\d+', content))
    return proxies

def getWorkingProxy(proxyList, i=-1):
    # Scan forward from index i+1 and return the first working proxy and its index.
    for j in range(i+1, len(proxyList)):
        currentProxy = proxyList[j]
        if not is_bad_proxy(currentProxy):
            log("%s is working" % (currentProxy))
            return currentProxy, j
        else:
            log("Bad Proxy %s" % (currentProxy))
    return None, -1

if __name__ == "__main__":
    socket.setdefaulttimeout(60)
    proxyList = getProxiesList()
    proxy, index = getWorkingProxy(proxyList)
    if proxy:
        _web = getOpener(proxy)
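
A side note on the extraction step in getProxiesList(): in Python's re module, \d{,3} means "zero to three digits", so that pattern also matches strings with empty octets such as "...:80". Below is a minimal sketch of a stricter pattern (written for Python 3; the name extract_proxies and the sample HTML are hypothetical). It still does not check that each octet is at most 255.

```python
import re

# Stricter variant of the pattern used in getProxiesList():
# \d{1,3} requires at least one digit per octet, whereas \d{,3} allows none.
PROXY_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b')

def extract_proxies(html):
    """Return every ip:port string found in html, in page order."""
    return PROXY_RE.findall(html)

# Hypothetical page fragment: two real-looking proxies and one empty-octet string.
sample = "<td>10.0.0.1:8080</td><td>...:80</td><td>192.168.1.20:3128</td>"
print(extract_proxies(sample))  # ['10.0.0.1:8080', '192.168.1.20:3128']
```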

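Once I have used a proxy for a while, I have to repeat this process again and again. My question is: does building an opener over and over cause problems? I ask because I get the following error: HTTPError: HTTP Error 503: Too many open connections. Can anyone help me understand what causes this error? Thanks in advance.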
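
Building a new opener is cheap in itself, but getContent() above never closes the response, so releasing the connection is left to the interpreter. A minimal sketch of a variant that always closes it, assuming Python 3 (where urllib.request replaces urllib2); the name get_content is hypothetical. This is general hygiene only; whether it avoids the 503 depends on the proxy, and the answer below attributes the error to duplicate proxies.

```python
import urllib.request
from contextlib import closing

def get_content(opener, url, timeout=60):
    """Fetch url through opener and close the response even if read() fails."""
    with closing(opener.open(url, timeout=timeout)) as resp:
        return resp.read()

# A data: URL lets the sketch run without network access.
opener = urllib.request.build_opener()
print(get_content(opener, "data:text/plain,hello"))  # b'hello'
```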


Maybe your list contains the same proxy more than once, and that proxy has a connection limit? – cha0site 2012-02-18 12:01:05


That could be the cause: I checked, and proxyList does contain duplicates. – 2012-02-18 12:10:17
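
One quick way to run the check described in these comments is to count occurrences with collections.Counter. A minimal sketch; find_duplicates and the sample list are hypothetical.

```python
from collections import Counter

def find_duplicates(proxies):
    """Map each proxy that appears more than once to its number of occurrences."""
    return {p: n for p, n in Counter(proxies).items() if n > 1}

# Hypothetical list: one proxy appears three times.
proxy_list = ["1.2.3.4:80", "5.6.7.8:3128", "1.2.3.4:80", "1.2.3.4:80"]
print(find_duplicates(proxy_list))  # {'1.2.3.4:80': 3}
```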

Answer


I checked, and proxyList contains duplicates. Many openers were trying to use the same proxy, which caused HTTPError: HTTP Error 503: Too many open connections.
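
A minimal sketch of one possible fix, assuming Python 3.7+ (where dicts preserve insertion order): remove duplicates before testing, then hand out working proxies one at a time from a generator instead of tracking an index as getWorkingProxy() does. The helper names are hypothetical, and is_bad_proxy is passed in so the sketch runs without a network.

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    return list(dict.fromkeys(items))

def iter_working_proxies(proxies, is_bad_proxy):
    """Yield each distinct proxy for which is_bad_proxy(proxy) is falsy."""
    for proxy in unique_in_order(proxies):
        if not is_bad_proxy(proxy):
            yield proxy

# Hypothetical list and checker: "5.6.7.8:3128" is treated as bad.
proxy_list = ["1.2.3.4:80", "5.6.7.8:3128", "1.2.3.4:80", "9.9.9.9:8080"]
working = iter_working_proxies(proxy_list, lambda p: p == "5.6.7.8:3128")
print(next(working))        # 1.2.3.4:80
print(next(working))        # 9.9.9.9:8080
print(next(working, None))  # None
```

When the current proxy wears out, calling next(working) again moves on to the next distinct proxy, so no proxy is handed to more than one opener.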