2010-04-14 117 views

Annoying Twisted Python problem

I'm trying to answer the following question out of personal interest: What is the fastest way to send 100,000 HTTP requests in Python?

This is what I've come up with so far, but I've run into something very strange.

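With installSignalHandlers set to False, it just hangs. I can see the DelayedCall instances in reactor._newTimedCalls, but processResponse never gets called.

With installSignalHandlers set to True, it raises an error and then works.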

from twisted.internet import reactor 
from twisted.web.client import Agent 
from threading import Semaphore, Thread 
import time 

concurrent = 100 
s = Semaphore(concurrent) 
reactor.suggestThreadPoolSize(concurrent) 
t=Thread(
    target=reactor.run, 
    kwargs={'installSignalHandlers':True}) 
t.daemon=True 
t.start() 


agent = Agent(reactor) 


def processResponse(response,url): 
    print response.code, url 
    s.release() 

def processError(response,url): 
    print "error", url 
    s.release() 

def addTask(url): 
    req = agent.request('HEAD', url) 
    req.addCallback(processResponse, url) 
    req.addErrback(processError, url) 


for url in open('urllist.txt'): 
    addTask(url.strip())  
    s.acquire() 
while s._Semaphore__value!=concurrent: 
    time.sleep(0.1)  

reactor.stop() 

Here is the error it throws when installSignalHandlers is True. (Note: this is the expected behaviour. The question is why it doesn't work when installSignalHandlers is False!)

Traceback (most recent call last): 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent 
    DeferredList(beforeResults).addCallback(self._continueFiring) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback 
    callbackKeywords=kw) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks 
    self._runCallbacks() 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks 
    self.result = callback(self.result, *args, **kw) 
--- <exception caught here> --- 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring 
    callable(*args, **kwargs) 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning 
    self._handleSignals() 
    File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals 
    signal.signal(signal.SIGINT, self.sigInt) 
exceptions.ValueError: signal only works in main thread 
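
The last frame of the traceback can be reproduced without Twisted. The following is a minimal standard-library sketch, not taken from the original post: it tries to install a SIGINT handler from a non-main thread, which is what reactor.run() ends up doing when it is started in a Thread with installSignalHandlers=True. The names try_install_handler and errors are made up for illustration.

```python
import signal
import threading


def try_install_handler(errors):
    # signal.signal() may only be called from the main thread;
    # anywhere else it raises ValueError.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


errors = []
worker = threading.Thread(target=try_install_handler, args=(errors,))
worker.start()
worker.join()
print(errors)  # one message saying that signal only works in the main thread
```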

What am I doing wrong, and what is the right way to do this? I'm new to Twisted.

@moshez: Thanks. It works now:

from twisted.internet import reactor, threads 
from urlparse import urlparse 
import httplib 
import itertools 


concurrent = 100 
finished=itertools.count(1) 
reactor.suggestThreadPoolSize(concurrent) 

def getStatus(ourl): 
    url = urlparse(ourl) 
    conn = httplib.HTTPConnection(url.netloc) 
    conn.request("HEAD", url.path) 
    res = conn.getresponse() 
    return res.status 

def processResponse(response,url): 
    print response, url 
    processedOne() 

def processError(error,url): 
    print "error", url#, error 
    processedOne() 

def processedOne(): 
    if finished.next()==added: 
        reactor.stop()

def addTask(url): 
    req = threads.deferToThread(getStatus, url) 
    req.addCallback(processResponse, url) 
    req.addErrback(processError, url) 

added=0 
for url in open('urllist.txt'): 
    added+=1 
    addTask(url.strip()) 

try: 
    reactor.run() 
except KeyboardInterrupt: 
    reactor.stop() 

There is no reason to handle a KeyboardInterrupt raised from reactor.run(). C-c makes reactor.run() *return*, rather than raise an exception. – 2010-04-14 12:37:48

Answers


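You're making waaaaay too many "reactor calls" from the main thread (for example, there's a good chance agent.request calls into the reactor). I'm not sure whether that is your problem, but it is still not supported: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.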

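Also, the whole architecture looks odd. Why aren't you running the reactor in the main thread? Even if you do it all at once, reading the whole file of 10,000 requests and splitting it shouldn't be a problem to do from the reactor.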

You can probably come up with a pure-Twisted solution that doesn't use any threads.