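How do I change this Python code to use multiple processes instead of multiple threads?

The Python code below connects to a lot of servers, fetches some information from each one and returns the results. It currently starts a separate thread for each connection. I would like to see how using a separate process per connection, rather than a thread, affects performance. Can this code easily be changed to use processes instead of threads? What exactly would I need to do? What are the risks?

Python 2.6 / Platform: Linux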

import logging
import threading
from Queue import Queue  # Python 2 module name

# ServCheck, DEFAULT_PORT, QUEUE_SIZE and calculate_total_checks are not shown here.


class ServerInfoGetter(threading.Thread):
    """Thread that runs one ServCheck against a single host/port."""

    def __init__(self, host, port=DEFAULT_PORT, timeout=15):
        threading.Thread.__init__(self)
        self.host = host
        self.timeout = timeout
        self.port = port
        self.result = None  # stays None if the check fails

    def get_result(self):
        return self.result

    def run(self):
        try:
            serv_check = ServCheck(self.host,
                                   port=self.port,
                                   timeout=self.timeout)
            serv_check.get_info()
            self.result = serv_check
        except Exception as err:
            logging.debug("Could not run ServCheck for : %s %s", self.host, err)


def process_hosts(hosts_and_ports):
    results = []

    def producer(queue, hosts_and_ports):
        # Start one thread per host/port; the bounded queue limits how many run at once.
        for host, ports in hosts_and_ports.items():
            for port in ports:
                logging.info("processing host: %s:%s", host, port)
                thread = ServerInfoGetter(str(host), port)
                thread.start()
                queue.put(thread, True)  # True so block until slot available

    def consumer(queue, total_checks):
        # Wait for each started thread in turn and collect its result.
        while len(results) < total_checks:
            thread = queue.get(True)
            thread.join()
            results.append(thread.get_result())

    logging.info("processing hosts")
    queue = Queue(QUEUE_SIZE)
    prod_thread = threading.Thread(target=producer,
                                   args=(queue, hosts_and_ports))
    cons_thread = threading.Thread(target=consumer,
                                   args=(queue,
                                         calculate_total_checks(hosts_and_ports)))
    prod_thread.start()
    cons_thread.start()
    prod_thread.join()
    cons_thread.join()
    return results

Source

2011-03-05 RogerBarber

Use pp. – 2011-03-05 21:14:37

@David: Eeek! Talk about overkill! – 2011-03-05 23:21:32

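As the documentation says:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. [...] In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread.

So, basically, you just have to replace all the threading.Thread objects with multiprocessing.Process objects (likewise, the Queue needs to be replaced with a multiprocessing.Queue object).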
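A rough sketch of that swap, not a drop-in replacement for the question's code: `fake_check` is a hypothetical stand-in for the `ServCheck` call, and the bounded producer/consumer queue from the question is left out to keep it short. It is written for a current Python 3; `Process` and `Queue` are also present in the 2.6 multiprocessing module.

```python
import multiprocessing


def fake_check(host, port):
    # Hypothetical stand-in for ServCheck(host, port=port).get_info().
    return "%s:%s ok" % (host, port)


class ServerInfoGetter(multiprocessing.Process):
    """Same shape as the threading version, but run() executes in a child process."""

    def __init__(self, host, port, result_queue):
        multiprocessing.Process.__init__(self)
        self.host = host
        self.port = port
        self.result_queue = result_queue

    def run(self):
        # Runs in the child: send the result back through the queue
        # instead of storing it on self.
        outcome = fake_check(self.host, self.port)
        self.result_queue.put((self.host, self.port, outcome))


def process_hosts(hosts_and_ports):
    result_queue = multiprocessing.Queue()
    workers = []
    for host, ports in hosts_and_ports.items():
        for port in ports:
            worker = ServerInfoGetter(host, port, result_queue)
            worker.start()
            workers.append(worker)
    # Drain the queue before joining so no child blocks on unsent data.
    results = [result_queue.get() for _ in workers]
    for worker in workers:
        worker.join()
    return sorted(results)


if __name__ == "__main__":
    print(process_hosts({"example.org": [80, 443]}))
```

Note that this starts one process per host/port pair with no upper bound, which is a simplification; a process is considerably heavier than a thread, so a real version would probably want to cap the number running at once.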

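At least, that is how it looks. In practice, however, every object that has to cross the Process boundary needs to be a multiprocessing.Value object (or be sent back through something like a multiprocessing.Queue). Otherwise it will never be updated across processes.

If you only intend to modify the ServerInfoGetter class, this includes self.host, self.timeout, self.port and self.result, and self.result above all, since it is the one assigned inside run(). Read the rest of the multiprocessing documentation to learn about the other data types you will need to use.

Also, as a side note, I am not sure whether it will be a problem with python 2.6 on Linux 2.6, but with python 2.7 on Windows, both IDLE and the interactive interpreter have trouble with multiprocessing (at least for me). These problems go away when the script is executed directly with the python or pythonw executables.

Update - python 2.5.1 on my Slackware box does not have this problem, so you may be fine in interactive mode too... although winwaed is not, so who knows...?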
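To illustrate the point about state not crossing the process boundary, here is a small sketch. The `Worker` class and `run_worker` helper are made up for the demonstration and are not part of the question's code. A plain attribute assigned in `run()` changes only the child's copy of the object, while a `multiprocessing.Value` lives in shared memory and is visible to the parent.

```python
import multiprocessing


class Worker(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.result = None                           # plain attribute
        self.shared = multiprocessing.Value("i", 0)  # shared-memory C int

    def run(self):
        # Executes in the child process.
        self.result = 42        # changes only the child's copy of the object
        self.shared.value = 42  # written to shared memory, seen by the parent


def run_worker():
    worker = Worker()
    worker.start()
    worker.join()
    # Back in the parent: the plain attribute is untouched.
    return worker.result, worker.shared.value


if __name__ == "__main__":
    print(run_worker())  # (None, 42)
```

Value only holds ctypes-style data such as integers or doubles, so for something like the question's ServCheck object it is probably easier to send a picklable result back through a multiprocessing.Queue or Pipe.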

Source

2011-03-05 21:59:27 Nate

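I have seen these problems on Ubuntu 10 as well. – winwaed 2011-03-05 22:06:41

In interactive mode, you mean? Do you know which version of Python? – Nate 2011-03-05 22:10:52

python 2.6.6. The problem occurs when selecting "Run" from the editor menu. Running with "python myprog.py" on the command line works fine. I have not tried multiprocessing in true interactive typing mode. My script uses map_async across multiple CPUs as a map-reduce algorithm. I am also using psycopg, which might affect things. – winwaed 2011-03-06 00:30:41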
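For reference, the map_async pattern mentioned in that comment might look roughly like this when applied to the host/port checks from the question. This is only a sketch: `check` is a hypothetical stand-in for the ServCheck call, and the pool size and timeout are arbitrary choices.

```python
import multiprocessing


def check(host_port):
    # Hypothetical stand-in for the ServCheck call. It is a top-level
    # function so the pool can pickle a reference to it.
    host, port = host_port
    return "%s:%s ok" % (host, port)


def process_hosts(hosts_and_ports, workers=4):
    # Flatten {host: [ports]} into (host, port) pairs in a stable order.
    pairs = [(host, port)
             for host, ports in sorted(hosts_and_ports.items())
             for port in ports]
    pool = multiprocessing.Pool(processes=workers)
    try:
        async_result = pool.map_async(check, pairs)
        # get() blocks until all checks finish; results follow input order.
        return async_result.get(timeout=60)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(process_hosts({"example.org": [80, 443]}))
```

A fixed-size pool also bounds the number of worker processes, which plays a role similar to the bounded queue in the threaded version.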

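Have you considered using a single process and a single thread, for example with Twisted? The multi-process option is only reasonably easy when os.fork is available....

Source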

2011-03-05 21:26:00

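Sorry, I should have said that I am using Python 2.6 and Linux. I have updated the Q. So I do not think starting multiple processes is a problem. – RogerBarber 2011-03-05 21:36:19

I agree. Taking advantage of multiple cores on one processor really falls into the category of "premature optimization", at least until you are using multiple servers. At that point, once you scale out to multiple servers, scaling to multiple cores is trivial: treat each core as its own server. – jpsimons 2011-03-05 22:06:28

Agreed. For this type of problem, multiprocessing is the wrong solution, whereas async (such as Twisted's deferreds) is the right one. – 2011-03-05 22:26:23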
