2015-10-17 98 views
0

鑑於大量的搜索,我仍然很難得到使用多個進程運行的特定功能。要求是:proceses的多處理混淆 - 基礎

  • 限制數
  • 傳遞多個參數映射

的最新嘗試運行,但是time.sleep似乎影響到所有進程 - 執行時間相同 - 20秒,無論池是否用於多進程foofoo直接調用(它應分別爲4/20秒)。我錯過了什麼?

from multiprocessing import Pool, Process, Lock 
import os 
import time 

def foo(arg): 
    print '{} - {}'.format(arg[0], os.getpid()) 
    time.sleep(1) 

if __name__ == '__main__': 
    script_start_time = time.time() 

    pool = Pool(processes=5) 
    for i in range(20): 
     arg = [i, i] 
     pool.map(foo, [arg]) 

    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 

    print 'Execution time {}s '.format(time.time() - script_start_time) 

結果:

0 - 5660 
1 - 5672 
2 - 5684 
3 - 5704 
4 - 5716 
5 - 5660 
6 - 5672 
7 - 5684 
8 - 5704 
9 - 5716 
10 - 5660 
11 - 5672 
12 - 5684 
13 - 5704 
14 - 5716 
15 - 5660 
16 - 5672 
17 - 5684 
18 - 5704 
19 - 5716 
Execution time 20.4240000248s 
+1

變化'map'到'map_async'。 – roippi

+1

Pool.map阻塞,直到評估完所有傳遞的迭代爲止。使用所有參數調用映射一次,您將獲得併發性。 – Javier

回答

0

正如在評論中提到的,pool.map將阻塞,直到執行完畢,所以你必須要麼apply_asyncmap_async提交作業,並使用一個回調來處理你的函數返回數據。或者,您可以提前建立所有輸入,並立即致電map

在此示例中,apply_async和map_async非常相似,區別在於apply_async一次只能提交一個作業,並且支持傳遞多個args和kwargs。例如:

from multiprocessing import Pool 
import os 
import time 

def add(a, b): 
    c = a+b 
    print(f'{a}+{b} = {c} from process: {os.getpid()}') #python 3 f-strings are nifty :) 
    time.sleep(1) 
    return c 

if __name__ == '__main__': 
    script_start_time = time.time() 
    pool = Pool(processes=5) 
    results = [] 
    for a in range(5): 
     for b in range(5,10): 
      pool.apply_async(add, (a,b), callback=lambda c: results.append(c)) 
    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 
    print('results', results) 
    print('Execution time {}s '.format(time.time() - script_start_time))

注意到如何調用apply_async當參數傳遞。

或者,您可以使用普通地圖一次性傳遞參數,但這需要您的函數只接受一個參數。這是starmap方法有用的地方。它需要的元組可迭代,並解包元組到該函數的參數,所以)的pool.starmap(foo, [(a,b),(c,d),(e,f)]輸入將解包的每對進foo,它採用兩個參數:

if __name__ == '__main__': 
    script_start_time = time.time() 
    pool = Pool(processes=5) 
    args = [(a,b) for a in "abc" for b in "ABC"] 
    print(pool.starmap(add, args)) #same add function from before (works with strings too) 
    pool.close() #necessary to prevent zombies 
    pool.join() #wait for all processes to finish 
    print('Execution time {}s '.format(time.time() - script_start_time))