Why is the performance of concurrent.futures.ProcessPoolExecutor so low?

I am trying to use concurrent.futures.ProcessPoolExecutor in Python 3 to process a large matrix in parallel. The general structure of the code is:

from concurrent.futures import ProcessPoolExecutor, as_completed

class X(object):

    def __init__(self, matrix):
        self.matrix = matrix  # large scipy csr_matrix

    def f(self, i, row_i):
        <cpu-bound process>

    def fetch_multiple(self, ids):
        with ProcessPoolExecutor() as executor:
            futures = [executor.submit(self.f, i, self.matrix.getrow(i)) for i in ids]
            return [f.result() for f in as_completed(futures)]

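self.matrix is a large scipy csr_matrix. f is my concurrent function; it takes one row of self.matrix and applies a CPU-bound process to it. Finally, fetch_multiple is the function that runs multiple instances of f in parallel and returns the results.

The problem is that after running the script, all CPU cores are less than 50% busy (see the following screenshot):

Why aren't all the cores busy?

I think the problem is the large self.matrix object and the passing of row vectors between processes. How can I solve this?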

Source

2017-08-31 AmirHJ

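Yes. The overhead should not be that large, but it is likely the reason your CPUs appear idle (although they should be busy passing the data around anyway).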
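One way to get a rough feel for that overhead is to measure what a single submit has to serialize. Because the submitted callable is the bound method self.f, pickling it also pickles the instance, self.matrix included. The sketch below is only an illustration under assumptions: a plain list of lists stands in for the csr_matrix, and the row-sum body of f is made up.

```python
import pickle


class X(object):
    def __init__(self, matrix):
        self.matrix = matrix

    def f(self, i, row_i):
        # Hypothetical stand-in for the CPU-bound work.
        return i, sum(row_i)


# Assumed stand-in for the large matrix: 200 rows of 200 floats.
x = X([[float(j) for j in range(200)] for _ in range(200)])

# Roughly what executor.submit(x.f, 0, row) must serialize for one task:
# the bound method drags the whole instance (and its matrix) along.
task_bytes = len(pickle.dumps((x.f, 0, x.matrix[0])))

# What one task would cost if only the index and the row were sent
# to a module-level function instead.
row_bytes = len(pickle.dumps((0, x.matrix[0])))

print(task_bytes, row_bytes)
```

If the first number dwarfs the second, most of the time per task goes into shipping the matrix rather than computing on the row, which would fit the low CPU usage described in the question.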

But try the recipe here to pass a "pointer" to the object to the subprocesses using shared memory.

http://briansimulator.org/sharing-numpy-arrays-between-processes/

Quoting from there:

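import numpy
from multiprocessing import sharedctypes
size = S.size
shape = S.shape
S.shape = size  # flatten in place so RawArray can copy it element by element
S_ctypes = sharedctypes.RawArray('d', S)
# Re-create S as a view on the shared buffer, then restore its shape.
S = numpy.frombuffer(S_ctypes, dtype=numpy.float64, count=size)
S.shape = shape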

Now we can send S_ctypes and shape to a child process in multiprocessing, and convert it back to a numpy array in the child process as follows:

from numpy import ctypeslib 
S = ctypeslib.as_array(S_ctypes) 
S.shape = shape

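Taking care of reference counting should be the tricky part, but I suppose numpy.ctypeslib handles that. So just coordinate the actual row numbers you pass to the subprocesses so that they don't work on the same data.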
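As a rough sketch of how this could be wired into ProcessPoolExecutor (this is not part of the quoted recipe): the shared buffer is handed to each worker once through the executor's initializer and initargs parameters (available since Python 3.7), and each task then receives only a row index. To stay dependency-free, the sketch indexes the RawArray directly instead of wrapping it in numpy. The names init_worker and row_sum, the row-sum workload, and the flat row-major layout are all assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import sharedctypes

# Module-level slots, filled in once per worker process by init_worker.
_shared = None
_ncols = None


def init_worker(shared, ncols):
    """Store the shared buffer in globals; runs once in each worker."""
    global _shared, _ncols
    _shared = shared
    _ncols = ncols


def row_sum(i):
    """Hypothetical CPU-bound task: only the row index i is sent over."""
    start = i * _ncols
    return i, sum(_shared[start:start + _ncols])


def fetch_multiple(shared, ncols, ids):
    # The buffer reaches the workers once, through initargs,
    # instead of one pickled row per submitted task.
    with ProcessPoolExecutor(initializer=init_worker,
                             initargs=(shared, ncols)) as executor:
        return dict(executor.map(row_sum, ids))
```

Called from under an `if __name__ == "__main__":` guard with, say, `sharedctypes.RawArray('d', [float(v) for v in range(12)])` and `ncols=3`, `fetch_multiple(shared, 3, range(4))` should return a dict mapping each row index to its sum. In the real case the worker would wrap the buffer with ctypeslib.as_array and restore the shape, as in the recipe quoted above.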

Source

2017-09-01 02:05:13 jsbueno
