I have some Python code that uses multiprocessing.pool.Pool.imap_unordered to create a bunch of temporary files in parallel in a CPU-bound step. I then read the filenames from the result iterator, process each file in a second, disk-bound step, and delete it. Usually the disk-bound step is the faster of the two, so each temporary file is processed and deleted before the next one is created. However, when running on a network filesystem, the disk-bound step can become the slow one, in which case the CPU-bound step running in parallel starts generating temporary files faster than the disk-bound step can process and delete them, and a large number of temporary files accumulates. To avoid this, I'd like to pause the parallel iteration whenever it gets more than 10 items ahead of the consumer. Is there an alternative to multiprocessing.pool.Pool.imap_unordered I could use? Or can I tell Python's multiprocessing pool not to get too far ahead?
Here is some example code that simulates the problem:
import os
from time import sleep
from multiprocessing.pool import Pool

input_values = list(range(10))

def fast_step(x):
    print("Running fast step for {x}".format(x=x))
    return x

def slow_step(x):
    print("Starting slow step for {x}".format(x=x))
    sleep(1)
    print("Finishing slow step for {x}".format(x=x))
    return x

mypool = Pool(2)
step1_results = mypool.imap(fast_step, input_values)

for i in step1_results:
    slow_step(i)
Running this produces something like:
$ python temp.py
Running fast step for 0
Running fast step for 1
Running fast step for 2
Running fast step for 3
Running fast step for 4
Starting slow step for 0
Running fast step for 5
Running fast step for 6
Running fast step for 7
Running fast step for 8
Running fast step for 9
Finishing slow step for 0
Starting slow step for 1
Finishing slow step for 1
Starting slow step for 2
Finishing slow step for 2
Starting slow step for 3
Finishing slow step for 3
Starting slow step for 4
Finishing slow step for 4
Starting slow step for 5
Finishing slow step for 5
Starting slow step for 6
Finishing slow step for 6
Starting slow step for 7
Finishing slow step for 7
Starting slow step for 8
Finishing slow step for 8
Starting slow step for 9
Finishing slow step for 9