
I have code which makes unique combinations of elements. There are 6 types, and there are about 100 of each, so there are 100^6 combinations. Each combination has to be calculated, checked for relevance, and then either discarded or saved. What is the simplest way to make the nested for loops use the maximum amount of CPU?

The relevant bit of the code looks like this:

def modconffactory():
    for transmitter in totaltransmitterdict.values():
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)

Now, this takes a long time, and that is fine, but I have realized that this process (making the configurations and then the calculations for later use) is only using one of my 8 processor cores at a time.

I have been reading up on multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have them run concurrently to the for loops, right?

But what about the for loops themselves? Can I speed that process up? Because that is where the time is spent. (<- this is my main question)

Is there a cheat? For instance, compiling to C and then having the OS multithread it automatically?

You will find your solution here: https://stackoverflow.com/questions/5784389/using-100-of-all-cores-with-python-multiprocessing –

I do not see how I can put the for loops into a pool the way the answer there does. I have read it, but I do not understand how it helps me. Can you explain? –

Read the docs of multiprocessing (Python's standard library) or of the library joblib. Since the first loop is of size 100 and you have 8 <= 100 CPUs, it is enough to parallelize just that outer loop (joblib's first example should suffice; basically: define a function which does everything except the outer loop, and pass the outer-loop value chosen a priori as an argument). (Also: the last sentence shows a clear lack of understanding of programming, operating systems, and processors. Maybe read some introductory courses first.) – sascha
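The pattern the comment describes — a function that does everything except the outer loop, invoked once per outer-loop value — can be sketched with the standard library's `concurrent.futures` (joblib's `Parallel` offers the same shape). The small stand-in dicts, the two-loop body, and the function names below are illustrative assumptions, not the asker's real data:

```python
from concurrent.futures import ProcessPoolExecutor

# Stand-in data: two small component dicts instead of the asker's six.
totaltransmitterdict = {i: 'tx%d' % i for i in range(8)}
totalrecieverdict = {i: 'rx%d' % i for i in range(4)}

def process_one_transmitter(transmitter):
    """Everything except the outer loop; the chosen transmitter is the argument."""
    results = []
    for reciever in totalrecieverdict.values():
        results.append((transmitter, reciever))
    return results

def modconffactory_parallel():
    # One task per outer-loop value; with 100 outer values and 8 cores,
    # parallelizing only this level already keeps every core busy.
    with ProcessPoolExecutor(max_workers=8) as executor:
        all_results = []
        for partial in executor.map(process_one_transmitter,
                                    totaltransmitterdict.values()):
            all_results.extend(partial)
    return all_results
```

In the real code, `process_one_transmitter` would contain the five inner loops plus the `dosomethingwith` call, and would return (rather than save) whatever is useful, so the parent process can collect it.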

Answer


I only see examples of different processes, not how to multithread one process

There is multithreading in Python, but it is very ineffective because of the GIL (Global Interpreter Lock). So if you want to use all of your processor cores, and if you want concurrency, you have no choice other than to use multiple processes, which can be done with the multiprocessing module (well, you could also use another language without such problems).

Using multiprocessing in your case

Approximate example:

import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(generator, step, offset, conn):
    """
    Function to be invoked by every worker process.

    generator: iterable object, the very top one of all you are iterating over,
    in your case totaltransmitterdict.values()

    We are passing a whole iterable object to every worker; they all will iterate
    over it. To ensure they do not waste time by doing the same things
    concurrently, we assume this: each worker processes only every stepTH
    item, starting with the offsetTH one. step must be equal to WORKERS_NUMBER,
    and offset must be a unique number for each worker, varying from 0 to
    WORKERS_NUMBER - 1.

    conn: a multiprocessing.Connection object, allowing the worker to communicate
    with the main process
    """
    for i, transmitter in enumerate(generator):
        if i % step == offset:
            for reciever in totalrecieverdict.values():
                for processor in totalprocessordict.values():
                    for holoarray in totalholoarraydict.values():
                        for databus in totaldatabusdict.values():
                            for multiplexer in totalmultiplexerdict.values():
                                newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                                data_I_need = dosomethingwith(newconfiguration)
                                saveforlateruse_if_useful(data_I_need)
    conn.send('done')


def modconffactory():
    """
    Function to launch all the worker processes and wait until they all complete
    their tasks
    """
    processes = []
    # Materialize the dict view so it can be pickled and sent to workers
    # on platforms that spawn (rather than fork) child processes.
    generator = list(totaltransmitterdict.values())
    for i in range(WORKERS_NUMBER):
        conn, childConn = multiprocessing.Pipe()
        process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
        process.start()
        processes.append((process, conn))
    # Here we have created, started and saved to a list all the worker processes
    working = True
    finishedProcessesNumber = 0
    try:
        while working:
            for process, conn in processes:
                if conn.poll(0.1):  # Wait briefly for a message from a worker
                    message = conn.recv()
                    if message == 'done':
                        finishedProcessesNumber += 1
            if finishedProcessesNumber == WORKERS_NUMBER:
                working = False
        for process, conn in processes:
            process.join()  # Reap the finished workers
    except KeyboardInterrupt:
        print('Aborted')

You can adjust WORKERS_NUMBER to your needs.

The same thing, using multiprocessing.Pool:

import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(transmitter):
    for reciever in totalrecieverdict.values():
        for processor in totalprocessordict.values():
            for holoarray in totalholoarraydict.values():
                for databus in totaldatabusdict.values():
                    for multiplexer in totalmultiplexerdict.values():
                        newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                        data_I_need = dosomethingwith(newconfiguration)
                        saveforlateruse_if_useful(data_I_need)


def modconffactory():
    # The context manager terminates the pool's workers when the work is done
    with multiprocessing.Pool(WORKERS_NUMBER) as pool:
        pool.map(modconffactoryProcess, totaltransmitterdict.values())

You might want to use .map_async instead of .map
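For reference, a minimal sketch of the `.map_async` variant: it returns immediately with an AsyncResult, so the parent process can do other work before collecting. The worker function here is a trivial stand-in for `modconffactoryProcess`:

```python
import multiprocessing

def work(x):
    # Trivial stand-in for modconffactoryProcess.
    return x * x

def run():
    with multiprocessing.Pool(4) as pool:
        async_result = pool.map_async(work, range(10))
        # The main process is free to do other things here while workers run.
        return async_result.get()  # blocks until every item is done
```

Unlike `.map`, which blocks until all results are in, `.get()` is where the waiting happens, and it also re-raises any exception a worker hit.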

Both snippets do the same thing, but in the first one you have more control over the program.

I suppose the second one is the easiest, though :)

But the first one should give you an idea of what is happening inside the second one.

multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html
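As a side note beyond the answer above: the six nested loops can also be flattened with itertools.product, which yields combinations lazily and pairs naturally with a pool's imap_unordered plus a chunksize. Everything below (the two small stand-in dicts, the no-op dosomethingwith) is illustrative only:

```python
import itertools
import multiprocessing

# Stand-in data: two component dicts instead of six, for brevity.
totaltransmitterdict = {1: 'tx1', 2: 'tx2'}
totalrecieverdict = {1: 'rx1', 2: 'rx2', 3: 'rx3'}

def dosomethingwith(configuration):
    # Placeholder for the real computation.
    return configuration

def modconffactory_product():
    # itertools.product yields tuples lazily, so the full cross product
    # (100^6 in the real case) is never held in memory at once.
    combos = itertools.product(totaltransmitterdict.values(),
                               totalrecieverdict.values())
    with multiprocessing.Pool(4) as pool:
        # chunksize batches items to cut inter-process overhead;
        # imap_unordered returns results as workers finish them.
        return list(pool.imap_unordered(dosomethingwith, combos, chunksize=2))
```

With a cheap per-item function, a large chunksize matters: sending 100^6 tiny tasks one at a time would spend more time on pickling and pipe traffic than on the work itself.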
