2015-10-07

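Python, MPI and shell subprocess: orte_error_log

I wrote a small Python 2.7 script that uses the mpi4py library to launch 4 independent processes in the shell via subprocess. I get ORTE_ERROR_LOG, and I would like to know where it happens and why.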

Here is my code:

#!/usr/bin/python
import subprocess
import re
import sys
from mpi4py import MPI

def main():
    root = 'base'
    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        job = [root + str(i) for i in range(4)]
    else:
        job = None

    job = comm.scatter(job, root=0)
    cmd = "../../montepython/montepython/MontePython.py -conf ../config/default.conf -p ../config/XXXX.param -o ../chains/XXXX -N 10000 > XXXX.log"

    cmd_job = re.sub(r"XXXX", job, cmd)
    subprocess.check_call(cmd_job, shell=True)
    return

if __name__ == '__main__':
    main()
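A side note on the script above, unrelated to the ORTE errors themselves: the command is built by regex substitution into a string and run through a shell only to get the `> XXXX.log` redirection. A minimal sketch of the same call without a shell passes an argument list to `subprocess.check_call` and opens the log file in Python. The paths are copied from the question; `build_argv` and `run_job` are hypothetical helper names, and like the original it assumes `MontePython.py` is executable with a shebang line.

```python
import subprocess

# Path copied from the question's command line; adjust to your layout.
MONTEPYTHON = "../../montepython/montepython/MontePython.py"


def build_argv(job):
    """Return the argument list for one MontePython run named `job`."""
    return [
        MONTEPYTHON,
        "-conf", "../config/default.conf",
        "-p", "../config/%s.param" % job,
        "-o", "../chains/%s" % job,
        "-N", "10000",
    ]


def run_job(job):
    """Run one job without a shell, sending its stdout to <job>.log
    (the same thing the original `> XXXX.log` redirection does)."""
    with open("%s.log" % job, "w") as log:
        subprocess.check_call(build_argv(job), stdout=log)
```

Inside the MPI script, `cmd_job = re.sub(...)` and the `check_call(..., shell=True)` line would become a single `run_job(job)` call after the scatter.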

I run it with the command:

mpirun -np 4 ./run.py 

This is the error message I get:

[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916 
-------------------------------------------------------------------------- 
A system call failed during shared memory initialization that should 
not have. It is likely that your MPI job will now either abort or 
experience performance degradation. 

    Local host: localhost 
    System call: open(2) 
    Error:  No such file or directory (errno 2) 
-------------------------------------------------------------------------- 

I cannot work out where the error is happening. MontePython itself should not use MPI, since it is supposed to run serially.


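I asked for help on the Open MPI users forum. They told me the problem may be caused by a bad interaction between subprocess and the MPI implementation, and that I should switch from subprocess to spawn. However, that function is not very well documented, and I am not sure how to proceed.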
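The forum's suggestion refers to MPI process spawning (in mpi4py, `Comm.Spawn`), which is not sketched here. Since the four MontePython runs are serial and independent, a different option, swapped in here and not what the forum proposed, is to skip `mpirun` entirely and start the runs concurrently from a plain Python script with `subprocess.Popen`. `launch_all` is a hypothetical name; the `__main__` demo uses trivial `python -c` commands as stand-ins for the real MontePython argument lists.

```python
import subprocess
import sys


def launch_all(commands, log_paths):
    """Start every command concurrently, then wait for all of them.

    commands  -- list of argv lists, one per independent serial run
    log_paths -- list of files that receive each run's stdout
    Returns the exit codes in the same order as `commands`.
    """
    procs = []
    logs = []
    try:
        for argv, path in zip(commands, log_paths):
            log = open(path, "w")
            logs.append(log)
            # No shell and no MPI: each run is an ordinary child process.
            procs.append(subprocess.Popen(argv, stdout=log))
        return [p.wait() for p in procs]
    finally:
        for log in logs:
            log.close()


if __name__ == "__main__":
    jobs = ["base%d" % i for i in range(4)]
    # Stand-in commands; replace with the real MontePython argv lists.
    cmds = [[sys.executable, "-c", "print('%s done')" % j] for j in jobs]
    print(launch_all(cmds, ["%s.log" % j for j in jobs]))
```

Because no MPI library is initialised in the parent, this route sidesteps any subprocess/MPI interaction, at the cost of running only on one machine.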

Answer

[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762 
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916 

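These originate in ORTE, the part of Open MPI responsible for launching and controlling the MPI processes. A possible cause is that there is not enough space in the temporary directory where Open MPI stores its session information.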
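To test the disk-space hypothesis, a quick sketch can report how much space is free in the temporary directory. It assumes the session directory lives under the directory Python's `tempfile.gettempdir()` reports (usually `/tmp`, or `$TMPDIR` if set), which is an assumption about your setup; `os.statvfs` is POSIX-only, and the 100 MB threshold is an arbitrary example value.

```python
import os
import tempfile


def free_bytes(path):
    """Bytes available to an unprivileged user on the filesystem holding path."""
    st = os.statvfs(path)  # POSIX only
    return st.f_bavail * st.f_frsize


def report_tmp_space(min_mb=100):
    """Print the temp dir and its free space; return True if >= min_mb.

    min_mb is an arbitrary example threshold, not an Open MPI requirement.
    """
    tmp = tempfile.gettempdir()
    free_mb = free_bytes(tmp) / (1024.0 * 1024.0)
    ok = free_mb >= min_mb
    print("%s: %.1f MB free (%s)" % (tmp, free_mb, "ok" if ok else "LOW"))
    return ok
```

Run it in the same environment (same `TMPDIR`) that `mpirun` sees.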

-------------------------------------------------------------------------- 
A system call failed during shared memory initialization that should 
not have. It is likely that your MPI job will now either abort or 
experience performance degradation. 

    Local host: localhost 
    System call: open(2) 
    Error:  No such file or directory (errno 2) 
-------------------------------------------------------------------------- 

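This error comes from the Open MPI module that implements shared-memory intranode communication. The most likely cause is that tmpfs is mounted in some non-standard location or not mounted at all. Without the shared-memory module, the library will either fall back to the slower TCP/IP module (if it is enabled, which it is by default) or the application will crash because there is no other way to communicate.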
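To check the tmpfs part of this diagnosis on Linux, a small sketch can list where tmpfs is mounted by parsing `/proc/mounts` (Linux-specific; on most distributions `/dev/shm` should appear). `tmpfs_mounts` is a hypothetical helper, and it does not determine which directory Open MPI itself tries to use.

```python
import os


def tmpfs_mounts(mounts_text):
    """Return mount points whose filesystem type is tmpfs.

    mounts_text -- contents of /proc/mounts, one mount per line:
    device, mount point, fs type, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            print("tmpfs mounts: %s" % tmpfs_mounts(f.read()))
    else:
        print("/proc/mounts not found; not a Linux system?")
```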

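Both error messages are probably unrelated to your program. Try something simpler, such as the canonical "Hello World!" example, to confirm that Open MPI works correctly.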
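A minimal mpi4py version of that "Hello World!" test might look like the sketch below. The file name `hello.py` is hypothetical; the import sits inside `main` only so the formatting helper can be used without mpi4py installed.

```python
def greeting(rank, size, host):
    """Format one line of the canonical MPI hello-world output."""
    return "Hello World! I am process %d of %d on %s." % (rank, size, host)


def main():
    try:
        # mpi4py must be built against the same MPI that mpirun belongs to.
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; cannot run the MPI test.")
        return
    comm = MPI.COMM_WORLD
    print(greeting(comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))


if __name__ == "__main__":
    main()
```

Launched with `mpirun -np 4 python hello.py`, a healthy installation should print four lines, in no particular order, without any ORTE_ERROR_LOG output. If the same errors appear here, the problem lies in the Open MPI setup rather than in the subprocess calls.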