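Python, MPI and shell subprocess: ORTE_ERROR_LOG
I wrote a small Python 2.7 script that uses subprocess to launch 4 independent processes from the shell, using the mpi4py library. I get an ORTE_ERROR_LOG, and I would like to know where it occurs and why.
This is my code: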
#!/usr/bin/python
import subprocess
import re
import sys
from mpi4py import MPI

def main():
    root='base'
    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        job = [root+str(i) for i in range(4)]
    else:
        job = None
    job = comm.scatter(job, root=0)
    cmd="../../montepython/montepython/MontePython.py -conf ../config/default.conf -p ../config/XXXX.param -o ../chains/XXXX -N 10000 > XXXX.log"
    cmd_job = re.sub(r"XXXX", job, cmd)
    subprocess.check_call(cmd_job, shell=True)
    return

if __name__ == '__main__':
    main()
I run it with the command:
mpirun -np 4 ./run.py
This is the error message I get:
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 1762
[localhost:51087] [[51455,0],0] ORTE_ERROR_LOG: Not found in file orted/orted_comm.c at line 916
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: localhost
System call: open(2)
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
I cannot work out where the error is occurring. MontePython itself should not be using MPI, since it is supposed to run serially.
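One way I could test that assumption is to launch the child with the MPI-related environment variables removed, so that it cannot attach itself to the parent's MPI job. A minimal sketch, assuming Open MPI passes its per-rank settings through variables prefixed with `OMPI_` and `PMIX_` (the prefix list and the helper names `strip_mpi_env` and `run_serial` are my own, not taken from the Open MPI documentation):

```python
import os
import subprocess


def strip_mpi_env(env, prefixes=("OMPI_", "PMIX_")):
    """Return a copy of env without variables that start with an MPI prefix."""
    # str.startswith accepts a tuple, so one call checks every prefix.
    return dict((k, v) for k, v in env.items() if not k.startswith(prefixes))


def run_serial(cmd_job):
    """Run the shell command as in run.py, but with a cleaned environment."""
    return subprocess.check_call(cmd_job, shell=True,
                                 env=strip_mpi_env(os.environ))
```

If the errors disappear with `run_serial(cmd_job)` in place of the plain `subprocess.check_call`, that would suggest the child was picking up MPI settings inherited from `mpirun`.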
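I asked for help on the Open MPI users forum. They told me the problem is probably caused by a bad interaction between subprocess and the MPI implementation, and that I should switch from subprocess to spawn. That function is not very well documented, though, and I am not sure how to proceed.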
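For reference, here is my rough understanding of what a spawn-based version might look like. It is a minimal sketch and not tested against MontePython: it assumes `MPI.COMM_SELF.Spawn` from mpi4py is the function they meant, that the child can be started through `sys.executable`, and that the child initializes MPI itself (as far as I understand, a spawned program that never calls MPI_Init leaves the parent waiting). The helper names `build_args` and `spawn_jobs` are mine. Shell redirection such as `> XXXX.log` is not available this way, so the log file is left out.

```python
import sys


def build_args(job):
    """Return the MontePython argument list for one job name (no shell)."""
    return [
        "../../montepython/montepython/MontePython.py",
        "-conf", "../config/default.conf",
        "-p", "../config/%s.param" % job,
        "-o", "../chains/%s" % job,
        "-N", "10000",
    ]


def spawn_jobs(jobs):
    """Spawn one child per job and disconnect from each afterwards."""
    # Imported here so build_args can be used without MPI installed.
    from mpi4py import MPI

    children = []
    for job in jobs:
        # Assumption: the child is a Python script that initializes MPI.
        child = MPI.COMM_SELF.Spawn(sys.executable,
                                    args=build_args(job),
                                    maxprocs=1)
        children.append(child)
    for child in children:
        # Disconnect is collective, so the child has to disconnect as well.
        child.Disconnect()
```

This would be started as a single process, for example `mpirun -np 1 ./run.py`, calling `spawn_jobs(["base%d" % i for i in range(4)])`, instead of scattering the job names over 4 ranks.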