2016-03-22 57 views
8
Error in MPI_Init when starting Open MPI via Python

I am trying to access a shared library that uses Open MPI from Python, but for some reason I get the following error message:

[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
[Geo00433:01196] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) 
------------------------------------------------------------------------- 
It looks like opal_init failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during opal_init; some of which are due to configuration or 
environment problems. This failure appears to be an internal failure; 
here is some additional information (which may only be relevant to an 
Open MPI developer): 

    opal_shmem_base_select failed 
    --> Returned value -1 instead of OPAL_SUCCESS 
-------------------------------------------------------------------------- 
[Geo00433:01196] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 
-------------------------------------------------------------------------- 
It looks like MPI_INIT failed for some reason; your parallel process is 
likely to abort. There are many reasons that a parallel process can 
fail during MPI_INIT; some of which are due to configuration or environment 
problems. This failure appears to be an internal failure; here is some 
additional information (which may only be relevant to an Open MPI 
developer): 

    ompi_mpi_init: orte_init failed 
    --> Returned "Error" (-1) instead of "Success" (0) 
-------------------------------------------------------------------------- 
*** An error occurred in MPI_Init 
*** on a NULL communicator 
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort 
[Geo00433:1196] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! 

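Any clue what the cause might be? I have checked many web pages but could not find a solution to my problem.

I am on Ubuntu 15.10 and have both MPICH and Open MPI installed.

Thank you very much!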

+0

Please provide your code. – Jan

+3

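Your program was probably compiled against the `libmpi.so` from MPICH, but at run time it finds the Open MPI version. Having two MPI implementations installed at the same time often causes problems like this. – Hristo Iliev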

+0

Indeed, this had to do with finding the correct Open MPI version. Thanks @HristoIliev – Jannis

Answers

3

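I ran into the same problem (or one with a slightly different error message) on Ubuntu 16.04, even with Open MPI installed. As far as I can tell, there is a problem with how Ubuntu's mpi4py package is built, but I'm not sure exactly what it is.

Reproduction: since the question doesn't make it entirely clear how the error message was produced (and I don't have the reputation to edit it), here is how I got it. First, install Ubuntu's mpi4py package, then start a Python session: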

$ sudo apt-get install python-mpi4py 
$ python 

Inside Python, try the following:

>>> from mpi4py import MPI 

You should then get an error message like the OP's.

Solution: here is how I got it working. First, uninstall Ubuntu's package:

$ sudo apt-get remove python-mpi4py 

Then install the Open MPI headers (the next step involves building mpi4py) and pip:

$ sudo apt-get install libopenmpi-dev python-pip 

Finally, install mpi4py:

$ sudo pip install mpi4py 

If you try the Python command above again, it should now work.
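To confirm that the rebuilt package really initializes MPI, something like the following may help. It is only a sketch: it runs the import in a child process, because a failed MPI_Init aborts the interpreter instead of raising a Python exception. The helper name `check_mpi4py` is made up for illustration, and `MPI.Get_library_version()` assumes mpi4py 2.0 or later on top of an MPI-3 library.

```python
import subprocess
import sys

# Code run in the child interpreter: importing MPI triggers MPI_Init.
CHILD_CODE = "from mpi4py import MPI; print(MPI.Get_library_version())"


def check_mpi4py(python=sys.executable):
    """Return (ok, output) for an mpi4py import attempted in a child process."""
    proc = subprocess.run(
        [python, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # MPI error banners go to stderr
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stdout.strip()


if __name__ == "__main__":
    ok, output = check_mpi4py()
    print("mpi4py works" if ok else "mpi4py failed to initialize")
    print(output)
```

On success the output names the MPI library that mpi4py is actually running against, which makes it easy to spot an Open MPI / MPICH mix-up.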

+0

Seems like a good workaround for now; I have linked this to the known (currently open) bug at https://bugs.launchpad.net/ubuntu/+source/mpi4py/+bug/1583432 – bbarker

0

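The error message was indeed related to different .so files, as Hristo Iliev pointed out. When compiling the program I was using, the compiler picked up the "wrong" Open MPI on my Linux machine; explicitly specifying which Open MPI to use solved the problem.

Thanks for your help!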
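One way to see which MPI library a compiled module was linked against is to look at its `ldd` output. The sketch below is only an illustration: the helper names are made up, and the sample text is hypothetical `ldd` output, not taken from the machine in the question.

```python
import subprocess


def mpi_libraries(ldd_output):
    """Return {library name: resolved path} for MPI-looking entries in ldd output."""
    found = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libmpi.so.12 => /usr/lib/.../libmpi.so.12 (0x...)"
        if len(parts) >= 3 and parts[1] == "=>" and "mpi" in parts[0].lower():
            target = "not found" if parts[2:4] == ["not", "found"] else parts[2]
            found[parts[0]] = target
    return found


def ldd(path):
    """Run ldd on a shared library or executable and return its text output."""
    return subprocess.check_output(["ldd", path], universal_newlines=True)


# Hypothetical ldd output, used only to illustrate the parsing.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd1a5f2000)
\tlibmpich.so.12 => /usr/lib/x86_64-linux-gnu/libmpich.so.12 (0x00007f3a2c000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a2bc00000)
"""

print(mpi_libraries(SAMPLE))
```

For a real check you would call `mpi_libraries(ldd("/path/to/your_module.so"))`. If the library listed belongs to MPICH while `mpirun` comes from Open MPI (or the other way round), that is the mismatch described in the comments.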

0

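I ran into a similar error when using a Python interface to MPI that I had wrapped myself with SWIG. As mentioned above, this error can be caused by different MPI implementations (for example, Open MPI and MPICH) being installed on the same machine.

I solved it by compiling and installing a new version of MPICH, then changing the environment variables in .bashrc and compiling my own program with the new mpicxx or mpicc. The error went away.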
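Before rebuilding anything, it can be useful to check which wrappers and launchers your shell actually picks up. The following is a rough sketch using only the standard library; the function name is made up for illustration. On Debian/Ubuntu these commands are typically symlinks managed by the alternatives system, so resolving them shows which implementation is really in use.

```python
import os
import shutil

# Wrapper and launcher names shared by Open MPI and MPICH.
TOOLS = ("mpicc", "mpicxx", "mpirun", "mpiexec")


def resolve_mpi_tools(tools=TOOLS):
    """Map each tool name to the real file it resolves to, or None if not on PATH."""
    resolved = {}
    for tool in tools:
        path = shutil.which(tool)
        # realpath follows symlinks such as /etc/alternatives entries.
        resolved[tool] = os.path.realpath(path) if path else None
    return resolved


if __name__ == "__main__":
    for tool, target in resolve_mpi_tools().items():
        print("{:8s} -> {}".format(tool, target or "not found on PATH"))
```

If the compiler wrapper and the launcher resolve to different implementations, programs built with one will likely fail when started with the other.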

0

I got a similar error when trying to use mpi4py on Ubuntu 16.04 LTS. In my case, the error was related to the mpicc wrapper not being on my search path.

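What I did to fix the problem was the following:

  • Uninstall the current mpi4py with

$ sudo pip uninstall mpi4py

  • Find the path to your mpicc with

$ which mpicc

  • Reinstall mpi4py with MPICC pointing at that path

$ sudo env MPICC=/path/to/mpicc pip install mpi4py

After that, the error message disappeared and I was able to run MPI from Python.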
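The same reinstall step can be scripted. This is a minimal sketch, not a definitive recipe: `build_install_command` is a made-up helper, and `/path/to/mpicc` is a placeholder you need to replace with the output of `which mpicc`. The block only prints the command; it does not run pip.

```python
import os
import shutil
import sys


def build_install_command(mpicc=None, environ=None):
    """Return (command, env) for building mpi4py with a specific mpicc."""
    mpicc = mpicc or shutil.which("mpicc")
    if mpicc is None:
        raise RuntimeError("mpicc not found; pass its path explicitly")
    env = dict(os.environ if environ is None else environ)
    env["MPICC"] = mpicc  # same variable as in the shell command above
    command = [sys.executable, "-m", "pip", "install", "mpi4py"]
    return command, env


if __name__ == "__main__":
    # Placeholder path, for illustration only.
    command, env = build_install_command(mpicc="/path/to/mpicc")
    print("MPICC=" + env["MPICC"], " ".join(command))
```

To actually install, pass the result to `subprocess.check_call(command, env=env)`, with sudo or inside a virtualenv as appropriate for your setup.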