我工作在與Vampir羣集可視化mpi通信。由於該集羣缺少MPI3實現,我在我的主目錄中安裝了OpenMPI 2.0.0(沒有使用--prefix以外的其他標誌)(在沒有Vampir的情況下工作正常)。現在我不知道如何將我的本地MPI3安裝與Vampir結合起來構建我的程序(fetchAndOpTest.f90)。我試過如下:吸血鬼與當地mpi安裝
vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -o fetchAndOpTestF90.x fetchAndOpTest.f90
(不知道是不是很重要,但是這給了以下警告:/usr/bin/ld: warning: libmpi.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../lib/libmpi_f77.so, may conflict with libmpi.so.20
)
與~/OpenMPI2/bin/mpirun -np 2 fetchAndOpTestF90.x
結果執行我的計劃中: fetchAndOpTestF90.x: error while loading shared libraries: libvt-mpi.so.0: cannot open shared object file: No such file or directory [...]
所以我也試過vtf90 -vt:fc ~/OpenMPI2/bin/mpif90 -L/opt/vampirtrace/5.14.4/lib -o fetchAndOpTestF90.x fetchAndOpTest.f90
,但它並沒有改變ldd輸出。
編輯:編輯LD_LIBRARY_PATH由@Harald建議。
> ldd fetchAndOpTestF90.x linux-vdso.so.1 => (0x00007ffc6ada9000) libmpi_f77.so.1 => /usr/lib/libmpi_f77.so.1 (0x00007ff8fdf2e000) libvt-mpi.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi.so.0 (0x00007ff8fdca3000) libvt-mpi-unify.so.0 => /opt/vampirtrace/5.14.4/lib/libvt-mpi-unify.so.0 (0x00007ff8fda18000) libotfaux.so.0 => /opt/vampirtrace/5.14.4/lib/libotfaux.so.0 (0x00007ff8fd810000) libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff8fd50c000) libopen-trace-format.so.1 => /opt/vampirtrace/5.14.4/lib/libopen-trace-format.so.1 (0x00007ff8fd2c4000) libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff8fd0ab000) libpapi.so.5.3 => /usr/lib/x86_64-linux-gnu/libpapi.so.5.3 (0x00007ff8fce57000) libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff8fcc53000) libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff8fc939000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff8fc633000) libmpi_usempi.so.20 => /home/USER/OpenMPI2/lib/libmpi_usempi.so.20 (0x00007ff8fc430000) libmpi_mpifh.so.20 => /home/USER/OpenMPI2/lib/libmpi_mpifh.so.20 (0x00007ff8fc1df000) libmpi.so.20 => /home/USER/OpenMPI2/lib/libmpi.so.20 (0x00007ff8fbefb000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff8fbce5000) libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff8fbaa9000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff8fb88b000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff8fb4c6000) libmpi.so.1 => /usr/lib/libmpi.so.1 (0x00007ff8fb145000) /lib64/ld-linux-x86-64.so.2 (0x00007ff8fe162000) libpfm.so.4 => /usr/lib/x86_64-linux-gnu/libpfm.so.4 (0x00007ff8fadff000) libopen-pal.so.20 => /home/USER/OpenMPI2/lib/libopen-pal.so.20 (0x00007ff8fab09000) libopen-rte.so.20 => /home/USER/OpenMPI2/lib/libopen-rte.so.20 (0x00007ff8fa887000) libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007ff8fa684000) libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007ff8fa43b000) libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007ff8fa231000) libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007ff8fa026000) libpciaccess.so.0 => /usr/lib/x86_64-linux-gnu/libpciaccess.so.0 (0x00007ff8f9e1d000) librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff8f9c15000)
現在執行拋出錯誤:mpirun noticed that process rank 0 with PID 0 on node cluster exited on signal 11 (Segmentation fault)
(程序是正確的,並建立和執行與當地MPI3安裝,無需VAMPIR運行正常)
我沒有建立靜態庫,所以我直接帶着建議2.這似乎工作正常(我編輯ldd輸出),但現在執行'〜/ OpenMPI2/bin/mpirun -np 1 fetchAndOpTest.x'會拋出一個錯誤:'mpirun注意到在信號11(分段錯誤)上退出的節點集羣上的進程等級爲0,節點集羣上的PID爲0(也與其他程序完全正確)。也許警告表明這個問題? –
我會給Score-P一個嘗試。我認爲在這些情況下,這是我的問題的最佳解決方案。 –