2016-03-30 38 views

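Problem with MPI spawn and merge

I am trying to get started with dynamic process creation in MPI. I have a parent code (main.c) that tries to spawn new worker/child processes (worker.c) and merge them into a single intracommunicator. The parent code (main.c) is: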

#include<stdio.h> 
#include "mpi.h" 

MPI_Comm child_comm; 
int rank, size; 
MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
MPI_Comm_size(MPI_COMM_WORLD, &size); 

if(rank == 0) 
{ 
    int num_processes_to_spawn = 2; 
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, num_processes_to_spawn, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE); 

MPI_Comm intra_comm; 
MPI_Intercomm_merge(child_comm,0, &intra_comm); 
MPI_Barrier(child_comm); 


int tmp_size; 
MPI_Comm_size(intra_comm, &tmp_size); 
printf("size of intra comm world = %d\n", tmp_size); 

MPI_Comm_size(child_comm, &tmp_size); 
printf("size of child comm world = %d\n", tmp_size); 

MPI_Comm_size(MPI_COMM_WORLD, &tmp_size); 
printf("size of parent comm world = %d\n", tmp_size); 

} 

MPI_Finalize(); 

The worker (child) code is:

#include<stdio.h> 
    #include "mpi.h" 
    int main(int argc, char *argv[]) 
    { 
    int numprocs, myrank; 
    MPI_Comm parentcomm; 
    MPI_Comm intra_comm; 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); 
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank); 

    MPI_Comm_get_parent(&parentcomm); 

    MPI_Intercomm_merge(parentcomm, 1, &intra_comm); 
    MPI_Barrier(parentcomm); 

    if(myrank == 0) 
    { 
    int tmp_size; 
    MPI_Comm_size(parentcomm, &tmp_size); 
    printf("child size of parent comm world = %d\n", tmp_size); 

    MPI_Comm_size(MPI_COMM_WORLD, &tmp_size); 
    printf("child size of child comm world = %d\n", tmp_size); 

    MPI_Comm_size(intra_comm, &tmp_size); 
    printf("child size of intra comm world = %d\n", tmp_size); 

    MPI_Finalize(); 
    return 0; 
    } 
} 

I run it using

mpirun -np 12 main.c 

After the spawn and the merge, I expect the output to be

size of intra comm world = 14 
size of child comm world = 2 
size of parent comm world = 12 
child size of parent comm world = 12 
child size of child comm world = 2 
child size of intra comm world = 14 

But I get the following incorrect output instead.

size of intra comm world = 3
size of child comm world = 1
size of parent comm world = 12
child size of parent comm world = 2
child size of child comm world = 2
child size of intra comm world = 3

I don't understand where the mistake is. Could someone point it out?

Thanks, Chris

Answers


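Your code suffers from a few problems, which I list here:

  • In the master part, only process 0 calls MPI_Comm_spawn(). This is not an error (especially since you use MPI_COMM_SELF as the parent communicator), but it effectively excludes all the other processes from the subsequent merge.
  • In both the master and the worker parts, you use MPI_Comm_size() to get the size of the remote group instead of MPI_Comm_remote_size(). On an intercommunicator, MPI_Comm_size() returns the size of the local group only, not the size of the remote group.
  • In the master code, only process 0 calls MPI_Finalize() (not to mention that main() and MPI_Init() are missing).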
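
The size arithmetic behind the first two points can be sketched in plain C, without MPI. This is only an illustrative model: the type `spawn_sizes` and the functions `parent_view()` and `child_view()` are hypothetical names that are not part of MPI, and the model assumes every requested child was actually started.

```c
#include <assert.h>

/* Hypothetical helper type (not part of MPI): the three sizes discussed above. */
typedef struct {
    int local_size;  /* what MPI_Comm_size() reports on the intercommunicator */
    int remote_size; /* what MPI_Comm_remote_size() reports on it */
    int merged_size; /* size of the result of MPI_Intercomm_merge() */
} spawn_sizes;

/* Sizes as seen by a parent process.
 * spawning_group: size of the communicator passed to MPI_Comm_spawn()
 *                 (1 for MPI_COMM_SELF, the world size for MPI_COMM_WORLD).
 * spawned:        number of children, assuming all of them started. */
spawn_sizes parent_view(int spawning_group, int spawned)
{
    assert(spawning_group > 0 && spawned > 0);
    spawn_sizes s;
    s.local_size = spawning_group;            /* local group = the parents */
    s.remote_size = spawned;                  /* remote group = the children */
    s.merged_size = spawning_group + spawned; /* union of both groups */
    return s;
}

/* Sizes as seen by a child process: local and remote groups swap roles. */
spawn_sizes child_view(int spawning_group, int spawned)
{
    spawn_sizes p = parent_view(spawning_group, spawned);
    spawn_sizes s;
    s.local_size = p.remote_size;
    s.remote_size = p.local_size;
    s.merged_size = p.merged_size;
    return s;
}
```

Under this model, `parent_view(1, 2)` (spawning over MPI_COMM_SELF) gives a local size of 1 and a merged size of 3, which matches the numbers reported in the question, while `parent_view(12, 2)` (spawning over a 12-process MPI_COMM_WORLD) gives a remote size of 2 and a merged size of 14.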

Here is a fixed version of your code:

master.c

#include <stdio.h> 
#include <mpi.h> 

int main(int argc, char *argv[]) { 

    MPI_Init(&argc, &argv); 
    int rank; 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    MPI_Comm child_comm; 
    int num_processes_to_spawn = 2; 
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 
        num_processes_to_spawn, MPI_INFO_NULL, 
        0, MPI_COMM_WORLD, 
        &child_comm, MPI_ERRCODES_IGNORE); 

    MPI_Comm intra_comm; 
    MPI_Intercomm_merge(child_comm, 0, &intra_comm); 

    if (rank == 0) {
        int tmp_size;
        MPI_Comm_size(intra_comm, &tmp_size);
        printf("size of intra comm world = %d\n", tmp_size);

        MPI_Comm_remote_size(child_comm, &tmp_size);
        printf("size of child comm world = %d\n", tmp_size);

        MPI_Comm_size(MPI_COMM_WORLD, &tmp_size);
        printf("size of parent comm world = %d\n", tmp_size);
    }

    MPI_Finalize(); 

    return 0; 
} 

worker.c

#include <stdio.h> 
#include <mpi.h> 

int main(int argc, char *argv[]) { 

    MPI_Init(&argc, &argv); 

    int myrank; 
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank); 

    MPI_Comm parentcomm; 
    MPI_Comm_get_parent(&parentcomm); 

    MPI_Comm intra_comm; 
    MPI_Intercomm_merge(parentcomm, 1, &intra_comm); 

    if (myrank == 0) {
        int tmp_size;
        MPI_Comm_remote_size(parentcomm, &tmp_size);
        printf("child size of parent comm world = %d\n", tmp_size);

        MPI_Comm_size(MPI_COMM_WORLD, &tmp_size);
        printf("child size of child comm world = %d\n", tmp_size);

        MPI_Comm_size(intra_comm, &tmp_size);
        printf("child size of intra comm world = %d\n", tmp_size);
    }

    MPI_Finalize(); 

    return 0; 
} 

On my laptop, this gives:

~> mpirun -n 12 ./master 
child size of parent comm world = 12 
child size of child comm world = 2 
child size of intra comm world = 14 
size of intra comm world = 14 
size of child comm world = 2 
size of parent comm world = 12 

Thanks, Gilles. I realize now that it was a matter of the remote group size. – marc