MPI hangs on MPI_Send for large messages

2013-04-05

I have a simple program in C++/MPI (MPICH2) that sends an array of doubles. If the size of the array exceeds 9000, the program hangs during the call to MPI_Send. If the array is smaller than 9000 (for example 8000), everything works fine. The source code is below:

main.cpp

#include <mpi.h> 
#include <iostream> 
#include <cstdlib> 

using namespace std; 

Cube** cubes; 
int cubesLen; 

double* InitVector(int N) { 
    double* x = new double[N]; 
    for (int i = 0; i < N; i++) { 
     x[i] = i + 1; 
    } 
    return x; 
} 

void CreateCubes() { 
    cubes = new Cube*[12]; 
    cubesLen = 12; 
    for (int i = 0; i < 12; i++) { 
     cubes[i] = new Cube(9000); 
    } 
} 

void SendSimpleData(int size, int rank) { 
    Cube* cube = cubes[0]; 
    int nodeDest = rank + 1; 
    if (nodeDest > size - 1) { 
     nodeDest = 1; 
    } 

    double* coefImOut = (double *) malloc(sizeof (double)*cube->coefficentsImLength); 
    cout << "Before send" << endl; 
    int count = cube->coefficentsImLength; 
    MPI_Send(coefImOut, count, MPI_DOUBLE, nodeDest, 0, MPI_COMM_WORLD); 
    cout << "After send" << endl; 
    free(coefImOut); 

    MPI_Status status; 
    double *coefIm = (double *) malloc(sizeof(double)*count); 

    int nodeFrom = rank - 1; 
    if (nodeFrom < 1) { 
     nodeFrom = size - 1; 
    } 

    MPI_Recv(coefIm, count, MPI_DOUBLE, nodeFrom, 0, MPI_COMM_WORLD, &status); 
    free(coefIm); 
} 

int main(int argc, char *argv[]) { 
    int size, rank; 
    const int root = 0; 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    CreateCubes(); 

    if (rank != root) { 
     SendSimpleData(size, rank); 
    } 

    MPI_Finalize(); 
    return 0; 
} 

Cube class

class Cube { 
public: 
    Cube(int size); 
    Cube(const Cube& orig); 
    virtual ~Cube(); 

    int Id() { return id; } 
    void Id(int id) { this->id = id; } 

    int coefficentsImLength; 
    double* coefficentsIm; 

private: 
    int id; 
}; 

Cube::Cube(int size) { 
    this->coefficentsImLength = size; 

    coefficentsIm = new double[size]; 
    for (int i = 0; i < size; i++) { 
     coefficentsIm[i] = 1; 
    } 
} 

Cube::Cube(const Cube& orig) { 
} 

Cube::~Cube() { 
    delete[] coefficentsIm; 
} 

The program is run with 4 processes:

mpiexec -n 4 ./myApp1 

Any ideas?

Please post your complete code, including the MPI_Recv. – 2013-04-05 12:40:47

OK, though the application crashes when I call MPI_Send. Tested with the Allinea DDT debugger. – user2240771 2013-04-05 13:17:00

... which is a bit different from "hanging". What is the nature of the crash? We still need to see more code, ideally a simple reproducible example. – 2013-04-05 13:31:29

Answer

The Cube stuff is irrelevant: consider

#include <mpi.h> 
#include <iostream> 
#include <cstdlib> 

using namespace std; 

int main(int argc, char *argv[]) { 
    int size, rank; 
    const int root = 0; 

    int datasize = atoi(argv[1]); 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    if (rank != root) { 
     int nodeDest = rank + 1; 
     if (nodeDest > size - 1) { 
      nodeDest = 1; 
     } 
     int nodeFrom = rank - 1; 
     if (nodeFrom < 1) { 
      nodeFrom = size - 1; 
     } 

     MPI_Status status; 
     int *data = new int[datasize]; 
     for (int i=0; i<datasize; i++) 
      data[i] = rank; 

     cout << "Before send" << endl; 
     MPI_Send(data, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD); 
     cout << "After send" << endl; 
     MPI_Recv(data, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status); 

     delete [] data; 

    } 

    MPI_Finalize(); 
    return 0; 
} 

which when run gives

$ mpirun -np 4 ./send 1 
Before send 
After send 
Before send 
After send 
Before send 
After send 
$ mpirun -np 4 ./send 65000 
Before send 
Before send 
Before send 

If you watch the message queue window in DDT, you'll see that everyone is sending and no one is receiving, and you have a classic deadlock.

The semantics of MPI_Send are not tightly specified, but it is allowed to block until "the receive has been posted". MPI_Ssend is clearer in this regard; it will always block until the receive has been posted. Details about the different send modes can be found here.
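
To make the difference concrete, here is a minimal sketch of my own (not part of the original answer) that uses MPI_Ssend in the same ring pattern. Because a synchronous send never completes before the matching receive has been posted, this hangs on every rank even for a single integer, with no eager-protocol escape hatch:

#include <mpi.h> 
#include <iostream> 

using namespace std; 

int main(int argc, char *argv[]) { 
    int size, rank; 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    if (rank != 0) { 
     int nodeDest = (rank == size - 1) ? 1 : rank + 1; 
     int nodeFrom = (rank == 1) ? size - 1 : rank - 1; 

     int token = rank; 
     MPI_Status status; 

     cout << "Before ssend" << endl; 
     // Synchronous send: completes only after the matching receive has 
     // been posted, so every rank blocks here -- a guaranteed deadlock. 
     MPI_Ssend(&token, 1, MPI_INT, nodeDest, 0, MPI_COMM_WORLD); 
     cout << "After ssend" << endl; 
     MPI_Recv(&token, 1, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status); 
    } 

    MPI_Finalize(); 
    return 0; 
} 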

The reason it works for smaller messages is an accident of the implementation: for messages that are "small enough" (in your case, it looks like < 64 kB), your MPI_Send implementation uses an "eager send" protocol and does not block on the receive; for larger messages, where it is not necessarily safe just to keep a buffered copy of the message around in memory, the Send waits for the matching receive (which it is always allowed to do anyway).
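
If you want to see roughly where that eager/rendezvous crossover sits on your own installation, one rough-and-ready probe (my suggestion, not from the original answer) is to run the ring program above with increasing message sizes under a time limit, so the hanging cases do not stall the loop; the sizes and the 5-second cap are arbitrary, and timeout assumes GNU coreutils:

$ for n in 1000 4000 8192 16384 65000; do echo "== $n ints"; timeout 5 mpirun -np 4 ./send $n; done 

The size at which "After send" stops appearing marks the point where the implementation switches from the eager protocol to the rendezvous protocol.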

There are a few things you could do to avoid this; what you have to make sure of is that not everyone calls a blocking MPI_Send at the same time. You could (say) have even-ranked processes send first and then receive, while odd-ranked processes receive first and then send. You could use nonblocking communications (Isend/Irecv/Waitall; a sketch of that variant is given after the Sendrecv example below). But the simplest solution in this case is to use MPI_Sendrecv, which is a blocking (send + receive), rather than a blocking send plus a blocking receive. The send and the receive execute concurrently, and the function blocks until both are complete. So this works:

#include <mpi.h> 
#include <iostream> 
#include <cstdlib> 

using namespace std; 

int main(int argc, char *argv[]) { 
    int size, rank; 
    const int root = 0; 

    int datasize = atoi(argv[1]); 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    if (rank != root) { 
     int nodeDest = rank + 1; 
     if (nodeDest > size - 1) { 
      nodeDest = 1; 
     } 
     int nodeFrom = rank - 1; 
     if (nodeFrom < 1) { 
      nodeFrom = size - 1; 
     } 

     MPI_Status status; 
     int *outdata = new int[datasize]; 
     int *indata = new int[datasize]; 
     for (int i=0; i<datasize; i++) 
      outdata[i] = rank; 

     cout << "Before sendrecv" << endl; 
     MPI_Sendrecv(outdata, datasize, MPI_INT, nodeDest, 0, 
        indata, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &status); 
     cout << "After sendrecv" << endl; 

     delete [] outdata; 
     delete [] indata; 
    } 

    MPI_Finalize(); 
    return 0; 
} 

Running it gives

$ mpirun -np 4 ./send 65000 
Before sendrecv 
Before sendrecv 
Before sendrecv 
After sendrecv 
After sendrecv 
After sendrecv 
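
For completeness, here is a sketch of the nonblocking variant mentioned above (my own illustration, not part of the original answer): each rank starts both transfers with MPI_Isend/MPI_Irecv and then waits for both with MPI_Waitall, so neither side can block the other:

#include <mpi.h> 
#include <iostream> 
#include <cstdlib> 

using namespace std; 

int main(int argc, char *argv[]) { 
    int size, rank; 

    int datasize = atoi(argv[1]); 

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &size); 
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); 

    if (rank != 0) { 
     int nodeDest = (rank == size - 1) ? 1 : rank + 1; 
     int nodeFrom = (rank == 1) ? size - 1 : rank - 1; 

     int *outdata = new int[datasize]; 
     int *indata = new int[datasize]; 
     for (int i = 0; i < datasize; i++) 
      outdata[i] = rank; 

     MPI_Request reqs[2]; 
     MPI_Status stats[2]; 

     cout << "Before isend/irecv" << endl; 
     // Both calls return immediately; the transfers complete inside Waitall. 
     MPI_Isend(outdata, datasize, MPI_INT, nodeDest, 0, MPI_COMM_WORLD, &reqs[0]); 
     MPI_Irecv(indata, datasize, MPI_INT, nodeFrom, 0, MPI_COMM_WORLD, &reqs[1]); 
     MPI_Waitall(2, reqs, stats); 
     cout << "After waitall" << endl; 

     delete [] outdata; 
     delete [] indata; 
    } 

    MPI_Finalize(); 
    return 0; 
} 
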
Thank you very much for the detailed and comprehensive answer! – user2240771 2013-04-08 06:50:10