2017-05-05 81 views

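MPI deadlock with collective functions

I am writing a C++ program that uses the MPI library. It deadlocks, and only one node keeps working! I do not use point-to-point send or receive operations, only two collective functions (MPI_Allreduce and MPI_Bcast). Since no node should be waiting for another node to send or receive something, I really do not understand what causes this deadlock.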

void ParaStochSimulator::first_reacsimulator() { 
    SimulateSingleRun(); 
} 

double ParaStochSimulator::deterMinTau() { 
    //calculate minimum tau for this process 
    l_nLocalMinTau = calc_tau(); //min tau for each node 
    MPI_Allreduce(&l_nLocalMinTau, &l_nGlobalMinTau, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);  
    //min tau for all nodes 
    //check if I have the min value 
    if (l_nLocalMinTau <= l_nGlobalMinTau && m_nCurrentTime < m_nOutputEndPoint) { 
     FireTransition(m_nMinTransPos); 
     CalculateAllHazardValues(); 
    } 
    return l_nGlobalMinTau; 
} 

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
      //std::cout << "size of mani place :" << l_nMinplacesPos.size() << std::endl; 
     } 
    } 
    MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    PostProcessRun(); 
} 

Answers


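While your "master" process is executing MPI_Bcast, all the other processes are still running the loop: they enter deterMinTau again and execute MPI_Allreduce.

This is a deadlock because the master node is waiting for all the other nodes to take part in the broadcast, while all the other nodes are waiting for the master node to take part in the reduction.

I believe what you are looking for is: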

void ParaStochSimulator::SimulateSingleRun() { 
    //prepare a run 
    PrepareRun(); 
    while ((m_nCurrentTime < m_nOutputEndPoint) && IsSimulationRunning()) { 
     //All the nodes reduce tau at the same time 
     deterMinTau(); 
     if (mnprocess_id == 0) { //master 
      SimulateSingleStep(); 
      std::cout << "current time:*****" << m_nCurrentTime << std::endl; 
      broad_casting(m_nMinTransPos); 
      //Removed broadcast for master here 
     } 
     //All the nodes broadcast at every loop iteration 
     MPI_Bcast(&l_anMarking, l_nMinplacesPos.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD); 
    } 
    PostProcessRun(); 
} 
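
The mismatch described above can be checked without running MPI at all. The following is only a minimal sketch: it records, per rank, the order of collective calls that each version of the loop would issue and reports the first position where two ranks disagree. The names `Trace`, `originalTrace`, `fixedTrace` and `firstMismatch` are hypothetical helpers, not part of MPI or of the original program, and the sketch assumes every rank runs the same number of loop iterations. `broad_casting` is left out because its body is not shown; if it performs a collective call itself, it would have to move out of the master-only branch as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a trace is the ordered list of collectives one rank calls.
using Trace = std::vector<std::string>;

// ORIGINAL loop: only rank 0 broadcasts inside the loop,
// and every rank broadcasts once after the loop.
Trace originalTrace(int rank, int iterations) {
    assert(rank >= 0 && iterations >= 0);
    Trace t;
    for (int i = 0; i < iterations; ++i) {
        t.push_back("Allreduce");              // deterMinTau()
        if (rank == 0) t.push_back("Bcast");   // master-only branch
    }
    t.push_back("Bcast");                      // after the loop
    return t;
}

// FIXED loop: every rank reduces and then broadcasts in each iteration.
Trace fixedTrace(int rank, int iterations) {
    assert(rank >= 0 && iterations >= 0);
    (void)rank;                                // the trace no longer depends on the rank
    Trace t;
    for (int i = 0; i < iterations; ++i) {
        t.push_back("Allreduce");              // deterMinTau()
        t.push_back("Bcast");                  // outside the master-only branch
    }
    return t;
}

// Index of the first call where two ranks disagree, or -1 if the traces match.
int firstMismatch(const Trace& a, const Trace& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return static_cast<int>(i);
    }
    return a.size() == b.size() ? -1 : static_cast<int>(n);
}
```

For two iterations, `firstMismatch(originalTrace(0, 2), originalTrace(1, 2))` returns 1: the second call on rank 0 is a broadcast while the second call on rank 1 is a reduction, which is exactly the situation in which both sides wait forever. With `fixedTrace` the traces are identical for every rank, so the function returns -1.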

Thanks for your help, but unfortunately I have removed the broadcast from the master branch and there is still a deadlock -_-