
I'm trying to parallelize the loop below using an OpenMP reduction (an OpenMP reduction over an Eigen::VectorXd):

#define EIGEN_DONT_PARALLELIZE 
#include <iostream> 
#include <cmath> 
#include <string> 
#include <eigen3/Eigen/Dense> 
#include <eigen3/Eigen/Eigenvalues> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) 
     answer[j] = j*f; 
    return answer; 
} 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    double E = start; 
    result = integrand(E); 
    #pragma omp parallel 
    { 
    #pragma omp for nowait 
    for (int j = 1; j <= n; j++){ 
     E = start + (2*j - 1.)*h; 
     result = result + 4.*integrand(E); 
     if (j != n){ 
      E = start + 2*j*h; 
      result = result + 2.*integrand(E); 
     } 
    } 
    } 
    for (int i=0; i <50 ; ++i) 
     cout<< i+1 << " , "<< result[i] << endl; 

    return 0; 
} 

This definitely runs faster in parallel than serially, but with all 4 threads the results vary wildly from run to run. When the number of threads is set to 1, the output is correct. I'd be grateful if someone could help me with this...

I'm compiling with clang, using the flags:

clang++-3.8 energy_integration.cpp -fopenmp=libiomp5 

If this is a bust, then I'll have to learn to use Boost::thread or std::thread...

+0

Add 'firstprivate(params) reduction(+:result_int)' to your 'parallel' directive, remove the 'critical', and try again... – Gilles

+0

@Gilles Thanks for the reply. I've edited my code so that the first '#pragma' statement reads '#pragma omp parallel firstprivate(params) reduction(+:result_int)', the second '#pragma' statement is unchanged, and all subsequent '#pragma' statements are removed. The program then produces a runtime error: '.... const Eigen::Matrix >: Assertion aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed. Aborted' – I can guarantee that kspace and result_int both have the same number of elements and dimensions – AlexD

+1

Could you turn your example into a complete [mcve]? Also, does the serial version work as expected? –

Answer

2

Your code doesn't define a custom reduction, so OpenMP has no way to reduce Eigen objects. I'm not sure whether clang supports user-defined reductions (see the OpenMP 4 spec, page 180). If it does, you can declare the reduction and add reduction(+:result) to the #pragma omp for line. If not, you can do it yourself by changing your code as follows:

VectorXd result(500000); // This is the final result, not used by the threads 
result.fill(0.); 
    result = integrand(start); // contribution from the endpoint E = start 
#pragma omp parallel 
{ 
    // This is a private copy per thread. This resolves race conditions between threads 
    VectorXd resultPrivate(500000); 
    resultPrivate.fill(0.); 
#pragma omp for nowait // reduction(+:result) // Assuming user-defined reductions aren't allowed 
    for (int j = 1; j <= n; j++) { 
     // E is declared inside the loop so that each thread gets its own copy; 
     // a shared E written by every thread would itself be a data race 
     double E = start + (2 * j - 1.)*h; 
     resultPrivate = resultPrivate + 4.*integrand(E); 
     if (j != n) { 
      E = start + 2 * j*h; 
      resultPrivate = resultPrivate + 2.*integrand(E); 
     } 
    } 
#pragma omp critical 
    { 
     // Here we sum the results of each thread one at a time 
     result += resultPrivate; 
    } 
} 

The error you got (in your comment) seems to be due to a size mismatch. It isn't obvious from the code itself, but don't forget that when OpenMP starts each thread, it has to initialize the per-thread private VectorXd. If no initializer is provided, the default is VectorXd() (of size zero), and the size mismatch occurs as soon as that object is used. A "correct" use of omp declare reduction therefore includes the initializer part:

#pragma omp declare reduction (+: VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

omp_priv is the name of the private variable. It is initialized with VectorXd::Zero(...), and the size is taken from omp_orig. The standard (page 182, lines 25-27) defines this as:

The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced.

In our case (see the complete example below), that original variable is result. So result.size() is 500000, and each private variable is initialized to the correct size.
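For contrast, here is a minimal sketch (an illustration, not part of the original answer) of the same declaration without the initializer clause; this is the variant that reproduces the size-mismatch assertion, because each thread's private copy is default-constructed as an empty VectorXd():

// Broken variant (for illustration only): with no initializer-clause, omp_priv 
// is default-constructed as VectorXd() of size 0, so the first addition that 
// involves it trips Eigen's aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols() 
// assertion. 
#pragma omp declare reduction (+: VectorXd: omp_out=omp_out+omp_in) 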

#include <iostream> 
#include <string> 
#include <Eigen/Core> 
#include <omp.h> 

using namespace Eigen; 
using namespace std; 

VectorXd integrand(double E) 
{ 
    VectorXd answer(500000); 
    double f = 5.*E + 32.*E*E*E*E; 
    for (int j = 0; j != 50; j++) answer[j] = j*f; 
    return answer; 
} 

#pragma omp declare reduction (+: Eigen::VectorXd: omp_out=omp_out+omp_in)\ 
    initializer(omp_priv=VectorXd::Zero(omp_orig.size())) 

int main() 
{ 
    omp_set_num_threads(4); 
    double start = 0.; 
    double end = 1.; 
    int n = 100; 
    double h = (end - start)/(2.*n); 

    VectorXd result(500000); 
    result.fill(0.); 
    result = integrand(start); // contribution from the endpoint E = start 

#pragma omp parallel for reduction(+:result) 
    for (int j = 1; j <= n; j++) { 
     // E is declared inside the loop so that each thread gets its own copy 
     double E = start + (2 * j - 1.)*h; 
     result += (4.*integrand(E)).eval(); 
     if (j != n) { 
      E = start + 2 * j*h; 
      result += (2.*integrand(E)).eval(); 
     } 
    } 
    for (int i = 0; i < 50; ++i) 
     cout << i + 1 << " , " << result[i] << endl; 

    return 0; 
} 
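For completeness, a build line in the spirit of the one from the question (the Eigen include path here is an assumption and may differ on your system; some -I flag is needed because this example includes <Eigen/Core> rather than <eigen3/Eigen/Core>):

clang++-3.8 energy_integration.cpp -fopenmp=libiomp5 -I/usr/include/eigen3 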
+0

Very nice, thank you. This gives me a 2.17x speed-up with 2 threads. The user-defined reduction didn't work for me, but the runtime error makes me wonder whether it's related to Eigen rather than Clang. Edit: just tried this with g++; it didn't even compile: ''result' has invalid type' – AlexD

+0

What runtime error did you get? With what code? It could also be that Eigen types don't mix well with omp; I haven't tried. –

+0

I think you're right about Eigen and omp. I tested this with the user-defined-reduction code as posted. Even with only one thread set, the runtime error is the same; an excerpt is shown since the full output is too long: '[BinaryOp = Eigen::internal::scalar_sum_op, Lhs = const Eigen::Matrix, Rhs = const Eigen::CwiseUnaryOp, const Eigen::Matrix>]: Assertion 'aLhs.rows()==aRhs.rows() && aLhs.cols()==aRhs.cols()' failed. Aborted'. I hasten to add that the alternative approach you sketched works a treat – AlexD