2016-12-23 67 views
0

Converting OpenMP to TBB

I am having some difficulty converting OpenMP code to TBB. Can anyone help me?

I have the following code in OpenMP, and it performs well:

# pragma omp parallel \ 
shared (b, count, count_max, g, r, x_max, x_min, y_max, y_min) \ 
private (i, j, k, x, x1, x2, y, y1, y2) 
{ 
# pragma omp for 

for (i = 0; i < m; i++) 
{ 
for (j = 0; j < n; j++) 
{ 
//cout << omp_get_thread_num() << " thread\n"; 
    x = ((double) ( j - 1) * x_max 
     + (double) (m - j ) * x_min) 
    /(double) (m  - 1); 

    y = ((double) ( i - 1) * y_max 
     + (double) (n - i ) * y_min) 
    /(double) (n  - 1); 

    count[i][j] = 0; 

    x1 = x; 
    y1 = y; 

    for (k = 1; k <= count_max; k++) 
    { 
    x2 = x1 * x1 - y1 * y1 + x; 
    y2 = 2 * x1 * y1 + y; 

    if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2) 
    { 
     count[i][j] = k; 
     break; 
    } 
    x1 = x2; 
    y1 = y2; 
    } 

    if ((count[i][j] % 2) == 1) 
    { 
    r[i][j] = 255; 
    g[i][j] = 255; 
    b[i][j] = 255; 
    } 
    else 
    { 
    c = (int) (255.0 * sqrt (sqrt (sqrt ( 
     ((double) (count[i][j])/(double) (count_max)))))); 
    r[i][j] = 3 * c/5; 
    g[i][j] = 3 * c/5; 
    b[i][j] = c; 
    } 
} 
} 
} 

The TBB version, however, is 10 times slower than the OpenMP one.

The TBB code is:

tbb::parallel_for (int(0), m, [&](int i) 
{ 
for (j = 0; j < n; j++) 
{ 
    x = ((double) ( j - 1) * x_max 
     + (double) (m - j ) * x_min) 
    /(double) (m  - 1); 

    y = ((double) ( i - 1) * y_max 
     + (double) (n - i ) * y_min) 
    /(double) (n  - 1); 

    count[i][j] = 0; 

    x1 = x; 
    y1 = y; 

    for (k = 1; k <= count_max; k++) 
    { 
    x2 = x1 * x1 - y1 * y1 + x; 
    y2 = 2 * x1 * y1 + y; 

    if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2) 
    { 
     count[i][j] = k; 
     break; 
    } 
    x1 = x2; 
    y1 = y2; 
    } 

    if ((count[i][j] % 2) == 1) 
    { 
    r[i][j] = 255; 
    g[i][j] = 255; 
    b[i][j] = 255; 
    } 
    else 
    { 
    c = (int) (255.0 * sqrt (sqrt (sqrt ( 
     ((double) (count[i][j])/(double) (count_max)))))); 
    r[i][j] = 3 * c/5; 
    g[i][j] = 3 * c/5; 
    b[i][j] = c; 
    } 
} 
}); 

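The default partitioner in TBB is 'auto_partitioner', which subdivides the work recursively and can go down to as little as one loop iteration per task; this may cause a large amount of overhead. With many compilers the default schedule of the 'for' work-sharing construct is 'static', so you should pass a 'static_partitioner' instance to the 'parallel_for' algorithm to get the same work distribution in TBB as in OpenMP.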


I have changed the parallel_for to tbb::parallel_for(tbb::blocked_range2d(0, m, 0, n), [&](tbb::blocked_range2d s, static_partitioner()) { for (int i = s.rows().begin(); i

Answers

2

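Note the private (i, j, k, x, x1, x2, y, y1, y2) clause in the OpenMP version of the code. This list names the variables that are private (local) to the body of the parallel loop. In the TBB version, however, many of these variables are captured by the lambda by reference ([&]), so the code is incorrect. It has data races, and in my opinion the slowdown comes from several threads accessing these variables at once (cache-coherence overhead and loop indices being overwritten by other threads). So if you want to fix the code, make these variables local, for example: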

tbb::parallel_for (int(0), m, [&](int i) 
{ 
double x, y, x1, x2, y1, y2; // !!!! 
int j, k;     // !!!! 
for (j = 0; j < n; j++) 
{ 
    x = ((double) ( j - 1) * x_max 
     + (double) (m - j ) * x_min) 
    /(double) (m  - 1); 
...