將工作劃分爲更多線程需要更多時間，爲什麼？

我有一個小型的C程序，它使用monte-carlo計算pi-這個模擬基本上只是測試一個隨機點[x，y]，如果它位於一個圓內或圓外。將工作劃分爲更多線程需要更多時間，爲什麼？

爲了接近PI我不得不使用大量的樣品Ñ的，其具有的O（n）的一個成正比的複雜性。所以試圖計算大量的樣本n，我實現了POSIX threads api來平行計算能力。

我的代碼如下所示：

pthread_t worker[nthreads]; /* creates workers for each thread */ 
struct param aparam[nthreads]; /* struct param{ long* hits; long rounds; }; */ 
long nrounds = nsamples/nthreads; /* divide samples to subsets of equal rounds per thread */ 

for (int i = 0; i < nthreads; ++i) { /* loop to create threads */ 
    aparam[i].hits = 0; 
    aparam[i].rounds = nrounds; 
    pthread_create(&worker[i], NULL, calc_pi, &aparam[i]); /* calls calc_pi(void* vparam){} */ 
} 

long nhits = 0; 
for (int j = 0; j < nthreads; ++j) { /* collects results */ 
    pthread_join(worker[j], NULL); 
    nhits += (long)aparam[j].hits; /* counts hits inside the cicrle */ 
}

而這就是每個線程正在做：

void* calc_pi(void* vparam) 
{ /* counts hits inside a circle */ 
    struct param *iparam; 
    iparam = (struct param *) vparam; 
    long hits = 0; 
    float x, y, z; 
    for (long i = 0; i < iparam->rounds; ++i) { 
     x = (float)rand()/RAND_MAX; 
     y = (float)rand()/RAND_MAX; 
     z = x * x + y * y; 
     if (z <= 1.f) /* circle radius of 1 */ 
      ++hits; 
    } 
    iparam->hits = (long*)hits; 
    return NULL; 
}

現在我有一個奇怪的觀察。使用相同的樣本集n並且具有越來越多的線程i該程序需要更多的時間而不是更少的時間。

這裏有一些平均運行時間（可再現的）：

------------------------------------------------- 
| Threads[1] | Samples[1] | Rounds[1] | Time[s] | 
------------------------------------------------- 
|  32 | 268435456 | 8388608 | 118 | 
|  16 | 268435456 | 16777216 | 106 | 
|   8 | 268435456 | 33554432 | 125 | 
|   4 | 268435456 | 67108864 | 152 | 
|   2 | 268435456 | 134217728 |  36 | 
|   1 | 268435456 | 268435456 |  15 | 
-------------------------------------------------

爲什麼例如兩個線程做同樣的工作時間比兩倍的時間比一個單獨的線程嗎？我的假設是，將工作劃分的兩個線程應該將時間減少至少50％。

與GCC 4.9.1編譯和以下標誌：

gcc -O2 -std=gnu11 -pthread pipa.c -lpthread -o pipa

我的硬件是一個雙Intel Xeon E5520（2個處理器，每4個內核）@ 2.26 GHz的超線程禁用，運行Linux的科學與2.6 .18內核。

任何想法？

來源

2014-11-22 default

Linux 2.6.18是古老的。像，史前。我很肯定自那時起對多線程程序進行了很多改進。例如，你使用哪種pthreads-implementation？ LinuxThreads或NPTL？ – EOF 2014-11-22 11:16:47

你確定線程在不同的內核上運行嗎？也許由於某些原因，他們共享相同的內核，所以上下文切換開銷會增加rand（）可能導致爭用的運行時間 – 2014-11-22 11:18:21

。 – 2501 2014-11-22 11:24:36

您的線程執行的最昂貴的操作是呼叫rand()。 rand()是一個樸素，簡單，並且一般不具有MT可擴展功能（因爲它保證了相同的種子產生相同的序列隨機數字）。（*）

一個簡單的技巧，以確認是否是問題，是在調試器下啓動程序，然後，幾次：暫停它，捕獲線程的堆棧跟蹤繼續。無論堆棧軌跡中最常出現的是什麼，很可能是瓶頸。（*）使它更慢的原因是鎖定爭用會導致額外的性能損失。此外，許多線程還會增加進程調度和上下文切換的額外開銷。

來源

2014-11-22 11:52:16 Dummy00001

事實上，使用'rand_r（）'解決了這個問題。 – default 2014-11-22 12:03:21

將工作劃分爲更多線程需要更多時間，爲什麼？

回答

相關問題