並行執行花費的時間多於串行？

我正在研究TBB中的任務實現，並運行斐波那契數列的並行和串行計算代碼。並行執行花費的時間多於串行？

的代碼是：相比於串行execution.In此並行執行了2500秒，而串行前後花了167秒

#include <iostream> 
#include <list> 
#include <tbb/task.h> 
#include <tbb/task_group.h> 
#include <stdlib.h> 
#include "tbb/compat/thread" 
#include "tbb/task_scheduler_init.h" 
using namespace std; 
using namespace tbb; 

#define CutOff 2 

long serialFib(long n) { 
if(n<2) 
return n; 
else 
return serialFib(n-1) + serialFib(n-2); 
} 


class FibTask: public task 
{ 
    public: 
    const long n; 
    long* const sum; 

    FibTask(long n_, long* sum_) : n(n_), sum(sum_) {} 

    task* execute() 
    { 
     // cout<<"task id of thread is \t"<<this_thread::get_id()<<"FibTask(n)="<<n<<endl; // Overrides virtual function task::execute  
       // cout<<"Task Stolen is"<<is_stolen_task()<<endl; 
     if(n<CutOff) 
     { 
      *sum = serialFib(n); 
     } 
     else 
     { 
      long x, y; 
      FibTask& a = *new(allocate_child()) FibTask(n-1,&x); 
      FibTask& b = *new(allocate_child()) FibTask(n-2,&y); 
      set_ref_count(3); // 3 = 2 children + 1 for wait // ref_countis used to keep track of the number of tasks spawned at       the current level of the task graph 
      spawn(b); 
         // cout<<"child id of thread is \t"<<this_thread::get_id()<<"calculating n ="<<n<<endl; 
      spawn_and_wait_for_all(a); //set tasks for execution and wait for them 
      *sum = x+y; 
     } 
     return NULL; 
    } 
}; 


long parallelFib(long n) 
{ 
    long sum; 
    FibTask& a = *new(task::allocate_root()) FibTask(n,&sum); 
    task::spawn_root_and_wait(a); 
    return sum; 
} 


int main() 
{  
    long i,j; 
    cout<<fixed; 

    cout<<"Fibonacci Series parallelly formed is "<<endl; 
     tick_count t0=tick_count::now(); 
    for(i=0;i<50;i++) 
    cout<<parallelFib(i)<<"\t"; 
    // cout<<"parallel execution of Fibonacci series for n=10 \t"<<parallelFib(i)<<endl; 

    tick_count t1=tick_count::now(); 
    double t=(t1-t0).seconds(); 
    cout<<"Time Elapsed in Parallel Execution is \t"<<t<<endl; 
    cout<<"\n Fibonacci Series Serially formed is "<<endl; 
    tick_count t3=tick_count::now(); 

    for(j=0;j<50;j++) 
    cout<<serialFib(j)<<"\t"; 
    tick_count t4=tick_count::now(); 
    double t5=(t4-t3).seconds(); 
    cout<<"Time Elapsed in Serial Execution is \t"<<t5<<endl; 
    return(0); 
}

並行執行花費更多的時間。有人可以解釋這個原因嗎？

來源

2013-03-14 Jasdeep Singh Arora

開銷。

當您的實際任務很輕時，協調/通信占主導地位，您不會（自動）從並行執行中獲益。這是一個很常見的問題。

嘗試改爲連續計算M個斐波那契數（成本足夠高），然後並行計算它們。你應該看到收益。

來源

2013-03-14 14:31:29

沒有更多的信息，這將是很難說。你需要檢查：你的電腦有多少個processros？有沒有其他可能使用其他處理器的程序？（真）並行運行並獲得性能優勢，則操作系統必須能夠分配至少2個空閒處理器。另外，對於小任務，分配線程和收集其結果的開銷可能會超出並行執行的好處。

來源

2013-03-14 14:32:54

其四核處理器i3 ...是的其他應用程序也在運行 – 2013-03-15 06:39:43

我是否認爲每個任務的確都是result of fib(n-1) + result of fib(n-2) - 因此，基本上，你開始一個任務，然後開始另一個任務，等等，直到我們有很多任務（我略微失去了嘗試數數它們全部 - 我認爲這是平方）。每個這樣的任務的結果是用來加起來斐波納契數。

首先，這裏沒有實際的並行執行（除了可能的兩個獨立的遞歸計算）。每項任務都依賴於其子任務的結果，並不能真正做任何事情。另一方面，您正在執行大量工作來設置每項任務。毫不奇怪，你沒有看到任何好處）

現在，如果你要通過迭代計算斐波那契數1..50，並且你開始在系統中爲每個處理器核心啓動一個任務，並且與僅使用單個循環的迭代解決方案相比，我相信這將顯示出更好的改進。

來源

2013-03-14 14:35:05

將截止值更改爲12，在（-O在Linux上; /在Windows上爲O2）上進行優化編譯，您應該看到顯着的加速。

該示例中有很多並行性。問題是，在Cutoff = 2時，有用並行計算的各個單元被調度開銷所淹沒。提高截止值應該可以解決問題。

以下是分析。分析並行性有兩個重要時間：

工作 - 計算工作的總量。
span - 關鍵路徑的長度。

可用的並行性是work/span。對於fib（n），當n足夠大時，工作大致與fib（n）成比例[是的，它描述自己！]。跨度是調用樹的深度 - 它大致與n成正比。所以平行度與fib（n）/ n成正比。因此，即使n = 10，也有足夠的可用並行機制來保持典型的2013年臺式機嗡嗡聲。

問題是TBB任務需要時間來創建，執行，同步和銷燬。將工作截止值從2更改爲12時，可以在工作量很小時調度串行代碼，以減少調度開銷。這是遞歸併行性中的一種常見模式：並行遞歸，直到您完成可能連續完成的大量工作。在其他並行框架（如OpenMP或Cilk Plus）中也存在相同的問題：雖然可能多於或少於TBB，但任務會有開銷。所有這些變化都是最好的閾值。

嘗試變化截止。較低的值應該會給你更多的並行性，但會增加調度開銷。較高的值可以減少並行性，但可以減少調度開銷。在這之間，你可能會發現一系列值得加速的值。

來源

2013-03-15 01:35:39

並行執行花費的時間多於串行？

回答

相關問題