How do I launch another thread from OpenCL code?

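1. Data generation. In this step I generate an array of data in a loop, as the results of some functions.
2. Data processing. For this step I wrote an OpenCL kernel that processes the data array generated in the previous step.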

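Right now the first step runs on the CPU because it is hard to parallelize. I would like to run it on the GPU, because each generation takes some time. I also want to start the second step immediately on the data that has already been generated.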

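Can I launch another OpenCL kernel in a separate thread from a kernel that is currently running? Or does the second kernel run in the same thread that called it?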

Some pseudocode to illustrate my point:

__kernel void second(__global int *data, int index) {
    // Work on data[index]. This processing takes a long time.
}

__kernel void first(__global int *data, const int length) {
    for (int i = 0; i < length; i++) {
        // Generate data and store it in data[i].

        // Will this kernel run in the same thread (work-item) as the caller,
        // or in a new thread? If it runs in the same thread, is there a way
        // to launch it in a separate thread?
        second(data, i);
    }
}

Source: 2011-03-15, Eugene Burtsev