2017-07-01

Kernel code that produces the error (CL_OUT_OF_RESOURCES):

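// Child kernel: launched from the device by test() below.
__kernel void testDynamic(__global int *data)
{
    int id = get_global_id(0);
    atomic_add(&data[1], 2);
}

// Parent kernel: every work-item adds 2 to data[0]; work-item 0 also
// enqueues testDynamic on the default device queue.
__kernel void test(__global int *data)
{
    int id = get_global_id(0);
    atomic_add(&data[0], 2);
    if (id == 0) {
        queue_t q = get_default_queue();
        ndrange_t ndrange = ndrange_1D(1, 1);   // global size 1, local size 1
        void (^my_block_A)(void) = ^{ testDynamic(data); };
        enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange,
                       my_block_A);
    }
}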

I tested the code below to make sure the OpenCL 2.0 compiler is working:

__kernel void test2(__global int *data) 
{ 
    int id=get_global_id(0); 
    data[id]=work_group_scan_inclusive_add(id); 
} 

The scan function gives 0, 1, 3, 6 as output, so the OpenCL 2.0 work-group scan/reduction functions are working.
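For comparison, here is a minimal host-side reference for that check. It is only a sketch: the function name `reference_inclusive_scan` is made up for illustration, and it assumes the four work-items sit in a single work-group and each passes its own global id to the scan, as test2 does.

```cpp
#include <numeric>
#include <vector>

// Hypothetical host-side reference for work_group_scan_inclusive_add.
// Assumption: one work-group, and work-item i contributes the value i.
std::vector<int> reference_inclusive_scan(int work_group_size) {
    std::vector<int> ids(work_group_size);
    std::iota(ids.begin(), ids.end(), 0);                   // 0, 1, 2, ...
    std::vector<int> out(ids.size());
    std::partial_sum(ids.begin(), ids.end(), out.begin());  // running sum
    return out;
}
```

Calling `reference_inclusive_scan(4)` yields 0, 1, 3, 6, which matches the values read back from the device.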

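Is dynamic parallelism an extension in OpenCL 2.0? If I remove the enqueue_kernel call, the results match the expected values (with the child kernel left out).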
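To make "expected values" concrete, the sketch below models on the host what the two kernels should leave in the buffer. The function `expected_data` and its parameters are hypothetical, and it assumes the buffer starts out zeroed, which the question does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the kernels test and testDynamic.
// Assumption: data[0] and data[1] are zero before the parent kernel runs.
std::vector<int> expected_data(std::size_t global_size, bool child_runs) {
    std::vector<int> data(2, 0);
    for (std::size_t id = 0; id < global_size; ++id) {
        data[0] += 2;  // each parent work-item: atomic_add(&data[0], 2)
    }
    if (child_runs && global_size > 0) {
        data[1] += 2;  // child launched by work-item 0 with ndrange_1D(1, 1)
    }
    return data;
}
```

With four parent work-items this gives {8, 2} when the child kernel runs and {8, 0} when the enqueue_kernel call is removed.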

Device: AMD RX550, driver: 17.6.2

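Is there a special command that has to be run on the host side so that child kernels can run on the queue returned by get_default_queue? Right now the command queue is created the OpenCL 1.2 way, as follows: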

commandQueue = cl::CommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err); 

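Does get_default_queue() have to be the same command queue that launched the parent kernel? I ask because I use the same command queue to upload data to the GPU and then download the results, with only a single synchronization.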

Answer

Solution moved from the question into an answer:

Edit: the API call below was the solution:

commandQueue = cl::CommandQueue(context, device, 
    CL_QUEUE_ON_DEVICE| 
    CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | 
    CL_QUEUE_ON_DEVICE_DEFAULT, &err); 

After creating this queue (only one per device), I did not use it for anything else. The parent kernel is enqueued on a separate host queue, so it looks like get_default_queue() does not have to be the queue that launched the parent kernel.

The documentation says CL_INVALID_QUEUE_PROPERTIES will be thrown if CL_QUEUE_ON_DEVICE is specified, but on my machine dynamic parallelism works with it and that error is not thrown (with the commandQueue constructor parameters shown above).
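As a rough illustration of how the flags relate, the sketch below checks whether a property bitfield describes a default device queue. The OpenCL 2.0 specification requires CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE whenever CL_QUEUE_ON_DEVICE is set, and allows CL_QUEUE_ON_DEVICE_DEFAULT only together with CL_QUEUE_ON_DEVICE. The `k...` constants and the helper name are hypothetical stand-ins so the snippet compiles without an OpenCL SDK; real host code would include <CL/cl.h> and use the CL_QUEUE_* macros directly.

```cpp
#include <cstdint>

// Stand-ins mirroring the queue property bits from the Khronos CL/cl.h
// header (assumption: real code uses the CL_QUEUE_* macros instead).
constexpr std::uint64_t kOutOfOrder      = 1u << 0;  // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr std::uint64_t kProfiling       = 1u << 1;  // CL_QUEUE_PROFILING_ENABLE
constexpr std::uint64_t kOnDevice        = 1u << 2;  // CL_QUEUE_ON_DEVICE
constexpr std::uint64_t kOnDeviceDefault = 1u << 3;  // CL_QUEUE_ON_DEVICE_DEFAULT

// Hypothetical helper: true when the flags describe a well-formed default
// device queue, i.e. the kind of queue get_default_queue() can return.
bool describes_default_device_queue(std::uint64_t flags) {
    const bool on_device    = (flags & kOnDevice) != 0;
    const bool out_of_order = (flags & kOutOfOrder) != 0;
    const bool is_default   = (flags & kOnDeviceDefault) != 0;
    return on_device && out_of_order && is_default;
}
```

The flag set from this answer passes the check, while the profiling-only host queue from the question does not, which is consistent with the kernel having no default device queue to enqueue on until the extra queue was created.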