2013-03-26 82 views
3

CUDA function pointers

I am trying to do something like this in CUDA (actually, I need to write some integration functions).


I tried the code below, but it did not work. It only produced this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float *result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float *dev_res;

    cudaMalloc((void**)&dev_res, sizeof(float));

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");

    return 0;
}
+0

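What card are you using? You appear to be compiling your code for compute capability 1.x, and I believe function pointers are a compute capability 2.x feature. You can change your nvcc invocation to include -gencode arch=compute_20,code=sm_20 (if your card supports it) – alrikai 2013-03-26 18:30:29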

+0

@alrikai GeForce 560Ti – DanilGholtsman 2013-03-26 19:38:40

+0

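Then you should change your compilation target from 1.x to 2.x, which will get rid of the compile error. However, you may still have some runtime problems... – alrikai 2013-03-26 20:00:00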

Answers

7

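To get rid of the compile error, you have to pass -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But you will most likely still have some runtime problems: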

Taken from the CUDA Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float 
typedef unsigned char(*pointFunction_t)(unsigned char, float); 

//some device function to be pointed to 
__device__ unsigned char 
Threshold(unsigned char in, float thresh) 
{ 
    ... 
} 

//pComputeThreshold is a device-side function pointer to your __device__ function 
__device__ pointFunction_t pComputeThreshold = Threshold; 
//the host-side function pointer to your __device__ function 
pointFunction_t h_pointFunction; 

//in host code: copy the function pointers to their host equivalent 
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.

//your kernel taking your __device__ function pointer as a parameter 
__global__ void kernel(pointFunction_t pPointOperation) 
{ 
    unsigned char tmp; 
    ... 
    tmp = (*pPointOperation)(tmp, 150.0f);
    ... 
} 

//invoke the kernel in host code, passing in your host-side __device__ function pointer 
kernel<<<...>>>(h_pointFunction); 

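Hope that makes sense. In short, it looks like you will have to change your f1 function to a __device__ function and follow a similar procedure (the typedefs are not required, but they do make the code nicer) to obtain a valid function pointer on the host side that you can pass to your kernel. I would also suggest taking a look at the FunctionPointers CUDA sample.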
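Applied to the code in the question, the procedure might look roughly like the sketch below. It is only an illustration under some assumptions: the names integrand_t, p_f1 and tabulate_f1 are made up for this example, error checking is left out, and the code has to be compiled for compute capability 2.x or higher. The kernel also accumulates into a local variable, because the memory returned by cudaMalloc is not initialized.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Function pointer type for the integrand: takes a float, returns a float.
typedef float (*integrand_t)(float);

// f1 from the question, now a __device__ function so device code can call it.
__device__ float f1(float x) { return x; }

// Device-side function pointer (hypothetical name); it holds the device address of f1.
__device__ integrand_t p_f1 = f1;

// Single-thread kernel: sums p_function(x) for x = lower, lower + 1, ... while x < upper.
__global__ void tabulate(float lower, float upper, integrand_t p_function, float *result) {
    float sum = 0.0f;  // local accumulator, so *result is never read before it is written
    for (float x = lower; x < upper; x += 1.0f) {
        sum += p_function(x);
    }
    *result = sum;
}

// Hypothetical host helper: runs tabulate with f1 and returns the sum.
float tabulate_f1(float lower, float upper) {
    // Copy the device address stored in p_f1 into a host variable.
    integrand_t h_f1 = NULL;
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(integrand_t));

    float *dev_res = NULL;
    cudaMalloc((void **)&dev_res, sizeof(float));

    // The host never calls h_f1; it only hands the address back to the device.
    tabulate<<<1, 1>>>(lower, upper, h_f1, dev_res);

    float res = 0.0f;
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_res);
    return res;
}

int main() {
    // For lower = 0 and upper = 5 the kernel adds f1(0) + f1(1) + f1(2) + f1(3) + f1(4).
    printf("%f\n", tabulate_f1(0.0f, 5.0f));
    return 0;
}
```

The pointer value obtained with cudaMemcpyFromSymbol is only meaningful on the device, so the host code should treat it as an opaque value and never dereference it.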

+0

Oh, thank you so much! – DanilGholtsman 2013-03-26 20:56:29

+0

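In addition to the answer above (+1), you may be interested in the very simple example of using function pointers in device code (without templates) in this thread on the NVIDIA forums: https://devtalk.nvidia.com/default/topic/457094/how-can-i-use-__device__-function-pointer-in-cuda-/ – njuffa 2013-03-27 00:32:19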

+0

@njuffa Nice! Your example is cleaner (and complete) – alrikai 2013-03-27 00:57:13

1

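Even if you could compile this code (see @Robert Crovella's answer), it would not work. You cannot pass a function pointer from host code, because the host compiler has no way of determining the address of the function on the device.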
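One way to avoid handing a function address across the host/device boundary at all is to keep a table of function pointers in device memory and let the host pass only an index. The sketch below illustrates that idea; it is not code from this thread, the names integrand_table, tabulate_by_index and tabulate_with are invented for the example, and it assumes compilation for compute capability 2.x or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Function pointer type for the integrand: takes a float, returns a float.
typedef float (*integrand_t)(float);

__device__ float identity(float x) { return x; }
__device__ float square(float x) { return x * x; }

// Hypothetical table in device memory, initialized with device function addresses.
__device__ integrand_t integrand_table[] = { identity, square };

// The kernel picks the integrand by index, so no pointer is passed from the host.
__global__ void tabulate_by_index(float lower, float upper, int which, float *result) {
    integrand_t f = integrand_table[which];
    float sum = 0.0f;
    for (float x = lower; x < upper; x += 1.0f) {
        sum += f(x);
    }
    *result = sum;
}

// Hypothetical host helper: which = 0 selects identity, which = 1 selects square.
float tabulate_with(int which, float lower, float upper) {
    float *dev_res = NULL;
    cudaMalloc((void **)&dev_res, sizeof(float));

    tabulate_by_index<<<1, 1>>>(lower, upper, which, dev_res);

    float res = 0.0f;
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_res);
    return res;
}

int main() {
    printf("%f\n", tabulate_with(0, 0.0f, 5.0f));  // sums x for x = 0..4
    printf("%f\n", tabulate_with(1, 0.0f, 5.0f));  // sums x * x for x = 0..4
    return 0;
}
```

The trade-off is that the set of selectable functions is fixed at compile time, which may or may not suit a general-purpose integration routine.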