CUDA function pointers

I am trying to do something like this in CUDA (actually, I need to write some integration functions):


I tried the code below, but it does not work. It only produces this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) { 
    return x; 
} 

__global__ void tabulate(float lower, float upper, float p_function(float), float*result){ 
    for (lower; lower < upper; lower++) { 
       *result = *result + p_function(lower); 
     } 
} 

int main(){ 
     float res; 
    float* dev_res; 

     cudaMalloc((void**)&dev_res, sizeof(float)) ; 

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res); 
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost) ; 

    printf("%f\n", res); 
    /************************************************************************/ 
    scanf("%s"); 

    return 0; 

}

Source

2013-03-26 DanilGholtsman

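What card are you using? You appear to be compiling your code for compute capability 1.x, and I believe function pointers are a compute capability 2.x feature. You can change your nvcc invocation to include -gencode arch=compute_20,code=sm_20 (if your card supports it) – alrikai 2013-03-26 18:30:29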

@alrikai GeForce 560Ti – DanilGholtsman 2013-03-26 19:38:40

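Then you should change your compilation target from 1.x to 2.x, which will get rid of the compile error. However, you may still have some runtime problems... – alrikai 2013-03-26 20:00:00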

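To get rid of the compile error, you have to pass -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But then you will most likely have some runtime problems: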

Taken from the CUDA Programming Guide: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float 
typedef unsigned char(*pointFunction_t)(unsigned char, float); 

//some device function to be pointed to 
__device__ unsigned char 
Threshold(unsigned char in, float thresh) 
{ 
    ... 
} 

//pComputeThreshold is a device-side function pointer to your __device__ function 
__device__ pointFunction_t pComputeThreshold = Threshold; 
//the host-side function pointer to your __device__ function 
pointFunction_t h_pointFunction; 

//in host code: copy the function pointers to their host equivalent 
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.

//your kernel taking your __device__ function pointer as a parameter 
__global__ void kernel(pointFunction_t pPointOperation) 
{ 
    unsigned char tmp; 
    ... 
    tmp = (*pPointOperation)(tmp, 150.0);
    ... 
} 

//invoke the kernel in host code, passing in your host-side __device__ function pointer 
kernel<<<...>>>(h_pointFunction);

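Hopefully that makes sense. In short, it looks like you will have to change your f1 function into a __device__ function and follow a similar procedure (the typedefs are not necessary, but they do make the code nicer) to pass it to your kernel from the host side as a valid function pointer. I would also suggest taking a look at the FunctionPointers CUDA sample.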
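Applied to the code from the question, the procedure above might look roughly like the sketch below. This is only an illustration under some assumptions: the names func_t, d_f1 and tabulate_f1 are made up for this example and appear nowhere in the original code, and error handling is kept minimal. The kernel accumulates into a local variable because memory returned by cudaMalloc is not zero-initialized.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer type for a device function taking and returning float (hypothetical name).
typedef float (*func_t)(float);

// The integrand has to be a __device__ function to be callable from a kernel.
__device__ float f1(float x) { return x; }

// Device-side variable holding the address of f1; the address is taken in device code.
__device__ func_t d_f1 = f1;

// Sums p_function(x) for x = lower, lower + 1, ... while x < upper.
__global__ void tabulate(float lower, float upper, func_t p_function, float *result)
{
    float sum = 0.0f;  // local accumulator: *result is uninitialized on entry
    for (float x = lower; x < upper; x += 1.0f) {
        sum += p_function(x);
    }
    *result = sum;
}

// Hypothetical host helper: runs the kernel with f1 and stores the sum in *out.
cudaError_t tabulate_f1(float lower, float upper, float *out)
{
    func_t h_f1 = NULL;
    float *dev_res = NULL;

    // Copy the device function address into a host variable.
    cudaError_t err = cudaMemcpyFromSymbol(&h_f1, d_f1, sizeof(func_t));
    if (err != cudaSuccess) return err;

    err = cudaMalloc((void **)&dev_res, sizeof(float));
    if (err != cudaSuccess) return err;

    // The host only forwards the pointer value; it never dereferences it.
    tabulate<<<1, 1>>>(lower, upper, h_f1, dev_res);

    // cudaMemcpy waits for the kernel and reports earlier launch errors too.
    err = cudaMemcpy(out, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_res);
    return err;
}
```

Compiled for compute capability 2.x or higher, a call such as tabulate_f1(0.0f, 5.0f, &res) from main should leave 10 in res (0 + 1 + 2 + 3 + 4), assuming every CUDA call succeeds.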

Source

2013-03-26 20:29:45 alrikai

Oh, thank you so much! – DanilGholtsman 2013-03-26 20:56:29

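In addition to the answer above (+1), you may be interested in a very simple example of using function pointers in device code (without templates) in this thread on the NVIDIA forums: https://devtalk.nvidia.com/default/topic/457094/how-can-i-use-__device__-function-pointer-in-cuda-/ – njuffa 2013-03-27 00:32:19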

@njuffa Nice! Your example is cleaner (and complete) – alrikai 2013-03-27 00:57:13

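Even if you can get this code to compile (see @Robert Crovella's answer), it will not work. You cannot pass a function pointer from host code, because the host compiler has no way to determine the function's address.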
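One way to stay clear of this problem is to never hand a function address from the host to the device at all: the host passes only an index, and the kernel looks the pointer up in a table that lives in device memory. The sketch below is a minimal illustration of that idea, not code from any of the answers; func_t, d_table, tabulate_by_index and tabulate_selected are hypothetical names chosen for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer type for a device function taking and returning float (hypothetical name).
typedef float (*func_t)(float);

__device__ float identity(float x) { return x; }
__device__ float square(float x)   { return x * x; }

// Table of device function addresses, initialized in device code.
__device__ func_t d_table[] = { identity, square };

// Sums f(x) for x = lower, lower + 1, ... while x < upper, where f = d_table[which].
__global__ void tabulate_by_index(float lower, float upper, int which, float *result)
{
    func_t f = d_table[which];  // the address is read on the device, never on the host
    float sum = 0.0f;
    for (float x = lower; x < upper; x += 1.0f) {
        sum += f(x);
    }
    *result = sum;
}

// Hypothetical host helper: which = 0 selects identity, which = 1 selects square.
cudaError_t tabulate_selected(int which, float lower, float upper, float *out)
{
    if (which < 0 || which > 1) return cudaErrorInvalidValue;

    float *dev_res = NULL;
    cudaError_t err = cudaMalloc((void **)&dev_res, sizeof(float));
    if (err != cudaSuccess) return err;

    tabulate_by_index<<<1, 1>>>(lower, upper, which, dev_res);

    // cudaMemcpy waits for the kernel and reports earlier launch errors too.
    err = cudaMemcpy(out, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_res);
    return err;
}
```

The trade-off is that the set of selectable functions is fixed at compile time, which is usually acceptable when the host only needs to choose among a handful of known integrands.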

Source

2013-03-26 18:40:53 Eugene
