Passing a struct to a CUDA kernel

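I'm new to CUDA C and am trying to pass a typedef'd struct to a kernel. My approach works fine with a struct that contains only ints, but when I switch to floats I get nonsense numbers as results. I assume this has to do with alignment, and I tried adding __align__ to my type declaration, but to no avail. Can someone give me an example of how this is done, or suggest an alternative approach? I'm trying to set it up so that I can easily add or remove fields without changing anything other than the struct and the kernel. My code: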

typedef struct __align__(8) 
{ 
    float a, b; 
} point; 

__global__ void testKernel(point *p) 
{ 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    p[i].a = 1.1; 
    p[i].b = 2.2; 
} 

int main(void) 
{ 
    // set number of points
    int numPoints    = 16,
        gpuBlockSize = 4,
        pointSize    = sizeof(point),
        numBytes     = numPoints * pointSize,
        gpuGridSize  = numPoints / gpuBlockSize;

    // allocate memory
    point *cpuPointArray = new point[numPoints],
          *gpuPointArray = new point[numPoints];
    cpuPointArray = (point*)malloc(numBytes);
    cudaMalloc((void**)&gpuPointArray, numBytes);

    // launch kernel
    testKernel<<<gpuGridSize,gpuBlockSize>>>(gpuPointArray);

    // retrieve the results
    cudaMemcpy(cpuPointArray, gpuPointArray, numBytes, cudaMemcpyDeviceToHost);
    printf("testKernel results:\n");
    for(int i = 0; i < numPoints; ++i)
    {
        printf("point.a: %d, point.b: %d\n",cpuPointArray[i].a,cpuPointArray[i].b);
    }

    // deallocate memory
    free(cpuPointArray);
    cudaFree(gpuPointArray);

    return 0; 
}

2010-11-14 Paul

point *gpuPointArray = new ... doesn't look right to me. You allocate on the host and then do a cudaMalloc on the device ... – Bart 2010-11-14 08:41:26

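Don't I need to allocate memory before passing it to the kernel as an argument? Taking the cudaMalloc line out results in an "unspecified launch failure". I can also set gpuPointArray to NULL, but that doesn't seem to change my original results. – Paul 2010-11-14 08:56:32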

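Of course, you do need the cudaMalloc. You don't need the new, though. The same goes for cpuPointArray. Use malloc and free (you are programming in C); don't use new here. (And never mix new/delete with malloc/free.) – Bart 2010-11-14 09:02:48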
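The allocation point from the comments can be illustrated with a host-only sketch in plain C. In the question's code, the pointers returned by new are overwritten by malloc and cudaMalloc, so those first allocations are leaked and never used. The helper names alloc_points and free_points below are hypothetical and not part of the original post; the sketch only shows one malloc paired with one free.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    float a, b;
} point;

/* Hypothetical helper: one malloc for the host array, to be released with
 * exactly one free. Returns NULL if count is 0, if the byte size would
 * overflow size_t, or if the allocation itself fails. */
point *alloc_points(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(point))
        return NULL;
    return (point *)malloc(count * sizeof(point));
}

/* Matching release: memory obtained from malloc goes back through free,
 * never through delete[]. free(NULL) is a no-op, so no check is needed. */
void free_points(point *points)
{
    free(points);
}
```

On the host side this would replace the new/malloc pair with a single call such as alloc_points(numPoints); the device pointer still comes from cudaMalloc and is released with cudaFree.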

Take a look at how it is done in the vector_types.h header in your CUDA include directory. That should already give you some pointers.

However, the main problem here is the %d in your printf calls. You are trying to print floats, not integers, so those calls should really use %f instead.
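To make the format-specifier point concrete, here is a small host-only sketch in plain C. The helper name format_point is hypothetical and not part of the original post; it writes into a buffer with snprintf so the result can be inspected.

```c
#include <stdio.h>

typedef struct
{
    float a, b;
} point;

/* Hypothetical helper: formats one point into buf and returns snprintf's
 * result. A float passed through a variadic argument list is promoted to
 * double, which is what %f expects. %d expects an int, so using it for
 * these arguments is undefined behavior and typically prints garbage. */
int format_point(char *buf, size_t size, const point *p)
{
    return snprintf(buf, size, "point.a: %f, point.b: %f", p->a, p->b);
}
```

Called with a point holding 1.1f and 2.2f, the buffer contains "point.a: 1.100000, point.b: 2.200000". Many compilers can also flag the original mismatch at build time (for example, GCC and Clang with -Wformat).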

2010-11-14 08:52:58 Bart

OK, I looked at vector_types.h and tried doing what they do: typedef struct __align__(2 * sizeof(float)) point { ..., but it still produces the same result. Is there something else in there that I should be seeing? – Paul 2010-11-14 09:13:33

By the way, change your printf to use %f instead of %d ... does that change anything? You are trying to print floats, not ints ... – Bart 2010-11-14 09:32:47

Ha! That did it, thanks. Sometimes the obvious things are the easiest to miss ... – Paul 2010-11-14 09:40:51

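Since there doesn't seem to be any decent documentation on how to do this, I thought I'd post the final, revised code here. It turns out the __align__ part was also unnecessary; the actual problem was using %d in printf when trying to print floats.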

#include <stdlib.h> 
#include <stdio.h> 

typedef struct 
{ 
    float a, b; 
} point; 

__global__ void testKernel(point *p) 
{ 
    int i = blockIdx.x * blockDim.x + threadIdx.x; 
    p[i].a = 1.1; 
    p[i].b = 2.2; 
} 

int main(void) 
{ 
    // set number of points
    int numPoints    = 16,
        gpuBlockSize = 4,
        pointSize    = sizeof(point),
        numBytes     = numPoints * pointSize,
        gpuGridSize  = numPoints / gpuBlockSize;

    // allocate memory
    point *cpuPointArray,
          *gpuPointArray;
    cpuPointArray = (point*)malloc(numBytes);
    cudaMalloc((void**)&gpuPointArray, numBytes);

    // launch kernel
    testKernel<<<gpuGridSize,gpuBlockSize>>>(gpuPointArray);

    // retrieve the results
    cudaMemcpy(cpuPointArray, gpuPointArray, numBytes, cudaMemcpyDeviceToHost);
    printf("testKernel results:\n");
    for(int i = 0; i < numPoints; ++i)
    {
        printf("point.a: %f, point.b: %f\n",cpuPointArray[i].a,cpuPointArray[i].b);
    }

    // deallocate memory
    free(cpuPointArray);
    cudaFree(gpuPointArray);

    return 0; 
}
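Since the goal was to add or remove fields without touching anything but the struct and the kernel, one optional safeguard is to pin down the layout the code relies on with compile-time checks. The sketch below is plain C11 and host-only; the helper name points_bytes is hypothetical. The expected offsets and size are assumptions that hold on common ABIs with 4-byte floats, not guarantees of the C standard. When compiling as C++, the same checks would be spelled static_assert.

```c
#include <stddef.h>

typedef struct
{
    float a, b;
} point;

/* Compile-time checks (C11): the build fails if the layout ever differs
 * from what is assumed here, for example after a field is added. */
_Static_assert(offsetof(point, a) == 0, "a must be the first field");
_Static_assert(offsetof(point, b) == sizeof(float), "b must follow a directly");
_Static_assert(sizeof(point) == 2 * sizeof(float), "point must not be padded");

/* Hypothetical helper: number of bytes needed for count points, the same
 * quantity the code above stores in numBytes. */
size_t points_bytes(size_t count)
{
    return count * sizeof(point);
}
```

Because the buffer size is always derived from sizeof(point), adding a field only requires updating the struct, the kernel, and these assertions.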

2010-11-14 10:14:37 Paul
