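Kernel crashes when trying to do a simple value assignment

I am learning CUDA and am still at a beginner level. I am attempting a simple task, but my code crashes when I run it and I don't know why. Any help would be appreciated.

Edit: the crash happens on cudaMemcpy, and in the Image structure pixelVal is of type int**. Is that the cause?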
Original C++ code:
    void Image::reflectImage(bool flag, Image& oldImage)
    /*Reflects the Image based on users input*/
    {
        int rows = oldImage.N;
        int cols = oldImage.M;
        Image tempImage(oldImage);
        for(int i = 0; i < rows; i++)
        {
            for(int j = 0; j < cols; j++)
                tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j];
        }
        oldImage = tempImage;
    }
My CUDA kernel and host code:
    #define NTPB 512
    __global__ void fliph(int* a, int* b, int r, int c)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i >= r || j >= c)
            return;
        a[(r - i * c) + j] = b[i * c + j];
    }
    void Image::reflectImage(bool flag, Image& oldImage)
    /*Reflects the Image based on users input*/
    {
        int rows = oldImage.N;
        int cols = oldImage.M;
        Image tempImage(oldImage);
        if(flag == true) //horizontal reflection
        {
            //Allocate device memory
            int* dpixels;
            int* oldPixels;
            int n = rows * cols;
            cudaMalloc((void**)&dpixels, n * sizeof(int));
            cudaMalloc((void**)&oldPixels, n * sizeof(int));
            cudaMemcpy(dpixels, tempImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
            cudaMemcpy(oldPixels, oldImage.pixelVal, n * sizeof(int), cudaMemcpyHostToDevice);
            int nblks = (n + NTPB - 1)/NTPB;
            fliph<<<nblks, NTPB>>>(dpixels, oldPixels, rows, cols);
            cudaMemcpy(tempImage.pixelVal, dpixels, n * sizeof(int), cudaMemcpyDeviceToHost);
            cudaFree(dpixels);
            cudaFree(oldPixels);
        }
        oldImage = tempImage;
    }
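On the int** question from the edit: if pixelVal is a table of row pointers with each row allocated separately (an assumption, since the Image class is not shown), then passing pixelVal to cudaMemcpy copies n * sizeof(int) bytes starting at the pointer table rather than at the pixel data, and the read runs past the end of that table. That would be consistent with a crash inside cudaMemcpy. One possible way around it is to stage the pixels in a single contiguous row-major buffer first. A minimal sketch, where flattenRows and unflattenRows are hypothetical helper names that are not part of the original code:

```cuda
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a rows x cols image stored as int** (one
// allocation per row, an assumption about Image) into one contiguous
// row-major buffer that is safe to hand to cudaMemcpy.
std::vector<int> flattenRows(int* const* pixelVal, int rows, int cols)
{
    std::vector<int> flat(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            flat[static_cast<std::size_t>(i) * cols + j] = pixelVal[i][j];
    return flat;
}

// Hypothetical helper: copy a contiguous row-major buffer back into the
// row-pointer layout after the device result has been downloaded.
void unflattenRows(const std::vector<int>& flat, int** pixelVal, int rows, int cols)
{
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            pixelVal[i][j] = flat[static_cast<std::size_t>(i) * cols + j];
}
```

With this in place the host-to-device copy would use flat.data() as its source instead of pixelVal, and the device-to-host copy would target a flat buffer that is then passed to unflattenRows.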
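Your blocks and grid are one-dimensional. Why are you using two-dimensional indexing in the kernel? The variable `j` in the kernel will always be 0. – sgarizvi 2013-04-04 17:14:18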
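To illustrate the point about dimensions, here is a sketch of a launch whose grid and blocks are two-dimensional, so that both kernel indices actually vary. It assumes contiguous row-major device buffers and rows, cols > 0. The names flipRows and flipRowsOnDevice and the 16 x 16 block size are illustrative choices, not taken from the question. The destination index is written as (rows - 1 - row) * cols + col to match the CPU loop in the question, which differs from the (r - i * c) + j expression in the posted kernel.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Mirror an image top-to-bottom: row i of src becomes row (rows - 1 - i) of dst.
// Both buffers are contiguous, row-major, rows * cols ints.
__global__ void flipRows(int* dst, const int* src, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension covers columns
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension covers rows
    if (row >= rows || col >= cols)
        return;
    dst[(rows - 1 - row) * cols + col] = src[row * cols + col];
}

// Hypothetical host wrapper; returns an empty vector if any CUDA call fails.
std::vector<int> flipRowsOnDevice(const std::vector<int>& src, int rows, int cols)
{
    std::vector<int> out(src.size());
    std::size_t bytes = src.size() * sizeof(int);
    int* dSrc = nullptr;
    int* dDst = nullptr;
    bool ok = cudaMalloc((void**)&dSrc, bytes) == cudaSuccess
           && cudaMalloc((void**)&dDst, bytes) == cudaSuccess
           && cudaMemcpy(dSrc, src.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);  // 256 threads per block, an arbitrary choice
        dim3 grid((cols + block.x - 1) / block.x,   // enough blocks to cover all columns
                  (rows + block.y - 1) / block.y);  // and all rows
        flipRows<<<grid, block>>>(dDst, dSrc, rows, cols);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), dDst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dSrc);  // freeing a null pointer is a no-op
    cudaFree(dDst);
    if (!ok)
        out.clear();
    return out;
}
```

Keeping the launch one-dimensional would also work, provided the kernel derives row and column from a single flat index instead of reading blockIdx.y and threadIdx.y.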
From a quick review, the code looks fine (apart from what @sgar91 noted). I suggest adding error checking to your program to narrow down the problem. See [this post](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – stuhlo 2013-04-04 17:25:36
I count 7 CUDA API calls and not a single error check! Step one: check for errors and try to narrow down where the problem occurs. – talonmies 2013-04-04 18:03:36
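For reference, a common shape for the error checking suggested in these comments is a macro that wraps every runtime call and reports the file and line of the first failure. This is only a sketch: the macro name CUDA_CHECK, the fillOnes kernel, and checkedRoundTrip are made up for illustration, and exiting on error is just one possible policy.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical macro name; wraps any call that returns cudaError_t.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Illustrative kernel: set every element of a to 1.
__global__ void fillOnes(int* a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = 1;
}

// Illustrative round trip with every call checked; returns the sum of the
// n elements copied back from the device. Assumes n > 0.
int checkedRoundTrip(int n)
{
    int* d = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d, n * sizeof(int)));
    fillOnes<<<(n + 255) / 256, 256>>>(d, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran
    std::vector<int> h(n);
    CUDA_CHECK(cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    int sum = 0;
    for (int v : h)
        sum += v;
    return sum;
}
```

Applied to the code in the question, each of the cudaMalloc, cudaMemcpy, and cudaFree calls would be wrapped the same way, with a cudaGetLastError check placed right after the kernel launch.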