2011-11-30 97 views
-1

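CUDA integration, access violation

Hello, I am trying to implement an integration function in CUDA, but I keep getting an access violation in the kernel, and I just don't understand why!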

#include <iomanip>  
#include "cuda_runtime.h" 
#include "device_launch_parameters.h" 

#define R 10000 
#define leftBound 1.0 
#define rightBound 3.0 
#define P 10 

#define threads 512 
#define MaxBlocks 65535 

__global__ void cudaKernal(float *M, int x, int leftbound, float width) 
{ 
    unsigned int index = blockIdx.x * threads + threadIdx.x; 
    while(index < x) 
    { 
     int x = leftBound + width*index; 
     M[index] = (float)((exp(-pow((float)x,2))*cos((float)(P*x))) * width); 

     // Next run 
     index += blockDim.x * gridDim.x; 
    } 
} 

int main() 
{  
    float width = (rightBound - leftBound)/R; 
    int x = ceil((rightBound - leftBound)/width); 
    float total = 0; 

    // Trick for rounding up (ceiling) the total number of blocks
    int TotalBlocks = (x+threads)/threads; 
    if(TotalBlocks > MaxBlocks) 
     TotalBlocks = MaxBlocks; 

    float *dev_M; 
    cudaMalloc((void**)&dev_M, x*sizeof(float)); 

    cudaKernal<<<TotalBlocks,threads>>>(dev_M, x, leftBound, width); 

    float *M; 
    cudaMemcpy(M, dev_M, x*sizeof(float), cudaMemcpyDeviceToHost); 
    cudaFree(dev_M); 

    for (int i = 0; i < x; ++i) { 
     printf("M[i]=%f", M[i]); 
     total += M[i]; 
    }  

    printf("The integral is: %f", total); 
    scanf_s("%f",123); 
    return 0; 
} 
+0

Please point out exactly where the access violation occurs. –

Answers

1

This is because you have not allocated any memory for M on the host. This will fix the problem:

float *M = (float*)malloc(x*sizeof(float)); 
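
As a minimal sketch of that fix in context, the program below allocates the host buffer before the copy and releases it afterwards. It assumes the same element count as the question (10000); the `cudaMemset` call is only a stand-in for the kernel launch and is not part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const int x = 10000;                       // element count, as in the question
    const size_t bytes = x * sizeof(float);

    float *dev_M = NULL;
    if (cudaMalloc((void**)&dev_M, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(dev_M, 0, bytes);               // stand-in for the kernel launch

    // The host pointer must refer to real memory before cudaMemcpy writes to it.
    float *M = (float*)malloc(bytes);
    if (M == NULL) {
        fprintf(stderr, "malloc failed\n");
        cudaFree(dev_M);
        return 1;
    }

    cudaError_t err = cudaMemcpy(M, dev_M, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
    else
        printf("M[0] = %f\n", M[0]);

    free(M);                                   // release the host buffer
    cudaFree(dev_M);                           // release the device buffer
    return err == cudaSuccess ? 0 : 1;
}
```

Note that the buffer has to stay alive until the summation loop has finished, so `free(M)` belongs after the loop, not right after the copy.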
3

The only access violation I can see in your code is this line:

while(index <= x) 

Shouldn't it be:

while(index < x) 

because you allocate exactly x elements for dev_M, so the index must stay in [0..x-1].
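
The bound can be checked with a small grid-stride loop, sketched below under a few assumptions: `fillIndex` is an illustrative kernel rather than the question's, and the launch deliberately uses fewer threads than elements so that every thread loops several times. With the condition `i < n`, the largest index written is n-1, which is the last valid element of an n-element allocation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel (not the question's): each element receives its own index.
__global__ void fillIndex(int *out, int n)
{
    int stride = blockDim.x * gridDim.x;       // total number of threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = i;                            // i never reaches n
}

int main()
{
    const int n = 10000;
    const size_t bytes = n * sizeof(int);

    int *dev_out = NULL;
    if (cudaMalloc((void**)&dev_out, bytes) != cudaSuccess) return 1;

    // 4 blocks of 256 threads cover 10000 elements only by striding.
    fillIndex<<<4, 256>>>(dev_out, n);

    int *out = (int*)malloc(bytes);
    if (out == NULL) { cudaFree(dev_out); return 1; }
    if (cudaMemcpy(out, dev_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        free(out);
        cudaFree(dev_out);
        return 1;
    }

    // Count elements that do not hold their own index.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] != i) ++mismatches;
    printf("elements not written as expected: %d\n", mismatches);

    free(out);
    cudaFree(dev_out);
    return 0;
}
```

Using `blockDim.x` instead of a compile-time constant for the starting index keeps the kernel correct if the block size in the launch is changed later.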

+0

Sorry, that was a typo; it is index < x. – Androme

2

The access violation is probably here, in your host code:

float *M; 
cudaMemcpy(M, dev_M, x*sizeof(float), cudaMemcpyDeviceToHost); 
cudaFree(dev_M); 

You are transferring memory into M, but as far as I can see it is never allocated anywhere.
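
One way to narrow down whether a fault comes from the kernel or from the host copy is to check the status of every CUDA call. The sketch below is only an illustration: `CUDA_CHECK` and `scaleKernel` are hypothetical names introduced here, and the launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the failing location and exit on any CUDA error.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Placeholder kernel standing in for the integration kernel.
__global__ void scaleKernel(float *M, int n, float width)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        M[i] = width * i;
}

int main()
{
    const int n = 10000;
    const size_t bytes = n * sizeof(float);

    float *dev_M = NULL;
    CUDA_CHECK(cudaMalloc((void**)&dev_M, bytes));

    scaleKernel<<<20, 512>>>(dev_M, n, 0.0002f);
    CUDA_CHECK(cudaGetLastError());            // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());       // reports faults raised while the kernel ran

    float *M = (float*)malloc(bytes);          // host buffer allocated before the copy
    if (M == NULL) {
        fprintf(stderr, "malloc failed\n");
        cudaFree(dev_M);
        return 1;
    }
    CUDA_CHECK(cudaMemcpy(M, dev_M, bytes, cudaMemcpyDeviceToHost));
    printf("last element: %f\n", M[n - 1]);

    free(M);
    CUDA_CHECK(cudaFree(dev_M));
    return 0;
}
```

If the launch and synchronize checks both pass and the program still crashes, the problem is more likely on the host side, for example a copy into an unallocated pointer, which the CUDA runtime cannot always detect.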