1
請幫幫我。我不明白爲什麼這個功能,使用紋理內存爲什麼在我的情況下,紋理內存比全局要慢
__global__ void corr (int * data)
{ int idx = (blockIdx.y*blockDim.y+threadIdx.y)*64+ (blockIdx.x * blockDim.x + threadIdx.x);
data[idx]=0;
for(int i=0; i<blockDim.y-threadIdx.y; i++)
for(int j=0; j<blockDim.x-threadIdx.x; j++)
data [idx] = data[idx] + tex2D(g_TexRef,blockIdx.x * blockDim.x + threadIdx.x +j, blockIdx.y*blockDim.y+threadIdx.y+i);
工作速度低於此功能的另一個版本,它使用全球內存
__global__ void corr1(int * in , int * data)
{ int idx = (blockIdx.y*blockDim.y+threadIdx.y)*64+ (blockIdx.x * blockDim.x + threadIdx.x);
data[idx]=0;
for(int i=0; i<blockDim.y-threadIdx.y; i++)
for(int j=0; j<blockDim.x-threadIdx.x; j++)
data [idx] = data[idx] +in[(blockIdx.y*blockDim.y+threadIdx.y+i)*64+blockIdx.x * blockDim.x + threadIdx.x +j];
其中計算能力與Google合作?你有沒有嘗試在探查器中運行以查看緩存命中和未命中? – 2012-03-27 16:37:38