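Downsampling an image using texture memory
I need to downsample an image. Some reading suggests that if I use texture memory, this operation comes for free and runs faster (I am looking for bilinear interpolation). Can someone tell me exactly how to write a kernel for this? This is what I have so far (I am using thread blocks of size (1,1)):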
__global__ void texturekernel(int * final_red){
    int f = (blockIdx.x * blockDim.x) + threadIdx.x;
    int c = (blockIdx.y * blockDim.y) + threadIdx.y;
    int id = blockIdx.x + 256 * blockIdx.y; // 256 is the width of the downsampled image; the original was 512
    final_red[id] = tex2D(refTexture, c + 0.5f, f + 0.5f); // This is just for the red channel
    // where refTexture is defined as texture <float, 2, cudaReadModeElementType> refTexture;
};
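This version currently gives me an output of all zeros.
EDIT (in this version I am trying to downsample two images of size 2000*512 by a factor of 2, into two images of size 1000*256):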
texture <float, 2, cudaReadModeElementType> refTexture; // global variable !
cudaArray* myArray;
cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
cudaError rs=cudaMallocArray ( &myArray,&description, 512,2000*2);//
//This line below is part of loop where input image is read row by row ..rowchecker keeps track of the row
cudaMemcpyToArray(myArray,0,rowchecker++,array_temp_red,sizeof(int)*test_columns,cudaMemcpyHostToDevice);
refTexture.normalized=false;
refTexture.addressMode[0]=cudaAddressModeClamp;
refTexture.addressMode[1]=cudaAddressModeClamp;
refTexture.filterMode=cudaFilterModePoint;
cudaBindTextureToArray(refTexture,myArray);
dim3 blockSize(1,1);
int n_blocks_x=256;
int n_blocks_y=1000*2;
dim3 gridSize(n_blocks_x,n_blocks_y);
cudaMalloc((void**)&finalarray,(2000)*(512)*2/4*sizeof(int));
texturekernel<<<gridSize,blockSize>>>(finalarray);
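For comparison, here is a minimal sketch of a 2x bilinear downsample through a texture. It rests on assumptions that are not in the post: it uses the texture object API (`cudaCreateTextureObject`) rather than the texture reference shown above, since texture references were removed in CUDA 12; the image is assumed to be converted to `float` on the host already; and `downsample2xKernel`, `downsampleBilinear2x`, and `CUDA_CHECK` are hypothetical names. Three details differ from the posted code and may be related to the all-zero output: the filter mode is `cudaFilterModeLinear` instead of `cudaFilterModePoint`, the uploaded data is `float` to match the `float` channel format (the posted copy uses `sizeof(int)`), and the x coordinate is taken from the x thread index.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s\n",                     \
                         cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Each output pixel samples the corner shared by its 2x2 source block.
// With linear filtering and unnormalized coordinates, that fetch returns
// the average of the four surrounding texels.
__global__ void downsample2xKernel(cudaTextureObject_t tex, float* dst,
                                   int dstWidth, int dstHeight)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstWidth || y >= dstHeight) return;
    dst[y * dstWidth + x] = tex2D<float>(tex, 2.0f * x + 1.0f, 2.0f * y + 1.0f);
}

// Assumption: src holds one channel as row-major floats, width * height long.
std::vector<float> downsampleBilinear2x(const std::vector<float>& src,
                                        int width, int height)
{
    const int dstWidth = width / 2;
    const int dstHeight = height / 2;

    // Upload the whole image into a float cudaArray in one 2D copy.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t srcArray = nullptr;
    CUDA_CHECK(cudaMallocArray(&srcArray, &desc, width, height));
    CUDA_CHECK(cudaMemcpy2DToArray(srcArray, 0, 0, src.data(),
                                   width * sizeof(float),
                                   width * sizeof(float), height,
                                   cudaMemcpyHostToDevice));

    cudaResourceDesc resDesc;
    std::memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = srcArray;

    cudaTextureDesc texDesc;
    std::memset(&texDesc, 0, sizeof(texDesc));
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModeLinear;   // bilinear interpolation
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    CUDA_CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    float* dDst = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dDst, dstWidth * dstHeight * sizeof(float)));

    dim3 block(16, 16);
    dim3 grid((dstWidth + block.x - 1) / block.x,
              (dstHeight + block.y - 1) / block.y);
    downsample2xKernel<<<grid, block>>>(tex, dDst, dstWidth, dstHeight);
    CUDA_CHECK(cudaGetLastError());

    std::vector<float> out(dstWidth * dstHeight);
    CUDA_CHECK(cudaMemcpy(out.data(), dDst, out.size() * sizeof(float),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaDestroyTextureObject(tex));
    CUDA_CHECK(cudaFreeArray(srcArray));
    CUDA_CHECK(cudaFree(dDst));
    return out;
}
```

Calling `downsampleBilinear2x(pixels, 512, 2000)` would return a 256 x 1000 image for one channel. The sketch uses 16x16 thread blocks with a bounds check instead of (1,1) blocks, because a one-thread block leaves most of each warp idle.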
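Thanks, Jawad. I tried that, and I think the problem is not with the final array but with the call to tex2D. Just to check, I tried float xx = tex2D(reftexture,f,c), and it gave me some garbage values. I am binding the values to the array correctly, so why is this happening? Any clues? – Manish 2011-02-28 07:11:53
Can you post the complete code, especially the binding of the texture memory? – jwdmsd 2011-02-28 07:27:01
Try using plain linear memory instead of an array, and bind it with cudaBindTexture2D; that will make it easier to see what is going on. I will read through the code and try to find the problem. – jwdmsd 2011-02-28 08:56:30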
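The linear-memory suggestion in the last comment can be sketched as a round-trip check: upload a small image, fetch every texel through the texture, and compare with the input. This is only a sketch under stated assumptions: it uses a texture object over pitched memory (`cudaResourceTypePitch2D`) as a stand-in for `cudaBindTexture2D`, which belongs to the texture reference API removed in CUDA 12, and `readTexelsKernel`, `roundTripPitchedTexture`, and `CUDA_CHECK` are hypothetical names. With point filtering, each fetch at the texel center should return exactly the uploaded value, so garbage here would point at the upload or the element type rather than at the kernel.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s\n",                     \
                         cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Reads every texel at its center (x + 0.5, y + 0.5) and writes it out.
__global__ void readTexelsKernel(cudaTextureObject_t tex, float* dst,
                                 int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    dst[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Assumption: src holds row-major floats, width * height long.
std::vector<float> roundTripPitchedTexture(const std::vector<float>& src,
                                           int width, int height)
{
    // Pitched allocation keeps each row aligned as 2D textures require.
    float* dSrc = nullptr;
    size_t pitch = 0;
    CUDA_CHECK(cudaMallocPitch((void**)&dSrc, &pitch,
                               width * sizeof(float), height));
    CUDA_CHECK(cudaMemcpy2D(dSrc, pitch, src.data(), width * sizeof(float),
                            width * sizeof(float), height,
                            cudaMemcpyHostToDevice));

    cudaResourceDesc resDesc;
    std::memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypePitch2D;
    resDesc.res.pitch2D.devPtr = dSrc;
    resDesc.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    resDesc.res.pitch2D.width = width;
    resDesc.res.pitch2D.height = height;
    resDesc.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc texDesc;
    std::memset(&texDesc, 0, sizeof(texDesc));
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;    // no interpolation
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    CUDA_CHECK(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    float* dDst = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dDst, width * height * sizeof(float)));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    readTexelsKernel<<<grid, block>>>(tex, dDst, width, height);
    CUDA_CHECK(cudaGetLastError());

    std::vector<float> out(width * height);
    CUDA_CHECK(cudaMemcpy(out.data(), dDst, out.size() * sizeof(float),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaDestroyTextureObject(tex));
    CUDA_CHECK(cudaFree(dSrc));
    CUDA_CHECK(cudaFree(dDst));
    return out;
}
```

If `roundTripPitchedTexture(src, w, h)` returns a vector equal to `src`, the upload and the texture setup agree on layout and element type, and the remaining differences are in the sampling coordinates or the filter mode.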