My test function looks like this. How can I run this kernel efficiently in CUDA?
#define DIMENSION 20
#define POPSIZE 5000
__global__ void repairT(int* H, int* diff){
    int tidx = blockDim.x * blockIdx.x + threadIdx.x;
    int ii = tidx * DIMENSION;
    //if (ii < DIMENSION * POPSIZE)
    //{
    int Hdiff[DIMENSION] = { 0 };
    int diffcount = 0;
    bool isInIndiv = false;
    //complement set H
    for (int i = 1; i <= DIMENSION; i++)
    {
        for (int j = ii; j < ii + DIMENSION; j++) //H for
        {
            if (i == H[j])
            {
                isInIndiv = isInIndiv || true;
            }
        }
        if (isInIndiv == false)
        {
            Hdiff[diffcount] = i;
            diffcount++;
        }
        else
            isInIndiv = false;
    }
    // diff to array
    int diffc = ii * DIMENSION;
    for (int i = 0; i < DIMENSION; i++)
    {
        diff[diffc] = Hdiff[i];
        diffc++;
    }
    //}
}
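I have a large one-dimensional array called H (POPSIZE * DIMENSION elements). I want to create a new array, diff, that holds the elements missing from each interval of H: indices 0-19, 20-39, and so on.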
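A possible reading of that indexing, as a minimal sketch rather than a confirmed fix: in the kernel as posted, `ii` is already `tidx * DIMENSION`, so using `ii * DIMENSION` as the write offset sends thread 1 to element 400 instead of 20, and from thread 250 onward the offset is at or past POPSIZE * DIMENSION. That would match seeing correct output only for interval 0-19. The sketch below assumes `diff` holds POPSIZE * DIMENSION ints and that valid entries of `H` lie in 1..DIMENSION. The names `missingValues` and `repairSketch` are hypothetical, and the bitmask is a substitute for the nested loop in the posted kernel, not part of it.

```cuda
#include <cuda_runtime.h>

#define DIMENSION 20
#define POPSIZE 5000

// Hypothetical helper (not in the original post): writes the values of
// 1..DIMENSION that do not occur in indiv[0..DIMENSION) to out in ascending
// order, zero-pads the remaining slots, and returns the number of missing values.
__host__ __device__ int missingValues(const int* indiv, int* out)
{
    unsigned int seen = 0;  // bit v is set when value v occurs; requires DIMENSION < 32
    for (int j = 0; j < DIMENSION; j++) {
        int v = indiv[j];
        if (v >= 1 && v <= DIMENSION) seen |= 1u << v;  // ignore out-of-range entries
    }
    int count = 0;
    for (int v = 1; v <= DIMENSION; v++) {
        if (!(seen & (1u << v))) out[count++] = v;
    }
    for (int k = count; k < DIMENSION; k++) out[k] = 0;
    return count;
}

// One thread per individual. Input and output both start at tidx * DIMENSION,
// so thread t reads H[t*20 .. t*20+19] and writes diff[t*20 .. t*20+19].
__global__ void repairSketch(const int* H, int* diff)
{
    int tidx = blockDim.x * blockIdx.x + threadIdx.x;
    if (tidx >= POPSIZE) return;  // surplus threads in the last block do nothing
    int base = tidx * DIMENSION;
    missingValues(H + base, diff + base);
}
```

Because the helper is marked `__host__ __device__`, the same logic can be run on the CPU for a few individuals and compared against what the kernel writes.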
I need to run this code efficiently in parallel 5000 times. I tried the following launch, but it only produces results for the interval 0-19 of H:
dim3 nbThreadsR1(128);
dim3 nbBlocksR1((POPSIZE/nbThreadsR1.x) + 1);
repairT<<<nbBlocksR1, nbThreadsR1>>>(d_H, d_diff);
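For reference, `(POPSIZE/nbThreadsR1.x) + 1` evaluates to 40 blocks of 128 threads, which is 5120 threads for 5000 individuals, so 120 threads have no individual to work on and depend on the guard that is commented out in the posted kernel. The sketch below shows one common way to size the grid and to check for launch and execution errors. `blocksFor`, `markProcessed` and `launchChecked` are hypothetical names, and `markProcessed` is only a placeholder standing in for the real kernel.

```cuda
#include <cuda_runtime.h>

#define POPSIZE 5000

// Hypothetical helper: smallest number of blocks whose threads cover n items.
static int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Placeholder kernel (not the repair kernel): each in-range thread records
// its own index, so the host can verify that every individual was visited.
__global__ void markProcessed(int* processed)
{
    int tidx = blockDim.x * blockIdx.x + threadIdx.x;
    if (tidx < POPSIZE) processed[tidx] = tidx;  // guard against surplus threads
}

// Launches the placeholder kernel over all individuals and returns the first
// error reported; d_processed is assumed to point to POPSIZE ints on the device.
cudaError_t launchChecked(int* d_processed)
{
    const int threads = 128;
    markProcessed<<<blocksFor(POPSIZE, threads), threads>>>(d_processed);
    cudaError_t err = cudaGetLastError();  // e.g. an invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // errors raised while the kernel ran
}
```

An out-of-range write inside a kernel is not guaranteed to be reported, but when it is, it typically shows up as the return value of the synchronize call rather than at the launch itself.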
Any advice would be appreciated.
No. I removed DIMENSION from the declaration of ii, and the kernel now runs over the whole array, but it sometimes computes wrong results.