Crash on second iteration - order irrelevant

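To save on global memory transfers, and because every step of the code works on its own, I have tried to merge all of the kernels into a single kernel. The first 2 (of 3) steps are now device calls rather than global calls. It fails in the second half of the first step.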
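A minimal sketch of the kind of fusion described above, assuming three independent per-element steps. StepOne, StepTwo, StepThree, FusedPipeline and FusedKernel are hypothetical placeholders, not the asker's code. The steps are marked `__host__ __device__` through the HD macro, so the same functions can also be checked on the CPU.

```cuda
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD  // plain C++ build: the steps become ordinary host functions
#endif

// Hypothetical stand-ins for the three former kernels
HD float StepOne(float x)   { return x * 2.0f; }
HD float StepTwo(float x)   { return x + 1.0f; }
HD float StepThree(float x) { return x * x; }

// All three steps for one element; intermediates stay in registers
HD float FusedPipeline(float x) {
    return StepThree(StepTwo(StepOne(x)));
}

#ifdef __CUDACC__
// One global read and one global write per element, instead of one per step
__global__ void FusedKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = FusedPipeline(in[i]);
}
#endif
```

The trade-off is that the fused kernel typically needs more registers per thread than any of the separate kernels did, because more values are live at once.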

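There is a function that I need to call twice, to calculate the two halves of the image. Regardless of the order in which the halves are calculated, it crashes on the second iteration.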

After going through the code carefully and running it many times with different return points, I found what makes it crash.

__device__ 
void IntersectCone(float* ModDistance, 
                   float* ModIntensity, 
                   float3 ray, 
                   int threadID, 
                   modParam param) 
{ 
    bool ignore = false; 

    float3 normal = make_float3(0.0f, 0.0f, 0.0f); 
    float3 result = make_float3(0.0f, 0.0f, 0.0f); 
    float normDist = 0.0f; 
    float intensity = 0.0f; 

    float check = fabsf(Dot(param.position, Cross(param.direction, ray))); 
    if (check > param.r1 && check > param.r2) 
        ignore = true; 

    // Similar triangles: tran = distance from the apex to the r1 end,
    // length = distance from the apex to the r2 end
    float tran = param.length / (param.r2 / param.r1 - 1.0f); 
    float length = tran + param.length; 
    float Lsq = length * length; 
    float cosSqr = Lsq / (Lsq + param.r2 * param.r2);  // cos^2 of the half-angle

    //Changes the centre position? 
    float3 position = param.position - tran * param.direction; 

    // Coefficients of c2*t^2 + 2*c1*t + c0 = 0 for hits at t*ray (ray origin at 0)
    float aDd = Dot(param.direction, ray); 
    float3 e = position * -1.0f; 
    float aDe = Dot(param.direction, e); 
    float dDe = Dot(ray, e); 
    float eDe = Dot(e, e); 
    float c2 = aDd * aDd - cosSqr; 
    float c1 = aDd * aDe - cosSqr * dDe; 
    float c0 = aDe * aDe - cosSqr * eDe; 

    float discr = c1 * c1 - c0 * c2; 

    if (discr <= 0.0f) 
        ignore = true; 

    if (!ignore) 
    { 
        float root = sqrtf(discr); 
        float sign; 

        if (c1 > 0.0f) 
            sign = 1.0f; 
        else 
            sign = -1.0f; 

        //Try opposite sign....? 
        result = (-c1 + sign * root) * ray / c2; 

        e = result - position; 
        float dot = Dot(e, param.direction); 
        float3 s1 = Cross(e, param.direction); 
        normal = Cross(e, s1); 

        // Keep only hits between the two ends of the cone
        if ((dot > tran) && (dot < length)) 
        { 
            if (Dot(normal, ray) <= 0.0f) 
            { 
                normal = Norm(normal); //This stuff (1) 
                normDist = Magnitude(result); 
                // IntensAt1m is defined elsewhere in the program (not shown)
                intensity = -IntensAt1m * Dot(ray, normal) / (normDist * normDist); 
            } 
        } 
    } 
    ModDistance[threadID] = normDist; // and this stuff (2) 
    ModIntensity[threadID] = intensity; 
}

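There are two things I can do to stop it crashing, and both defeat the purpose of the function: not writing to ModDistance[] and ModIntensity[], or not setting normal, normDist and intensity.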

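The code above throws a first-chance exception, but it does not if either of the marked blocks is commented out. Also, the program only crashes the second time the routine is called.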

I've been trying to figure this out for a while, so any help would be fantastic.

The code that calls it is:

int subrow = threadIdx.y + Mod_Height/2; 
int threadID = subrow * (Mod_Width+1) + threadIdx.x;   
int obsY = windowY + subrow; 
float3 ray = CalculateRay(obsX,obsY); 

if(!IntersectSphere(ModDistance, ModIntensity, ray, threadID, param)) 
{ 
    IntersectCone(ModDistance, ModIntensity, ray, threadID, param); 
} 

subrow = threadIdx.y; 
threadID = subrow * (Mod_Width+1) + threadIdx.x;   
obsY = windowY + subrow; 
ray = CalculateRay(obsX,obsY); 

if(!IntersectSphere(ModDistance, ModIntensity, ray, threadID, param)) 
{ 
    IntersectCone(ModDistance, ModIntensity, ray, threadID, param); 
}
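Because a write at an out-of-range threadID would fail exactly at the two array writes, one cheap check is to replay the caller's index arithmetic on the host. This sketch assumes a block of blockDimX x blockDimY threads and a buffer length supplied by the caller. The question shows neither the launch configuration nor the allocation size, so MaxThreadID and IndicesInRange are hypothetical helpers.

```cuda
// Hypothetical host-side replay of the caller's threadID arithmetic.
// Returns the largest index written by either call to IntersectCone.
int MaxThreadID(int modWidth, int modHeight, int blockDimX, int blockDimY) {
    int maxID = -1;
    for (int ty = 0; ty < blockDimY; ++ty) {
        for (int tx = 0; tx < blockDimX; ++tx) {
            // First call: rows offset by Mod_Height/2
            int second = (ty + modHeight / 2) * (modWidth + 1) + tx;
            // Second call: rows starting at 0
            int first = ty * (modWidth + 1) + tx;
            if (second > maxID) maxID = second;
            if (first > maxID) maxID = first;
        }
    }
    return maxID;
}

// True if every write from both halves stays inside a buffer of bufferLen floats
bool IndicesInRange(int modWidth, int modHeight, int blockDimX, int blockDimY,
                    int bufferLen) {
    return MaxThreadID(modWidth, modHeight, blockDimX, blockDimY) < bufferLen;
}
```

For example, with Mod_Width = 8, Mod_Height = 6 and a 9 x 3 block, the largest index is 53, which fits a 54-element buffer; a 10 x 3 block would write index 54, one past the end.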


2012-03-01 3Pi

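The symptoms of the error showed up in a different place from its source. The whole kernel was too big, so not enough registers could be allocated. – 3Pi 2012-03-01 02:43:04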
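To make "not enough registers" concrete: a block can only launch if registers per thread times threads per block fits in the per-block register file. This is a simplified sketch that ignores register allocation granularity, so treat it as a necessary rather than sufficient condition. The per-thread count can be read from the "Used N registers" line printed by `nvcc -Xptxas -v`, and the device limit from `cudaDeviceProp::regsPerBlock`; the function names are hypothetical.

```cuda
// Necessary condition for a launch: the whole block's registers must fit.
// Real GPUs allocate registers in chunks, so the true limit is a bit lower.
bool BlockFitsRegisters(int regsPerThread, int threadsPerBlock, int regsPerBlock) {
    return static_cast<long long>(regsPerThread) * threadsPerBlock <= regsPerBlock;
}

// Largest per-thread register count that still lets a block of this size fit
int MaxRegsPerThread(int threadsPerBlock, int regsPerBlock) {
    return regsPerBlock / threadsPerBlock;
}
```

With 32768 registers per block, a 512-thread block leaves at most 64 registers per thread, so a fused kernel that needs more than that cannot launch at that block size.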

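The two lines you called out might be a starting point. (1) What does the 'Norm()' function look like? I'd expect it to return a scalar, not a vector? Same question for 'Magnitude()'. (2) I can only see this line crashing if 'threadID' is out of range. – 2012-03-01 03:38:35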
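The post never shows Dot, Cross, Magnitude or Norm. Judging only from how their results are used (normDist = Magnitude(result) is a float, normal = Norm(normal) is a float3), plausible definitions look like the following. These are assumptions, not the asker's code. The question also relies on float3 operators such as `-` and multiplication or division by a scalar, which CUDA's vector types do not provide on their own; the CUDA samples' helper_math.h supplies overloads of that kind.

```cuda
#include <math.h>

#ifdef __CUDACC__
#include <cuda_runtime.h>  // float3, make_float3
#define HD __host__ __device__
#else
#define HD
// Plain C++ build: minimal stand-ins for CUDA's float3 and make_float3
struct float3 { float x, y, z; };
inline float3 make_float3(float x, float y, float z) { return float3{x, y, z}; }
#endif

// Hypothetical versions of the helpers used by IntersectCone
HD inline float Dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

HD inline float3 Cross(float3 a, float3 b) {
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Scalar length, matching `normDist = Magnitude(result);`
HD inline float Magnitude(float3 v) { return sqrtf(Dot(v, v)); }

// Unit vector (a float3, not a scalar), matching `normal = Norm(normal);`
// Assumes v is non-zero.
HD inline float3 Norm(float3 v) {
    float len = Magnitude(v);
    return make_float3(v.x / len, v.y / len, v.z / len);
}
```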

Just saw your comment. Nice find. – 2012-03-01 03:39:24

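The kernel is running out of resources. As posted in the comments, the launch fails with the error cudaErrorLaunchOutOfResources. This likely also explains why commenting out either marked block hides the crash: once the results are no longer computed or written, the compiler can eliminate the unused calculations, and the kernel needs fewer registers.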

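To avoid this, use the __launch_bounds__ qualifier to tell the compiler the block size your kernel will be launched with. The compiler then limits per-thread register usage (spilling to local memory if necessary) so that a block of that size has enough resources to launch. See the CUDA Programming Guide for details on __launch_bounds__.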


2012-03-06 04:30:16 harrism
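A sketch of the answer's suggestion, assuming an illustrative target of 256 threads per block and at least 2 resident blocks per multiprocessor. The CUDA part only compiles under nvcc, and FusedKernel and LaunchAndCheck are hypothetical names. LaunchBoundsRegCap is a hypothetical helper that approximates the per-thread register cap these bounds imply, ignoring allocation granularity. Checking cudaGetLastError right after the launch reports cudaErrorLaunchOutOfResources at the launch itself rather than at some later call.

```cuda
#ifdef __CUDACC__
#include <cstdio>
#include <cuda_runtime.h>

// Promise at most 256 threads per block; ask for at least 2 blocks per SM
__global__ void __launch_bounds__(256, 2) FusedKernel(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.0f;  // placeholder for the real work
}

// Launch (assumes n > 0) and report errors where they happen
bool LaunchAndCheck(float* dOut, int n) {
    FusedKernel<<<(n + 255) / 256, 256>>>(dOut, n);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
#endif

// Approximate register cap from __launch_bounds__(maxThreads, minBlocks):
// minBlocks blocks of maxThreads threads must share one SM's register file,
// and the result can never exceed the hardware per-thread maximum.
int LaunchBoundsRegCap(int regsPerSM, int maxThreads, int minBlocks,
                       int hwMaxRegsPerThread) {
    int cap = regsPerSM / (maxThreads * minBlocks);
    return cap < hwMaxRegsPerThread ? cap : hwMaxRegsPerThread;
}
```

Asking for more resident blocks lowers the cap further, so the compiler may spill more; it is a trade between launchability and per-thread speed.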
