Beginner CUDA - simple variable increment not working

I'm working on a project with CUDA. To get the hang of it, I wrote the following code.

#include <iostream> 

using namespace std; 

__global__ void inc(int *foo) { 
    ++(*foo); 
} 

int main() { 
    int count = 0, *cuda_count; 
    cudaMalloc((void**)&cuda_count, sizeof(int)); 
    cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice); 
    cout << "count: " << count << '\n'; 
    inc <<< 100, 25 >>> (&count); 
    cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost); 
    cudaFree(cuda_count); 
    cout << "count: " << count << '\n'; 
    return 0; 
}

The output is

count: 0 
count: 0

What's the problem?

Thanks in advance!

Source

2010-12-10 Renato Rodrigues

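You should probably work through some of the examples in the programming guide. Your syntax doesn't match what the programming guide recommends. – Marm0t 2010-12-10 18:10:24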

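I found the solution. I just had to use an atomic function, i.e. a function that executes without interference from other threads. In other words, no other thread can access the given address until the operation has completed.

Code: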

#include <iostream> 

using namespace std; 

__global__ void inc(int *foo) { 
    atomicAdd(foo, 1); 
} 

int main() { 
    int count = 0, *cuda_count; 
    cudaMalloc((void**)&cuda_count, sizeof(int)); 
    cudaMemcpy(cuda_count, &count, sizeof(int), cudaMemcpyHostToDevice); 
    cout << "count: " << count << '\n'; 
    inc <<< 100, 25 >>> (cuda_count); 
    cudaMemcpy(&count, cuda_count, sizeof(int), cudaMemcpyDeviceToHost); 
    cudaFree(cuda_count); 
    cout << "count: " << count << '\n'; 
    return 0; 
}

Output:

count: 0 
count: 2500

Thanks for making me realize the mistakes I was making.
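To make "without interference from other threads" a bit more concrete, here is a rough sketch of how an atomic add can be built from a compare-and-swap retry loop. atomicCAS is a real CUDA intrinsic; cas_add, inc_cas and count_with_cas are names made up for this illustration, and in practice you would just call atomicAdd as in the code above.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: integer add built from a compare-and-swap retry loop.
__device__ int cas_add(int *addr, int val) {
    int old = *addr;
    int assumed;
    do {
        assumed = old;
        // Stores assumed + val only if *addr still equals assumed;
        // always returns the value that was at *addr before the call.
        old = atomicCAS(addr, assumed, assumed + val);
    } while (old != assumed);  // another thread changed *addr first: retry
    return old;
}

__global__ void inc_cas(int *foo) {
    cas_add(foo, 1);
}

// Hypothetical host wrapper: launches blocks * threads threads and
// returns the final counter, or -1 if a CUDA call fails.
int count_with_cas(int blocks, int threads) {
    int count = 0, *d_count = nullptr;
    if (cudaMalloc((void**)&d_count, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d_count, &count, sizeof(int), cudaMemcpyHostToDevice);
    inc_cas<<<blocks, threads>>>(d_count);
    // This blocking copy waits for the kernel to finish first.
    cudaError_t err = cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return err == cudaSuccess ? count : -1;
}
```

Calling count_with_cas(100, 25) from main should give 2500, the same as the atomicAdd version: a thread whose update would be lost notices that the value changed and tries again.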

Source

2010-12-10 21:24:45

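You should pass cuda_count to your kernel function. Other than that, all your threads are trying to increment the same memory location. The effect of that isn't well-defined (at least one write will succeed, but more than one may).

You need to prevent that by letting only one thread do the work: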

__global__ void inc(int *foo) { 
    if (blockIdx.x == 0 && threadIdx.x == 0) 
        ++*foo; 
}

(untested)

Source

2010-12-10 12:35:46

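What a fail on my part. However, the output is still wrong. It gives me 1 instead of the expected 2500. – 2010-12-10 12:39:40

@Renato: That's not how CUDA works. See my updated answer: writing to the same memory location from different threads is simply undefined. What you want is a so-called gather operation. Implementing this isn't trivial. – 2010-12-10 12:41:23

I tried your quick fix, but the output is 2. – 2010-12-10 12:48:37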
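One way to read the "gather" suggestion above is as a reduction: each block adds up its own contributions first, and only one thread per block touches the shared counter. The following is a minimal sketch under that assumption, not the commenter's code. inc_reduced and count_with_reduction are hypothetical names, and the per-block sum is done serially by thread 0 to keep it short (a tree reduction would be faster).

```cuda
#include <cuda_runtime.h>

// Each block collects its threads' contributions in shared memory, then
// thread 0 adds the block total to the global counter with one atomicAdd.
__global__ void inc_reduced(int *total) {
    extern __shared__ int partial[];   // one int per thread, sized at launch
    partial[threadIdx.x] = 1;          // every thread contributes 1
    __syncthreads();                   // all contributions are now visible

    if (threadIdx.x == 0) {
        int block_sum = 0;
        for (unsigned int i = 0; i < blockDim.x; ++i)
            block_sum += partial[i];
        atomicAdd(total, block_sum);   // one atomic per block, not per thread
    }
}

// Hypothetical host wrapper: returns the final counter, or -1 on a CUDA error.
int count_with_reduction(int blocks, int threads) {
    int count = 0, *d_count = nullptr;
    if (cudaMalloc((void**)&d_count, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d_count, &count, sizeof(int), cudaMemcpyHostToDevice);
    // Third launch parameter: bytes of dynamic shared memory per block.
    inc_reduced<<<blocks, threads, threads * sizeof(int)>>>(d_count);
    cudaError_t err = cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return err == cudaSuccess ? count : -1;
}
```

With 100 blocks of 25 threads this issues 100 atomic adds instead of 2500, and count_with_reduction(100, 25) should still return 2500.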

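The problem with your code is that you pass the kernel a pointer to the host variable count instead of the device pointer cuda_count. The "&count" is the mistake: cuda_count is the pointer that cudaMalloc filled in, and that is the one the kernel has to receive.

This line

inc <<< 100, 25 >>> (&count);

should be

inc <<< 100, 25 >>> (cuda_count);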
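A mistake like this is easy to miss because the original program never looks at the status codes the CUDA runtime returns. Below is a hedged sketch of the same counter with basic error checking. cudaGetLastError and cudaDeviceSynchronize are real runtime calls; inc_checked and run_inc_checked are names invented for this example. What exactly gets reported for a bad pointer depends on the system, so treat this as an illustration of where to look rather than a prediction of a specific error.

```cuda
#include <cuda_runtime.h>

__global__ void inc_checked(int *foo) {
    atomicAdd(foo, 1);
}

// Hypothetical helper: runs the counter and returns the first CUDA error
// it encounters (cudaSuccess if everything worked).
cudaError_t run_inc_checked(int *host_count, int blocks, int threads) {
    int *d_count = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_count, sizeof(int));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_count, host_count, sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        inc_checked<<<blocks, threads>>>(d_count);  // device pointer, not host_count
        err = cudaGetLastError();                   // launch/configuration errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();              // errors raised while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpy(host_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_count);
    return err;
}
```

In main you would call run_inc_checked(&count, 100, 25) and print cudaGetErrorString(err) whenever the result is not cudaSuccess.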

Source

2012-09-29 07:30:16
