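I am writing a matrix addition program for the GPU using streams and, naturally, pinned memory. I allocate 3 matrices in pinned memory, but beyond a certain size the program reports API error 2: out of memory. My machine has 4 GB of RAM, yet I cannot use more than 800 MB. Is there any way to control this upper limit? My system configuration: NVIDIA GeForce 9800GTX, Intel Core 2 Quad. The streamed code that uses the pinned memory is as follows: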
for (int i=0;i<no_of_streams;i++)
{
    // Copy this stream's slice of each matrix from pinned host memory to the device.
    cudaMemcpyAsync(device_a+i*(n/no_of_streams),hAligned_on_host_a+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
    cudaMemcpyAsync(device_b+i*(n/no_of_streams),hAligned_on_host_b+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
    cudaMemcpyAsync(device_c+i*(n/no_of_streams),hAligned_on_host_c+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyHostToDevice,streams[i]);
    // Launch the kernel on the same slice, in the same stream.
    matrixAddition<<<blocks,threads,0,streams[i]>>>(device_a+i*(n/no_of_streams),device_b+i*(n/no_of_streams),device_c+i*(n/no_of_streams));
    // Copy the slices back to the host.
    cudaMemcpyAsync(hAligned_on_host_a+i*(n/no_of_streams),device_a+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
    cudaMemcpyAsync(hAligned_on_host_b+i*(n/no_of_streams),device_b+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
    cudaMemcpyAsync(hAligned_on_host_c+i*(n/no_of_streams),device_c+i*(n/no_of_streams),nbytes/no_of_streams,cudaMemcpyDeviceToHost,streams[i]);
}
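Since the failure happens at allocation time rather than in the loop above, it may help to check the return value of each pinned allocation. Error 2 in the CUDA runtime is `cudaErrorMemoryAllocation`. Pinned memory is page-locked, so as far as I know the practical limit depends on how much memory the OS and driver are willing to lock, not on total RAM. The following is only a minimal sketch: `allocPinned`, `pinnedBytesNeeded`, the `float` element type and the element count are assumptions for illustration, not taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: total pinned bytes requested for `numMatrices`
// float arrays of `n` elements each.
static size_t pinnedBytesNeeded(size_t n, int numMatrices) {
    return n * sizeof(float) * (size_t)numMatrices;
}

// Hypothetical helper: allocate pinned host memory and report the
// runtime's error string on failure instead of continuing silently.
static float* allocPinned(size_t bytes) {
    void* p = NULL;
    cudaError_t err = cudaHostAlloc(&p, bytes, cudaHostAllocDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc(%zu bytes) failed with error %d: %s\n",
                bytes, (int)err, cudaGetErrorString(err));
        cudaGetLastError();  // reset the recorded error so later calls start clean
        return NULL;
    }
    return (float*)p;
}

int main() {
    const size_t n = 1 << 24;  // assumed element count per matrix
    printf("Requesting %zu bytes of pinned memory in total\n",
           pinnedBytesNeeded(n, 3));

    float* h[3] = {NULL, NULL, NULL};
    bool ok = true;
    for (int m = 0; m < 3; m++) {
        h[m] = allocPinned(n * sizeof(float));
        if (!h[m]) { ok = false; break; }  // stop at the first failed allocation
    }

    // Release whatever was successfully allocated.
    for (int m = 0; m < 3; m++) {
        if (h[m]) cudaFreeHost(h[m]);
    }
    return ok ? 0 : 1;
}
```

Increasing `n` until the first allocation fails shows which of the three matrices hits the limit and at what total size.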
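It could be any number of reasons, from fragmented memory to bad code. It would be good to see what you are actually doing, so that useful suggestions can be made. – 2012-03-23 16:58:51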
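The code flow is as follows: create 2 streams, then for each stream: cudaMemcpy (half of the array, hostToDevice), kernel launch, cudaMemcpy (half of the array, DeviceToHost). Nothing special. The program works correctly, though performance is poor. I just want more pinned memory to be available, since the GPU's global memory is about 1 GB? – 2012-03-23 17:01:45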
Please put any code in the question by editing it. – 2012-03-23 17:13:18
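For reference, the flow described in the comments (two streams, each copying half of the arrays to the device, launching the kernel, and copying half back) could be sketched roughly as below. This is a sketch under assumptions, not the asker's actual program: the `float` element type, the four-argument `matrixAddition` kernel, the `CHECK` macro, the `chunkOffset` helper and the problem size are all hypothetical. It also copies only the two inputs to the device and only the result back, which is a deliberate simplification of the loop in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort with a message on any CUDA error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); \
    exit(1); } } while (0)

// Assumed kernel: element-wise c = a + b over `count` elements.
__global__ void matrixAddition(const float* a, const float* b, float* c, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) c[idx] = a[idx] + b[idx];
}

// Element offset where stream i starts (assumes n is divisible by numStreams).
static size_t chunkOffset(size_t n, int numStreams, int i) {
    return (n / numStreams) * (size_t)i;
}

int main() {
    const int numStreams = 2;
    const size_t n = 1 << 20;                 // assumed total element count
    const size_t chunk = n / numStreams;
    const size_t chunkBytes = chunk * sizeof(float);
    const size_t nbytes = n * sizeof(float);

    float *hA, *hB, *hC, *dA, *dB, *dC;
    CHECK(cudaHostAlloc((void**)&hA, nbytes, cudaHostAllocDefault));
    CHECK(cudaHostAlloc((void**)&hB, nbytes, cudaHostAllocDefault));
    CHECK(cudaHostAlloc((void**)&hC, nbytes, cudaHostAllocDefault));
    CHECK(cudaMalloc((void**)&dA, nbytes));
    CHECK(cudaMalloc((void**)&dB, nbytes));
    CHECK(cudaMalloc((void**)&dC, nbytes));
    for (size_t i = 0; i < n; i++) { hA[i] = (float)i; hB[i] = 2.0f * (float)i; }

    cudaStream_t streams[numStreams];
    for (int i = 0; i < numStreams; i++) CHECK(cudaStreamCreate(&streams[i]));

    const int threads = 256;
    const int blocks = (int)((chunk + threads - 1) / threads);
    for (int i = 0; i < numStreams; i++) {
        const size_t off = chunkOffset(n, numStreams, i);
        // Inputs to the device, kernel, result back: all queued on stream i.
        CHECK(cudaMemcpyAsync(dA + off, hA + off, chunkBytes, cudaMemcpyHostToDevice, streams[i]));
        CHECK(cudaMemcpyAsync(dB + off, hB + off, chunkBytes, cudaMemcpyHostToDevice, streams[i]));
        matrixAddition<<<blocks, threads, 0, streams[i]>>>(dA + off, dB + off, dC + off, (int)chunk);
        CHECK(cudaGetLastError());  // catch launch configuration errors
        CHECK(cudaMemcpyAsync(hC + off, dC + off, chunkBytes, cudaMemcpyDeviceToHost, streams[i]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    // Verify the result on the host.
    bool ok = true;
    for (size_t i = 0; i < n; i++) if (hC[i] != hA[i] + hB[i]) { ok = false; break; }
    printf("%s\n", ok ? "result matches" : "result mismatch");

    for (int i = 0; i < numStreams; i++) CHECK(cudaStreamDestroy(streams[i]));
    CHECK(cudaFree(dA)); CHECK(cudaFree(dB)); CHECK(cudaFree(dC));
    CHECK(cudaFreeHost(hA)); CHECK(cudaFreeHost(hB)); CHECK(cudaFreeHost(hC));
    return ok ? 0 : 1;
}
```

Skipping the transfers that are not needed reduces the traffic per stream, but it does not change how much pinned memory the three host buffers require.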