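CUDA: too many resources requested for launch
I am running my code on a GTX 480 with compute capability 2.0 and I am having some problems.
I always get the following error when I launch a kernel with 1024 threads per block: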
========= CUDA-MEMCHECK
========= Program hit cudaErrorLaunchOutOfResources (error 7) due to "too many resources requested for launch" on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2ef613]
========= Host Frame:/usr/local/cuda-6.5/lib64/libcudart.so.6.5 (cudaLaunch + 0x17e) [0x3686e]
========= Host Frame:./bin/myProgram [0x3a50]
========= Host Frame:./bin/myProgram [0x388a]
========= Host Frame:./bin/myProgram [0x38e3]
========= Host Frame:./bin/myProgram [0x2a99]
========= Host Frame:./bin/myProgram [0x1410]
========= Host Frame:./bin/myProgram [0x1da0]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xed) [0x2176d]
========= Host Frame:./bin/myProgram [0x1139]
=========
I ran the program several times with different numbers of blocks and threads:
5 Blocks, 512 Threads per Block => Works
5 Blocks, 1024 Threads per Block => Error
10 Blocks, 512 Threads per Block => Works
10 Blocks, 1024 Threads per Block => Error
15 Blocks, 512 Threads per Block => Works
15 Blocks, 1024 Threads per Block => Error
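I checked the register usage and it seems to be fine. "Function4", with 28 registers, is the kernel that is launched with this many threads. All other kernels only use <<<1, 32>>> per call.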
ptxas info : 0 bytes gmem
ptxas info : Function properties for _Z7function1Py
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Compiling entry function '_Z13function2PyS_i' for 'sm_20'
ptxas info : Function properties for _Z13function2PyS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 22 registers, 52 bytes cmem[0]
ptxas info : Compiling entry function '_Z6function3PyiS_' for 'sm_20'
ptxas info : Function properties for _Z6function3PyiS_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 22 registers, 56 bytes cmem[0]
ptxas info : Compiling entry function '_Z17function4PyiiS_Phji' for 'sm_20'
ptxas info : Function properties for _Z17function4PyiiS_Phji
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 72 bytes cmem[0]
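I also ran this program on my GTX 660, which has CC 3.0, and there it works with 1024 threads per block. I don't know where the problem comes from. Does anyone have an idea?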
It could be a registers-per-thread problem. Try compiling the code with '-maxrregcount 28' (or '24') and see whether it affects the failing cases. – 2014-12-01 09:51:34
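To put rough numbers on the registers-per-thread idea, here is a minimal host-only sketch of the budget arithmetic. The register file sizes are hard-coded assumptions taken from the compute capability table in the CUDA C Programming Guide (32 K registers per multiprocessor for CC 2.0, 64 K for CC 3.0). The helper names `blockFits` and `maxRegsPerThread` are made up for illustration, and the check ignores register allocation granularity.

```cpp
// Register file size per multiprocessor (assumed values, hard-coded from
// the CUDA C Programming Guide's compute capability table).
constexpr int kRegsPerSM_CC20 = 32 * 1024;  // CC 2.0, e.g. GTX 480
constexpr int kRegsPerSM_CC30 = 64 * 1024;  // CC 3.0, e.g. GTX 660

// Hypothetical helper: true if all threads of one block fit into the
// register file. Allocation granularity is ignored, so this is a
// slightly optimistic estimate.
constexpr bool blockFits(int regsPerThread, int threadsPerBlock, int regsPerSM) {
    return regsPerThread * threadsPerBlock <= regsPerSM;
}

// Hypothetical helper: largest per-thread register count for which a block
// of the given size still fits. The separate per-thread limit of the
// architecture (63 registers on CC 2.0 and 3.0) is not applied here.
constexpr int maxRegsPerThread(int threadsPerBlock, int regsPerSM) {
    return regsPerSM / threadsPerBlock;
}

// 28 registers x 1024 threads = 28672, which is below 32768.
static_assert(blockFits(28, 1024, kRegsPerSM_CC20), "28 registers should fit");
// Anything above 32 registers per thread does not fit at 1024 threads on CC 2.0.
static_assert(!blockFits(33, 1024, kRegsPerSM_CC20), "33 registers should not fit");
```

Under these assumptions the budget at 1024 threads per block is 32 registers per thread on CC 2.0 and 64 on CC 3.0, while 512 threads per block leave 64 registers per thread on CC 2.0. The 28 registers reported by ptxas would fit, so if the launch still fails it may be worth checking that the binary being run was built with the same flags as the ptxas output above. A debug build, for example, can use more registers than an optimized one.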