2017-07-14 220 views

pyopenCL, openCL: can't build program on GPU

I have kernel source code that runs on the GTX 970 in my PC, but it won't compile on my Early 2015 MacBook Pro with Intel Iris Graphics 6100 (1536 MB).

import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]  # Index 1 is the GPU on this machine (index 0 is the CPU)
ctx = cl.Context([device])  # Tell CL to use the GPU
queue = cl.CommandQueue(ctx)  # Create a command queue for the target device
# program = cl.Program(ctx, kernelsource).build()
print(platform.get_devices())

get_devices() shows me [<pyopencl.Device 'Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz' on 'Apple' at 0xffffffff>, <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>].

The kernel runs correctly on the CPU, but when I build the program for the GPU, it returns:

--------------------------------------------------------------------------- 
RuntimeError        Traceback (most recent call last) 
<ipython-input-44-e2b6e1b931de> in <module>() 
     3 ctx  = cl.Context([device])  # Tell CL to use GPU 
     4 queue = cl.CommandQueue(ctx)  # Create a command queue for the target device. 
----> 5 program = cl.Program(ctx, kernelsource).build() 
     6 
     7 

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in build(self, options, devices, cache_dir) 
    393       self._context, self._source, options, devices, 
    394       cache_dir=cache_dir), 
--> 395      options=options, source=self._source) 
    396 
    397    del self._context 

/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in _build_and_catch_errors(self, build_func, options, source) 
    428   # Python 3.2 outputs the whole list of currently active exceptions 
    429   # This serves to remove one (redundant) level from that nesting. 
--> 430   raise err 
    431 
    432  # }}} 

RuntimeError: clbuildprogram failed: BUILD_PROGRAM_FAILURE - 

Build on <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>: 

Cannot select: 0x7f94b30a5110: i64,ch = dynamic_stackalloc 0x7f94b152a290, 0x7f94b30a4f10, 0x7f94b3092c10 [ORD=7] [ID=54] 
    0x7f94b30a4f10: i64 = and 0x7f94b30a4c10, 0x7f94b3092b10 [ORD=7] [ID=52] 
    0x7f94b30a4c10: i64 = add 0x7f94b30a6610, 0x7f94b3092a10 [ORD=7] [ID=49] 
     0x7f94b30a6610: i64 = shl 0x7f94b3092d10, 0x7f94b3092e10 [ID=46] 
     0x7f94b3092d10: i64 = bitcast 0x7f94b30a4810 [ID=41] 
      0x7f94b30a4810: v2i32 = IGILISD::MOVSWZ 0x7f94b3092710, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=32] 
      0x7f94b3092710: i32,ch = CopyFromReg 0x7f94b152a290, 0x7f94b3092610 [ORD=5] [ID=22] 
       0x7f94b3092610: i32 = Register %vreg60 [ORD=5] [ID=1] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b3092e10: i64 = bitcast 0x7f94b30a3f10 [ID=38] 
      0x7f94b30a3f10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4510, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=29] 
      0x7f94b30a4510: i32 = Constant<2> [ID=19] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b3092a10: i64 = bitcast 0x7f94b30a4b10 [ID=40] 
     0x7f94b30a4b10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4e10, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=31] 
      0x7f94b30a4e10: i32 = Constant<7> [ID=21] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
      0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
    0x7f94b3092b10: i64 = bitcast 0x7f94b3092910 [ID=39] 
     0x7f94b3092910: v2i32 = IGILISD::MOVSWZ 0x7f94b30a5010, 0x7f94b30a4210, 0x7f94b30a2810, 0x7f94b30a2810 [ID=30] 
     0x7f94b30a5010: i32 = Constant<-8> [ID=20] 
     0x7f94b30a4210: i32 = Constant<-1> [ORD=3] [ID=10] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
    0x7f94b3092c10: i64 = bitcast 0x7f94b3092810 [ID=35] 
    0x7f94b3092810: v2i32 = IGILISD::MOVSWZ 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=27] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
     0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7] 
In function: trajectories 
(options: -I /usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/cl) 
(source saved as /var/folders/p2/jd7m10gs5k1_q6hx5kvktkcc0000gn/T/tmpWQmCKr.cl) 

Any suggestions as to why this won't run? I'm running an Early 2015 MacBook Pro with macOS Sierra 10.12.5. Printing cl.version.VERSION reports pyopencl version 2015.2.4.

Here is the kernel code:

kernelsource = """ 
__kernel void trajectories(
    // TODO: adjust argtypes above if this is changed 
    const int N, 
    const int dim, 
    __constant float* data, 
    const int nrParticles, 
    __global float* pos, 
    __global float* vel, 
    const int nrSteps, 
    __global float* trj, 
    __global float* sigarr, 
    const float sigma, 
    const float mass, 
    const float alpha, // alpha is resistance in reverse. 
    const float dt 
){ 
    int i,k,step; 
    float h, sigsum, hexp; 
    int pidx = get_global_id(0); // global ID used as particle index 
    int ofs = pidx * nrSteps * dim; 
    int accofs = ofs + (nrSteps-1) * dim; // use last trj point to tmp store acc vector 
    float v[dim]; 
    float sigma2 = sigma*sigma; 
    float m = mass/sigma2; 
    float dt_over_m = dt /m; 
    for(step=0; step<nrSteps; step++){ 
     for(k=0; k<dim; k++) 
     { 
      trj[accofs+k]=0; 
     } 
     for(i=0; i<N; i++) 
     { 

      h=0; // to store ||data[i]-x||**2 
      for(k=0; k<dim; k++) 
      { 
       v[k] = pos[pidx*dim+k] - data[i*dim + k]; 
       h += v[k]*v[k];  //h == force1p_sum 
      }; 
      hexp = exp(-h/sigma2)/sigma2; 

      for(k=0; k<dim; k++) 
      { 
       trj[accofs+k] += -(hexp) * v[k]; 
      };   
     }; 
     sigsum = 0; 
     for(k=0; k<dim; k++) 
     { 
      vel[pidx*dim+k]  = alpha * vel[pidx*dim+k] + dt_over_m * trj[accofs+k];  // vel = alpha*vel + acc*dt 
      pos[pidx*dim+k] += dt * vel[pidx*dim+k];      // pos = pos + vel*dt 
      sigsum    += vel[pidx*dim+k] * vel[pidx*dim+k]; // v^2 for kinetic energy 
      trj[ofs+step*dim+k] = pos[pidx*dim+k];    // write to result vector 

     }; 
     sigarr[pidx*nrSteps+step] = sigsum;     // sig = | vel | 
    } 
    for(step=0; step<nrSteps-2; step++) 
    { 
     sigarr[pidx*nrSteps+step] = sigarr[pidx*nrSteps+step+2] - sigarr[pidx*nrSteps+step+1]; 
    }; 
    sigarr[pidx*nrSteps+nrSteps-1] = sigarr[pidx*nrSteps+nrSteps-2] = 0; 

} 
""" 

Thanks

嘉俊

Can you share the kernel code? It returns BUILD_PROGRAM_FAILURE, so something must be wrong with the kernel code. –

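'clBuildProgram' should also give you diagnostic output telling you where the problem is. If you can't make sense of it, post it along with the source code as @parallelhighway suggested, and we can try to help. – pmdj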

Hi, I've added the kernel code. Thanks –

Answer

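In cases like this you should try querying the build error. Another thing you can do with kernel errors like this is to use an offline compiler; most OpenCL vendors ship one.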

You can find Intel's offline OpenCL compiler here: https://software.intel.com/en-us/articles/programming-with-the-intel-sdk-for-opencl-applications-development-tools

AMD has a tool called CodeXL, in which you can also compile offline and see how your kernel code compiles.

Here is ARM's offline OpenCL compiler: https://developer.arm.com/products/software-development-tools/graphics-development-tools/mali-offline-compiler/downloads

Intel's compiler supports up to OpenCL 2.1, while ARM's supports up to 1.1. You can use any of them to compile your kernel code and track down errors easily.

The problem in your kernel is the following line:

float v[dim]; 

The OpenCL C specification does not allow variable-length arrays, and the offline compiler gives the following error:

ERROR: <source>:22:12: error: variable length arrays are not supported in OpenCL 

Fix that line to get past this error. From now on, you can also check whether your kernel compiles by running it through an offline compiler.

Edit: the specification has a footnote explaining that variable-length arrays are not supported. You can see it here:

https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=31
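One way to keep dim data-driven without a variable-length array is to bake it into the source at build time, so the array size becomes a compile-time constant. The dynamic_stackalloc node that the Iris backend "Cannot select" in the build log above is most likely the variable-sized stack allocation produced by float v[dim]; the CPU and Nvidia compilers presumably accepted it as a non-standard extension, which would explain why only the Iris GPU failed. The sketch below assumes a hypothetical macro name DIM and a hypothetical helper add_dim_define; it only prepares the source string and does not call OpenCL.

```python
# Minimal sketch (hypothetical helper and macro names): prepend '#define DIM n'
# so the kernel can declare a fixed-size array `float v[DIM];`.

def add_dim_define(kernel_source, dim):
    """Return kernel_source with '#define DIM <dim>' prepended.

    DIM is fixed when the program is built, so the array is no longer a VLA.
    """
    dim = int(dim)
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    return "#define DIM %d\n" % dim + kernel_source


# Cut-down kernel showing the pattern: the array size comes from the macro,
# while loops may still use the runtime `dim` argument (it must equal DIM).
example_kernel = """
__kernel void demo(__global float* out, const int dim)
{
    float v[DIM];  /* size fixed at build time, so not a variable-length array */
    for (int k = 0; k < dim; k++)
        v[k] = (float)k;
    out[get_global_id(0)] = v[dim - 1];
}
"""

source = add_dim_define(example_kernel, 3)
print(source.splitlines()[0])  # prints: #define DIM 3
```

With pyopencl this would be built as cl.Program(ctx, add_dim_define(kernelsource, dim)).build(); passing options="-DDIM=%d" % dim to build() should have the same effect without touching the source string. Each distinct dim means a separate build, which is usually cheap if the dimension rarely changes.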

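Hi, you're right. When I replace it with a fixed length, it works. What I don't quite understand is that I had been using the variable length before on the CPU and on an Nvidia 970 GPU, and it worked on all of them, just not on the Intel Iris GPU. Any idea why that happens? dim is the dimensionality of my data, so unless I change it by hand every time, it needs to be a variable. Is there any workaround? Thanks a lot –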

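You could create v on the host (CPU) and pass it to the kernel as an argument. Declaring a variable-length array inside the kernel is not allowed. –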
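A minimal sketch of that suggestion, under stated assumptions: the host allocates one scratch buffer of nrParticles * dim floats (with pyopencl, something like cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=nrParticles * dim * 4) for float32), passes it as an extra kernel argument (hypothetically named vtmp), and each work-item uses its own dim-long slice. The kernel-side change is shown as comments; the Python functions are hypothetical and only emulate the indexing so it can be checked without a GPU.

```python
# Kernel-side change (OpenCL C), assuming a new argument named vtmp:
#     __global float* vtmp,                   // extra argument: nrParticles*dim floats
#     __global float* v = vtmp + pidx * dim;  // replaces: float v[dim];
# Existing v[k] expressions in the kernel can then stay unchanged.


def scratch_floats(nr_particles, dim):
    """Number of float32 elements the host must allocate for vtmp."""
    return nr_particles * dim


def squared_distance(pidx, i, dim, pos, data, vtmp):
    """Python mirror of the kernel's first inner loop for particle `pidx`
    and data point `i`, writing the differences into vtmp instead of v[dim]."""
    base = pidx * dim  # offset of this particle's private slice in vtmp
    h = 0.0
    for k in range(dim):
        vtmp[base + k] = pos[pidx * dim + k] - data[i * dim + k]
        h += vtmp[base + k] * vtmp[base + k]
    return h


# Two particles in 3-D, one data point at the origin.
dim = 3
pos = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
data = [0.0, 0.0, 0.0]
vtmp = [0.0] * scratch_floats(2, dim)
h = squared_distance(1, 0, dim, pos, data, vtmp)
print(h, vtmp)  # prints: 14.0 [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
```

Particle 0's slice stays untouched, which is the point of giving every work-item its own offset. Global memory is typically slower than private memory, so if dim changes rarely, baking it in at build time may be the faster option.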