2015-09-07

Has anyone managed to send CUDA arrays over MPI with the latest mpi4py (and pyCUDA 2015.1.3)? To send an array, the corresponding data type has to be converted into a contiguous buffer. How can this broadcast be done over MPI with pyCUDA?

This conversion is done with the following lambda:

    to_buffer = lambda arr: None if arr is None else lambda arr: arr.gpudata.as_buffer(arr.nbytes)

The complete script looks like this:

    import numpy
    from mpi4py import MPI

    import pycuda.gpuarray as gpuarray
    import pycuda.driver as cuda
    import pycuda.autoinit

    to_buffer = lambda arr: None if arr is None else lambda arr: arr.gpudata.as_buffer(arr.nbytes)

    print "pyCUDA version " + str(pycuda.VERSION)
    a_gpu = gpuarray.to_gpu(numpy.random.randn(4,4).astype(numpy.float32))

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    comm.Bcast([to_buffer(a_gpu), MPI.FLOAT], root=0)

But unfortunately, all this beauty falls apart with these errors:

    pyCUDA version (2015, 1, 3)
    Traceback (most recent call last):
      File "./test_mpi.py", line 21, in <module>
        comm.Bcast([ to_buffer(numpy.random.randn(4,4).astype(numpy.float32)) , MPI.FLOAT], root=0)
      File "Comm.pyx", line 405, in mpi4py.MPI.Comm.Bcast (src/mpi4py.MPI.c:66743)
      File "message.pxi", line 388, in mpi4py.MPI._p_msg_cco.for_bcast (src/mpi4py.MPI.c:23220)
      File "message.pxi", line 355, in mpi4py.MPI._p_msg_cco.for_cco_send (src/mpi4py.MPI.c:22959)
      File "message.pxi", line 111, in mpi4py.MPI.message_simple (src/mpi4py.MPI.c:20516)
      File "message.pxi", line 51, in mpi4py.MPI.message_basic (src/mpi4py.MPI.c:19644)
      File "asbuffer.pxi", line 108, in mpi4py.MPI.getbuffer (src/mpi4py.MPI.c:6757)
      File "asbuffer.pxi", line 50, in mpi4py.MPI.PyObject_GetBufferEx (src/mpi4py.MPI.c:6093)
    TypeError: expected a readable buffer object

Any idea what is going on? Or maybe someone has an alternative buffer-conversion incantation?

Thanks in advance!


Plain MPI requires objects that support the buffer protocol in *host* memory. A `DeviceAllocation` lives in device memory. I don't think this can ever work – talonmies


@talonmies, maybe you are right, but I thought that while a_gpu does live on the GPU, to_buffer would copy it to the host. If that is wrong, please elaborate. Thanks! –


You can read the `as_buffer` documentation [here](http://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation) and its source [here](https://github.com/inducer/pycuda/blob/fde69b0502d944a2d41e1f1b2d0b78352815d487/src/cpp/cuda.hpp#L1547). I don't see anywhere that creating a buffer object from a DeviceAllocation triggers a device-to-host copy. Do you? – talonmies
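As an aside, the lambda in the question also has a purely Python-level bug that already explains the exact TypeError, independent of the device-memory issue. Because of the repeated `lambda arr:`, calling `to_buffer` returns the inner *function* instead of invoking it, so mpi4py is handed a function object, which is not a readable buffer. A minimal GPU-free sketch (`FakeArray` is a hypothetical stand-in for a gpuarray):

```python
# The question's lambda, verbatim: note the nested "lambda arr:".
to_buffer = lambda arr: None if arr is None else lambda arr: arr.gpudata.as_buffer(arr.nbytes)

class FakeArray(object):
    pass  # stands in for a gpuarray; .gpudata is never even touched

result = to_buffer(FakeArray())
# The outer lambda returns the inner function without calling it, so
# mpi4py receives a function object, not a readable buffer -- hence
# "TypeError: expected a readable buffer object".
print(callable(result))  # -> True
```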

Answer


All that is needed is to call the MPI broadcast with a valid host-memory buffer object, such as a numpy array, instead of the lambda used to convert the DeviceAllocation object into a buffer object, for example:

    comm.Bcast(a_gpu.get(), root=0)

Here `a_gpu.get()` copies the GPU array into a host-side numpy array, which satisfies the buffer protocol that MPI requires.