Break statement in a CUDA kernel loop causes problems

2017-10-13

I have recently been playing with CUDA/Numba code. I have an MxN matrix (say cumul_A) in which each row is a cumulative probability distribution. I want to draw a sample from each of these cumulative distributions by mapping a sample drawn from a uniform random distribution. Simply put, say the sample drawn from the uniform distribution is 0.3. The CUDA kernel should pick a row of cumul_A and compare each element of that row (starting from the first element) against 0.3. As soon as an element's value exceeds 0.3, the kernel should store that element's index in the output argument and break out of the for loop. I cannot get this seemingly simple kernel to work. Does a break statement cause trouble inside a kernel?

A minimal working example is provided below.
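For reference, the row-wise lookup described above (find the first index whose cumulative value exceeds the uniform sample) can be expressed on the CPU with np.searchsorted; the array values below are made up purely for illustration:

```python
import numpy as np

# Toy cumulative distributions, one per row (assumed values for illustration).
cumul_A = np.array([[0.2, 0.5, 1.0],
                    [0.1, 0.4, 1.0]], dtype=np.float32)
# One uniform sample per row.
u = np.array([0.3, 0.05], dtype=np.float32)

# side='right' returns the first index whose cumulative value exceeds u,
# which is exactly the index the kernel is meant to find.
idx = np.array([np.searchsorted(cumul_A[i], u[i], side='right')
                for i in range(len(u))])
print(idx)  # [1 0]
```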

from __future__ import division
from __future__ import print_function

import numpy as np

from numba import vectorize, cuda, jit
np.set_printoptions(precision=4, suppress=True)

# Number of rows
M = 10
# Number of columns
N = 20

# ======= 1-D GRIDS =======
# Set the number of threads in a block
threadsperblock_1d = 4
# Calculate the number of thread blocks in the grid
blockspergrid_1d = np.int(np.ceil(M/threadsperblock_1d))
# ======= 1-D GRIDS =======

@cuda.jit('void(float32[:, :], float32[:], int32[:])')
def get_randomchoice(cumul_a, random_nos, output):
    x = cuda.grid(1)
    if x < cumul_a.shape[0]:
        for y in range(cumul_a.shape[1]):
            if random_nos[x] > cumul_a[x, y]:
                output[x] = y
                break  # return

if __name__ == '__main__':
    # Prepare the matrix whose rows are each a cumulative probability distribution
    A = np.random.rand(M, N).astype(np.float32)
    A = np.divide(A, np.sum(A, axis=1, keepdims=True))
    cumul_A = np.cumsum(A, axis=1)

    # Assert that cumul_A is indeed cumulative
    assert np.allclose(cumul_A[:, -1], np.ones(M))

    # Draw values from a uniform distribution
    RandValues = np.random.rand(M).astype(np.float32)

    # Output array in numpy
    Y = np.zeros(M, dtype=np.int32)
    for iStep in range(M):
        Y[iStep] = np.argwhere(RandValues[iStep] <= cumul_A[iStep])[0]

    print('From numpy:\n{}'.format(Y))

    # Transfer to GPU
    cumul_A_gpu = cuda.to_device(cumul_A)
    RandValues_gpu = cuda.to_device(RandValues)
    # Return array from GPU
    random_idx_gpu = cuda.device_array(M, dtype=np.int32)
    get_randomchoice[blockspergrid_1d, threadsperblock_1d](cumul_A_gpu, RandValues_gpu, random_idx_gpu)
    random_idx = random_idx_gpu.copy_to_host()

    print('From cuda:\n{}'.format(random_idx))

Any help would be greatly appreciated.

Answer

False alarm! There is a small glitch in the code. The line if random_nos[x] > cumul_a[x, y]: should be if random_nos[x] < cumul_a[x, y]:. There is no problem with break inside the kernel loop.
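The corrected per-row logic (with < instead of >) can be checked on the CPU with an equivalent plain-Python loop, a sketch that mirrors the kernel body without requiring a GPU:

```python
import numpy as np

def row_choice(cumul_a, random_nos):
    """CPU mirror of the fixed kernel loop: store the first index whose
    cumulative value exceeds the uniform sample, then break."""
    output = np.zeros(cumul_a.shape[0], dtype=np.int32)
    for x in range(cumul_a.shape[0]):
        for y in range(cumul_a.shape[1]):
            if random_nos[x] < cumul_a[x, y]:  # '<', not '>'
                output[x] = y
                break
    return output

cumul = np.array([[0.2, 0.5, 1.0]], dtype=np.float32)
print(row_choice(cumul, np.array([0.3], dtype=np.float32)))  # [1]
```

With the original > comparison, the very first element of almost every row satisfies the test (a small cumulative value is usually below the sample), so the loop breaks immediately with index 0; flipping the comparison gives the intended first-crossing index.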