CUDA: allocating an array of structs inside a struct

I have these structs:

typedef struct neuron 
{ 
float* weights; 
int n_weights; 
}Neuron; 


typedef struct neurallayer 
{ 
Neuron *neurons; 
int n_neurons; 
int act_function; 
}NLayer;

An NLayer struct can contain an arbitrary number of Neuron structs.

I am trying to allocate an NLayer struct with 5 Neurons from the host in this way:

NLayer* nL; 
int i; 
int tmp=9; 
cudaMalloc((void**)&nL,sizeof(NLayer)); 
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron)); 
for(i=0;i<5;i++) 
    cudaMemcpy(&nL->neurons[i].n_weights,&tmp,sizeof(int),cudaMemcpyHostToDevice);

...then I tried to modify the nL->neurons[0].n_weights variable with this kernel:

__global__ void test(NLayer* n) 
      { 
       n->neurons[0].n_weights=121; 
      }

but at compile time nvcc reports this warning, which refers to the only line in the kernel:

Warning: Cannot tell what pointer points to, assuming global memory space

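Once the kernel has finished its work, the struct becomes inaccessible.

Most likely I am doing something wrong during allocation... can someone help me? Thank you very much, and sorry for my English! :)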

UPDATE:

Thanks to aland I modified my code and created this function, which should allocate an instance of the NLayer struct:

NLayer* setNLayer(int numNeurons,int weightsPerNeuron,int act_fun) 
{ 
    int i; 
    NLayer h_layer; 
    NLayer* d_layer; 
    float* d_weights; 

    //SET THE LAYER VARIABLE OF THE HOST NLAYER 
    h_layer.act_function=act_fun; 
    h_layer.n_neurons=numNeurons; 
    //ALLOCATING THE DEVICE NLAYER 
    if(cudaMalloc((void**)&d_layer,sizeof(NLayer))!=cudaSuccess) 
     puts("ERROR: Unable to allocate the Layer"); 
    //ALLOCATING THE NEURONS ON THE DEVICE 
    if(cudaMalloc((void**)&h_layer.neurons,numNeurons*sizeof(Neuron))!=cudaSuccess) 
     puts("ERROR: Unable to allocate the Neurons of the Layer"); 
    //COPING THE HOST NLAYER ON THE DEVICE 
    if(cudaMemcpy(d_layer,&h_layer,sizeof(NLayer),cudaMemcpyHostToDevice)!=cudaSuccess) 
       puts("ERROR: Unable to copy the data layer onto the device"); 

    for(i=0;i<numNeurons;i++) 
    { 
     //ALLOCATING THE WEIGHTS' ARRAY ON THE DEVICE 
     cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float)); 
     //COPING ITS POINTER AS PART OF THE i-TH NEURONS STRUCT 
     if(cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice)!=cudaSuccess) 
       puts("Error: unable to copy weights' pointer to the device"); 
    } 


    //RETURN THE DEVICE POINTER 
    return d_layer; 
}

and I call that function like this (the kernel test is declared earlier):

int main() 
{ 
    NLayer* nL; 
    int h_tmp1; 
    float h_tmp2; 

    nL=setNLayer(10,12,13); 
    test<<<1,1>>>(nL); 
    if(cudaMemcpy(&h_tmp1,&nL->neurons[0].n_weights,sizeof(float),cudaMemcpyDeviceToHost)!=cudaSuccess); 
     puts("ERROR!!"); 
    printf("RESULT:%d",h_tmp1); 

}

When I compile this code the compiler still shows the warning, and when I run the program it prints:

Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
Error: unable to copy weights' pointer to the device 
ERROR!! 
RESULT:1

If I comment out the kernel call, the last error does not appear.

Where am I going wrong? I don't know what to do. Thanks for your help!

2012-08-08 Andrea Sylar Solla

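It all depends on which GPU you are using. Fermi cards use unified addressing for the shared and global memory spaces, while pre-Fermi cards do not.

In the pre-Fermi case, it is not known whether an address is in shared or global memory. The compiler can usually work this out, but there are cases where it cannot. When a pointer to shared memory is needed, you are usually taking the address of a shared variable, and the compiler can recognise that. When the memory space is not explicitly determined, the "assuming global" message is shown.

If you are using a GPU with compute capability 2.x or higher, it should work with the -arch=sm_20 compiler flag.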

2012-08-09 01:13:28

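Although you are right about the warning, I doubt it is what causes the program's abnormal behaviour. After all, the compiler's assumption that the struct resides in the global memory space is correct... – aland 2012-08-09 01:52:52

I am using an NVIDIA GeForce 320M 256 MB with compute capability 1.2, so I don't think it is a "Fermi" card – 2012-08-09 09:29:32

The problem is here: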

cudaMalloc((void**)&nL,sizeof(NLayer)); 
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

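After the first line, nL points to a struct in global memory on the device. In the second line, therefore, the first argument to cudaMalloc is an address that resides on the GPU, and writing to it from the host is undefined behaviour (on my test system it causes a segfault; in your case the effect was more subtle).

The right way to do what you want is to create the struct in host memory first, fill it with data, and then copy it to the device, like this: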

NLayer* nL; 
NLayer h_nL; 
int i; 
int tmp=9; 
// Allocate data on device 
cudaMalloc((void**)&nL, sizeof(NLayer)); 
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron)); 
// Copy nlayer with pointers to device 
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check the CUDA routines for errors.
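As a minimal sketch of what that checking could look like, the snippet below wraps each runtime call in a macro. The names CUDA_CHECK and allocLayerChecked are hypothetical and do not come from the original post; the allocation itself follows the pattern shown above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef struct neuron { float* weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron* neurons; int n_neurons; int act_function; } NLayer;

// Hypothetical helper macro (an assumption, not part of the original code):
// print a readable message and abort if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical function: the same host-first allocation, every call checked.
NLayer* allocLayerChecked(int numNeurons)
{
    NLayer h_nL;            // host-side copy of the layer header
    NLayer* d_nL = NULL;    // device pointer returned to the caller

    h_nL.n_neurons = numNeurons;
    h_nL.act_function = 0;
    h_nL.neurons = NULL;

    CUDA_CHECK(cudaMalloc((void**)&d_nL, sizeof(NLayer)));
    // The device address of the neuron array is stored in the HOST struct.
    CUDA_CHECK(cudaMalloc((void**)&h_nL.neurons, numNeurons * sizeof(Neuron)));
    // Copy the filled-in header, pointer included, to the device.
    CUDA_CHECK(cudaMemcpy(d_nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice));
    return d_nL;
}
```

Kernel launches do not return an error code directly; calling cudaGetLastError() right after the launch is one common way to catch launch failures.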

UPDATE

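In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...)

Here, again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (a host struct) and read its member h_layer.neurons, which is a pointer to device memory. Then you do some pointer arithmetic (&h_layer.neurons[i].weights). No access to device memory is needed to compute that address.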
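Putting that fix into the whole function might look like the sketch below. It is only one possible arrangement: the extra h_layer_out parameter is a hypothetical addition that hands the host-side copy back to the caller, the n_weights copy is also an addition, and the early returns leak earlier allocations to keep the sketch short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct neuron { float* weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron* neurons; int n_neurons; int act_function; } NLayer;

// Sketch of setNLayer with the fix applied. Returns NULL on any CUDA error.
// h_layer_out is hypothetical: if non-NULL it receives the host-side header,
// whose neurons member holds the device address of the neuron array.
NLayer* setNLayer(int numNeurons, int weightsPerNeuron, int act_fun,
                  NLayer* h_layer_out)
{
    NLayer h_layer;
    NLayer* d_layer = NULL;

    h_layer.act_function = act_fun;
    h_layer.n_neurons = numNeurons;
    h_layer.neurons = NULL;

    if (cudaMalloc((void**)&d_layer, sizeof(NLayer)) != cudaSuccess)
        return NULL;
    if (cudaMalloc((void**)&h_layer.neurons,
                   numNeurons * sizeof(Neuron)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return NULL;

    for (int i = 0; i < numNeurons; i++) {
        float* d_weights = NULL;
        if (cudaMalloc((void**)&d_weights,
                       weightsPerNeuron * sizeof(float)) != cudaSuccess)
            return NULL;
        // The destination address is computed from the HOST copy (h_layer),
        // so the host never dereferences the device pointer d_layer.
        if (cudaMemcpy(&h_layer.neurons[i].weights, &d_weights,
                       sizeof(float*), cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
        // Also record how many weights this neuron owns.
        if (cudaMemcpy(&h_layer.neurons[i].n_weights, &weightsPerNeuron,
                       sizeof(int), cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
    }

    if (h_layer_out != NULL)
        *h_layer_out = h_layer;
    return d_layer;
}
```

A call such as setNLayer(10, 12, 13, &h_layer) would then give the caller both the device pointer and a host header for later copies.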

來源

2012-08-09 01:23:59 aland

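I have modified my code but it doesn't work, could you take a look? The new code is in my first post... thanks! – 2012-08-09 12:08:03

Oh! Thank you, it works! I have just one question: if I want to access, from the host, the data contained in the integer variable **d_layer->neurons[0].n_weights**, do I first have to copy **d_layer** to the host, then copy **d_layer->neurons[0]** to the host, and only then use the d_layer->neurons[0].n_weights variable? I ask because I tried to copy d_layer->neurons[0].n_weights directly with cudaMemcpy(...), but it always returns an "invalid argument" error. – 2012-08-09 15:08:52

@AndreaSylarSolla You can simply use 'int t; cudaMemcpy(&t, &h_layer.neurons[0].n_weights, ....)' or 'Neuron t; cudaMemcpy(&t, &h_layer.neurons[0], ....)'. There is no need to copy 'd_layer' back, because you only need the value of the 'neurons' pointer, and 'h_layer' holds the same value. – aland 2012-08-09 15:24:50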
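A small self-contained sketch of that read-back, assuming the host-first allocation from the answer. The function name readFirstNWeights is hypothetical, and the kernel is the test kernel from the question.

```cuda
#include <cuda_runtime.h>

typedef struct neuron { float* weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron* neurons; int n_neurons; int act_function; } NLayer;

// Kernel from the question: writes 121 into the first neuron's n_weights.
__global__ void test(NLayer* n)
{
    n->neurons[0].n_weights = 121;
}

// Hypothetical helper: runs the kernel and reads neurons[0].n_weights back.
// Returns the value read, or -1 if any CUDA call fails.
int readFirstNWeights(int numNeurons)
{
    NLayer h_layer;
    NLayer* d_layer = NULL;
    int t = -1;

    h_layer.n_neurons = numNeurons;
    h_layer.act_function = 0;
    h_layer.neurons = NULL;

    if (cudaMalloc((void**)&d_layer, sizeof(NLayer)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void**)&h_layer.neurons,
                   numNeurons * sizeof(Neuron)) != cudaSuccess)
        return -1;
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return -1;

    test<<<1, 1>>>(d_layer);
    if (cudaGetLastError() != cudaSuccess)
        return -1;

    // The source address comes from h_layer, the host copy, so d_layer is
    // never dereferenced on the host. This blocking cudaMemcpy is issued
    // after the kernel on the default stream, so it sees the kernel's write.
    if (cudaMemcpy(&t, &h_layer.neurons[0].n_weights, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        return -1;

    cudaFree(h_layer.neurons);
    cudaFree(d_layer);
    return t;
}
```

On a working device, readFirstNWeights(5) should return the value written by the kernel. Passing &d_layer->neurons[0].n_weights instead would make the host read through a device pointer, which is the mistake the answer describes.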
