Printing the elements of a vector of strings with cuPrintf in a kernel function

I am trying to use the cuPrintf function to print the elements of a vector of strings that is passed as an argument to a kernel function.

The kernel:

__global__ void testKernel(string wordList[10000]) 
{ 
    //access thread id 
    const unsigned int bid = blockIdx.x; 
    const unsigned int tid = threadIdx.x; 
    const unsigned int index = bid * blockDim.x + tid; 


    cuPrintf("wordList[%d]: %s \n", index, wordList[index]); 
}

The code in the main function that sets up the execution parameters and launches the kernel:

//Allocate device memory for word list 
    string* d_wordList; 
    cudaMalloc((void**)&d_wordList, sizeof(string)*number_of_words); 

    //Copy word list from host to device 
    cudaMemcpy(d_wordList, wordList, sizeof(string)*number_of_words, cudaMemcpyHostToDevice); 

    //Setup execution parameters 
    int n_blocks = (number_of_words + 255)/256; 
    int threads_per_block = 256; 

    dim3 grid(n_blocks, 1, 1); 
    dim3 threads(threads_per_block, 1, 1); 

    cudaPrintfInit(); 
    testKernel<<<grid, threads>>>(d_wordList); 
    cudaDeviceSynchronize(); 
    cudaPrintfDisplay(stdout,true); 
    cudaPrintfEnd();

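I get this error: "Error 44 error : calling a host function("std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string") from a __global__ function("testKernel") is not allowed D:...\kernel.cu 44 1 CUDA_BF_large_word_list"

What am I missing? Thanks in advance.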

2014-09-22 Alex Iacob

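In general, you cannot use functions from the C++ standard library (including <string>) in CUDA device code. Use arrays of char in place of your strings.

Here is an example of handling "strings" as null-terminated C-style char arrays and passing them to a kernel.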
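To make the char-array suggestion concrete, here is a minimal host-side sketch of the packing step. It is not the linked example: `pack_words` and `slot_len` are hypothetical names chosen for illustration, and it assumes every word gets a fixed-width slot in one contiguous buffer, with longer words truncated so each slot stays null-terminated.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not part of the linked example):
// copies each word into a fixed-width slot of one contiguous char buffer.
// Slot i starts at offset i * slot_len. The buffer is zero-filled first,
// so every slot ends in at least one '\0'; words longer than
// slot_len - 1 characters are truncated.
std::vector<char> pack_words(const std::vector<std::string>& words,
                             std::size_t slot_len)
{
    std::vector<char> flat(words.size() * slot_len, '\0');
    if (slot_len == 0) {
        return flat;  // nothing can be stored in zero-width slots
    }
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::size_t n = std::min(words[i].size(), slot_len - 1);
        std::memcpy(&flat[i * slot_len], words[i].data(), n);
    }
    return flat;
}
```

A buffer built this way holds only plain chars, so `flat.data()` and `flat.size()` can be handed to cudaMalloc/cudaMemcpy, and the kernel can take a `char*` parameter instead of a `string` array.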

2014-09-22 12:59:52

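I am reading the words from a text file like this:

    //Build array of strings containing the words from the text file
    string wordList[10000];
    if (file.is_open())
    {
        for (int i = 0; i < number_of_words; i++)
        {
            file >> wordList[i];
            //cout << endl << wordList[i] << endl;
        }
    }

What would change if I used char arrays? – 2014-09-22 13:30:43

The link in my answer points to sample code that shows how to manipulate C-style strings. I assume you can handle the file I/O yourself; that part is not specific to CUDA. – 2014-09-22 15:20:56

Yes, handling the file I/O is no problem. Thanks! – 2014-09-23 06:50:21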

I modified the code and used an array of char arrays instead of strings.

The updated version of the kernel:

__global__ void testKernel(char* d_wordList) 
{ 
    //access thread id 
    const unsigned int bid = blockIdx.x; 
    const unsigned int tid = threadIdx.x; 
    const unsigned int index = bid * blockDim.x + tid; 


    //cuPrintf("Hello World from kernel! \n"); 


      cuPrintf("!! %c%c%c%c%c%c%c%c%c%c \n" , d_wordList[index * 20 + 0], 
                d_wordList[index * 20 + 1], 
                d_wordList[index * 20 + 2], 
                d_wordList[index * 20 + 3], 
                d_wordList[index * 20 + 4], 
                d_wordList[index * 20 + 5], 
                d_wordList[index * 20 + 6], 
                d_wordList[index * 20 + 7], 
                d_wordList[index * 20 + 8], 
                d_wordList[index * 20 + 9]); 


}

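I am also wondering whether there is a simpler way to print the words from the char array. (Basically, I need to print, and later work with, one word per kernel thread.)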
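On the question of a simpler way to print: because each word occupies a fixed-width, null-terminated slot, a pointer to the start of the slot is already a C string, so one `%s` conversion can replace the ten `%c` conversions. The sketch below shows the indexing on the host only; `word_at` and `format_word` are hypothetical helpers. CUDA's built-in device-side `printf` (compute capability 2.0 and later) accepts `%s` for a string in device memory; whether cuPrintf behaves the same way on a given setup is an assumption that should be checked.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: pointer to the start of word `index` in a flat
// buffer of fixed-width slots (same layout as d_wordList, where the
// slot width is 20).
const char* word_at(const char* flat, std::size_t index, std::size_t slot_len)
{
    return flat + index * slot_len;
}

// Formats one word with a single %s-style conversion. The "%.*s"
// precision caps the read at slot_len characters, so a slot that lost
// its terminator cannot run into the next word.
std::string format_word(const char* flat, std::size_t index, std::size_t slot_len)
{
    char line[64];
    std::snprintf(line, sizeof line, "!! %.*s",
                  static_cast<int>(slot_len), word_at(flat, index, slot_len));
    return line;
}
```

Inside the kernel the equivalent expression would be `&d_wordList[index * 20]`, passed as the single argument for `%s`.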

The code in the main function is:

  const int text_length = 20; 

     char (*wordList)[text_length] = new char[10000][text_length]; 
     char *dev_wordList; 

     for(int i=0; i<number_of_words; i++) 
     { 
      file>>wordList[i]; 
      cout<<wordList[i]<<endl; 
     } 

     cudaMalloc((void**)&dev_wordList, 20*number_of_words*sizeof(char)); 
     cudaMemcpy(dev_wordList, &(wordList[0][0]), 20 * number_of_words * sizeof(char), cudaMemcpyHostToDevice); 

     char (*resultWordList)[text_length] = new char[10000][text_length]; 

     cudaMemcpy(resultWordList, dev_wordList, 20 * number_of_words * sizeof(char), cudaMemcpyDeviceToHost); 

     for(int i=0; i<number_of_words; i++) 
      cout<<resultWordList[i]<<endl; 

     //Setup execution parameters 
     int n_blocks = (number_of_words + 255)/256; 
     int threads_per_block = 256; 


     dim3 grid(n_blocks, 1, 1); 
     dim3 threads(threads_per_block, 1, 1); 

     cudaPrintfInit(); 
     testKernel<<<grid, threads>>>(dev_wordList); 
     cudaDeviceSynchronize(); 
     cudaPrintfDisplay(stdout,true); 
     cudaPrintfEnd();

If I use a smaller number of blocks and threads, such as:

dim3 grid(20, 1, 1); 
dim3 threads(100, 1, 1);

the kernel launches correctly and prints one word per thread. But I need this to work for all 10000 words. What am I missing?
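Separately from any output limit in cuPrintf, the two launch configurations cover different index ranges, which is worth checking. The sketch below redoes the arithmetic on the host; `LaunchConfig`, `make_config`, and `surplus_threads` are hypothetical names, and 10000 is assumed for `number_of_words`.

```cpp
#include <cstddef>

// Hypothetical description of a 1-D launch: blocks * threads_per_block threads.
struct LaunchConfig {
    std::size_t blocks;
    std::size_t threads_per_block;
    std::size_t total_threads() const { return blocks * threads_per_block; }
};

// Ceiling division, the same formula as (number_of_words + 255) / 256.
LaunchConfig make_config(std::size_t n_items, std::size_t threads_per_block)
{
    return { (n_items + threads_per_block - 1) / threads_per_block,
             threads_per_block };
}

// Number of launched threads whose index is >= n_items. Such threads
// would index past the end of a buffer sized for n_items words.
std::size_t surplus_threads(const LaunchConfig& cfg, std::size_t n_items)
{
    const std::size_t total = cfg.total_threads();
    return total > n_items ? total - n_items : 0;
}
```

With 10000 words and 256 threads per block this gives 40 blocks and 10240 threads, so 240 threads have an index of 10000 or more and read beyond the 20 * number_of_words bytes that were allocated. Guarding the kernel body with `if (index < number_of_words)`, with the count passed as an extra kernel argument, avoids that. The smaller configuration, grid(20) with threads(100), launches only 2000 threads and therefore reaches only the first 2000 words.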

2014-09-23 12:07:06

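Posting an answer to your own question and using it to ask a new question is probably not a good idea. That is not really how this site works. If you have a new question, I suggest you post it as a new question. Note also that your last question is unclear to me. What specifically is not working? Are you aware of limits such as the maximum number of threads per block? Are you aware that in-kernel printf is limited in the amount of output it can produce? What is actually not working? (Post a new question.) – 2014-09-23 21:04:55

OK, thanks for the advice. I am aware of the limit on threads per block; in my case it is 512 threads per block. The problem is that the kernel produces no output for larger grid/thread parameters, but the cause may be a limitation of the cuPrintf function. – 2014-09-24 07:06:19

I investigated the problem, and the cause is that cuPrintf is limited to grids of up to 2048 threads. – 2014-09-24 07:53:47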
