Using Cuda with OpenMP on Win10 with LLVM/Clang

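I have been trying to get a simple application working that uses Cuda, OpenMP, and LLVM/Clang on Win10. From the various docs and PowerPoint presentations I have found online, I am led to believe that this functionality is supported in some way, but I am not sure whether it is supported on Win10, or whether it is in the main release. I am using LLVM 4.0.0rc1. I downloaded the binaries after attempting to build from head in various ways.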

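I modified this code to look like the snippet below. I have also tried different OMP and C variants. It compiles fine. You can see from the verbose output that the build appears to be producing a fat binary. Interestingly, it does not seem to care what I give it for a target (or whether I give it one at all), specifically in omptargets. It also executes the cuda functions, as reported by nvprof.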

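When I run this, all four of my processors go to 100% usage according to Open Hardware Monitor, but nothing happens on the GPU, except perhaps slight memory usage from the profiling commands. Am I missing something, or does this just not work?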

cudaError_t f;
int t = 999;
cudaProfilerStart();

printf("Enter\n");
#pragma omp target data map(tofrom: x[0:n],y[0:n]) map(tofrom: t,f)
{
    f = cudaGetDevice(&t);
#pragma omp target teams num_teams(10) thread_limit(192)
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < 10000; j++) {
            y[i] += a * x[i];
            y[i] *= 2;
            y[i] -= x[i]/4;
            y[i] *= .99;
        }
    }
}
cudaProfilerStop();

Output from nvprof:

==1844== NVPROF is profiling process 1844, command: example.exe 1000000 
Enter 
min = inf, max = inf, avg = 0.000000 0 0 
==1844== Profiling application: example.exe 1000000 
==1844== Profiling result: 
No kernels were profiled. 

==1844== API calls: 
Time(%)      Time  Calls       Avg       Min       Max  Name
 98.86%  135.83ms      1  135.83ms  135.83ms  135.83ms  cudaProfilerStart
  0.60%  819.35us     91  9.0030us       0ns  398.73us  cuDeviceGetAttribute
  0.53%  726.09us      1  726.09us  726.09us  726.09us  cuDeviceGetName
  0.00%  5.2860us      1  5.2860us  5.2860us  5.2860us  cuDeviceTotalMem
  0.00%  4.5310us      1  4.5310us  4.5310us  4.5310us  cudaGetDevice
  0.00%  2.6430us      3     881ns       0ns  2.2650us  cuDeviceGetCount
  0.00%  1.5090us      3     503ns     377ns     755ns  cuDeviceGet


2017-02-09 Todd

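I exchanged emails with an engineer at IBM. The upstream fork of LLVM/Clang is still a work in progress. There is also support for x86, but whether it works on Windows is unknown.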

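If you look at my parallel-computing.pro link, you will notice an older fork that supports OpenMP and Cuda. I am not sure what the relationship between these projects is, if there is one. If you look at the latest presentation, it is clear that the new fork supports OpenMP 4.0 along with 4.5, and that IBM will support its Power8 CPUs. This partly explains the uncertain Windows support. However, I searched through the code on github and did find definitions of Win32 and Win64 macros.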


2017-02-10 14:15:41 Todd
