From Thrust to ArrayFire - gfor usage?

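I am trying to replace some Thrust calls with ArrayFire in order to compare performance.

I am not sure whether I am using ArrayFire correctly, because the results I get do not match at all.

For example, this is the Thrust code I use: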

cudaMalloc((void**) &devRow, N * sizeof(float)); 
...//devRow is filled 

thrust::device_ptr<float> SlBegin(devRow); 
for (int i = 0; i < N; i++, SlBegin += PerSlElmts) 
{ 
    thrust::inclusive_scan(SlBegin, SlBegin + PerSlElmts, SlBegin); 
} 

cudaMemcpy(theRow, devRow, N * sizeof(float), cudaMemcpyDeviceToHost); 
//use theRow...

ArrayFire:

af::array SlBegin(N , devRow); 
for (int i = 0;i < N; i++,SlBegin += PerSlElmts) 
{ 
    accum(SlBegin); 
} 

cudaMemcpy(theRow, devRow, N * sizeof(float), cudaMemcpyDeviceToHost); 
//use theRow..

I do not know how ArrayFire handles the copy in af::array SlBegin(N , devRow);. In Thrust, SlBegin is a device pointer that points at devRow, but what is the equivalent in ArrayFire?

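I would also like to ask about using gfor. The ArrayFire webpage states:

Do not use this function directly; see GFOR: Parallel For-Loops.

and then, for GFOR:

GFOR is disabled in the current version of ArrayFire

So, can we not use gfor?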

--------- UPDATE ---------------------------

Here is a small runnable example that shows the different results:

#include <stdio.h> 
#include <stdlib.h> 

#include <cuda.h> 
#include <cuda_runtime.h> 
#include <curand_kernel.h> 

#include "arrayfire.h" 

#include <thrust/scan.h> 
#include <thrust/host_vector.h> 
#include <thrust/device_vector.h> 

__global__ void Kernel(const int N ,float * const devRow) 
{ 

    int i = threadIdx.x; 
    if (i < N) 
     devRow[ i ] = i; 

} 

int main(){ 

    int N = 6; 
    int Slices = 2; 
    int PerSlElmts = 3; 

    float * theRow = (float*) malloc (N * sizeof(float)); 

    for (int i = 0; i < N; i ++) 
     theRow[ i ] = 0; 

    // raw pointer to device memory 
    float * devRow; 
    cudaMalloc((void **) &devRow, N * sizeof(float)); 

    Kernel<<< 1,N >>>(N , devRow); 
    cudaDeviceSynchronize(); 

    // wrap raw pointer with a device_ptr 
    thrust::device_ptr<float> SlBegin(devRow); 

    for (int i = 0; i < Slices; i++ , SlBegin += PerSlElmts) 
     thrust::inclusive_scan(SlBegin, SlBegin + PerSlElmts , SlBegin); 

    cudaMemcpy(theRow, devRow, N * sizeof(float), cudaMemcpyDeviceToHost); 

    for (int i = 0; i < N; i++) 
     printf("\n Thrust accum : %f",theRow[ i ]); 


    //--------------------------------------------------------------------// 
    Kernel<<< 1,N >>>(N , devRow); 
    cudaDeviceSynchronize(); 

    af::array SlBeginFire(N, devRow); 

    for (int i = 0; i < Slices; i++ , SlBeginFire += PerSlElmts) 
     af::accum(SlBeginFire); 

    SlBeginFire.host(theRow); 

    for (int i = 0; i < N; i++) 
      printf("\n Arrayfire accum : %f",theRow[ i ]); 

    cudaFree(devRow); 
    free(theRow); 


    return 0; 

}

2015-03-03 George

It looks like you are trying to run a column-wise scan (dim 0 in ArrayFire) on a 2D array. Here is some code you can use:

af::array SlBegin(N, devRow); 
af::array result = accum(SlBegin, 0);

Here is a sample output:

A [5 3 1 1] 
0.7402  0.4464  0.7762 
0.9210  0.6673  0.2948 
0.0390  0.1099  0.7140 
0.9690  0.4702  0.3585 
0.9251  0.5132  0.6814 

accum(A, 0) [5 3 1 1] 
0.7402  0.4464  0.7762 
1.6612  1.1137  1.0709 
1.7002  1.2236  1.7850 
2.6692  1.6938  2.1435 
3.5943  2.2070  2.8249

This runs an inclusive scan on each column independently.
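To check either GPU version against known values, a plain CPU reference can help. The sketch below is not from the original posts: sliceInclusiveScan is a hypothetical helper that uses only the C++ standard library and scans each contiguous slice of a flat buffer independently, which is what the Thrust loop in the question does. Because ArrayFire stores arrays in column-major order, a flat buffer of Slices * PerSlElmts values viewed as a PerSlElmts x Slices matrix has one slice per column, so a scan along dim 0 should produce the same numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Thrust or ArrayFire): CPU reference for an
// independent inclusive scan over each contiguous slice of a flat buffer.
std::vector<float> sliceInclusiveScan(const std::vector<float>& data,
                                      std::size_t perSliceElems)
{
    // The buffer must split evenly into slices of perSliceElems values.
    assert(perSliceElems > 0 && data.size() % perSliceElems == 0);

    const auto total = static_cast<std::ptrdiff_t>(data.size());
    const auto len = static_cast<std::ptrdiff_t>(perSliceElems);

    std::vector<float> out(data.size());
    for (std::ptrdiff_t start = 0; start < total; start += len) {
        // std::partial_sum is an inclusive scan; each slice restarts from zero.
        std::partial_sum(data.begin() + start,
                         data.begin() + start + len,
                         out.begin() + start);
    }
    return out;
}
```

With the data from the question's example (values 0 to 5, 3 elements per slice), this yields 0, 1, 3 for the first slice and 3, 7, 12 for the second, which gives a baseline for comparing the Thrust and ArrayFire outputs.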

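As for gfor, it has been added to the open-source version of ArrayFire. That codebase is still in beta, so improvements and fixes are landing very quickly. Please keep an eye on our GitHub page.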

2015-03-03 14:17:44 shehzan

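Hello, and thank you very much for your help. Could you please write a code snippet similar to the one I have? Is it necessary to use another array (result), or a loop? After the loop, do I have to call SlBegin.host(theRow);? Could you provide such an example? I am using a 1D array, not a 2D one. – George 2015-03-03 14:26:59

I updated the code. – George 2015-03-03 15:01:23

The code you have shown indicates incorrect usage of ArrayFire. If you want to discuss your code, I suggest posting on the ArrayFire mailing list: https://groups.google.com/forum/#!forum/arrayfire-users – shehzan 2015-03-03 18:14:22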
