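Cython: slow numpy arrays
I want to speed up my code using Cython. After translating my Python code to Cython, I saw no speedup at all. I think the root of the problem is that numpy arrays perform poorly when I use them in Cython.
I came up with a very simple program to show this: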
############### test.pyx #################
import numpy as np
cimport numpy as np
cimport cython

def func1(long N):
    cdef double sum1,sum2,sum3
    cdef long i

    sum1 = 0.0
    sum2 = 0.0
    sum3 = 0.0

    for i in range(N):
        sum1 += i
        sum2 += 2.0*i
        sum3 += 3.0*i

    return sum1,sum2,sum3

def func2(long N):
    cdef np.ndarray[np.float64_t,ndim=1] sum_arr
    cdef long i

    sum_arr = np.zeros(3,dtype=np.float64)

    for i in range(N):
        sum_arr[0] += i
        sum_arr[1] += 2.0*i
        sum_arr[2] += 3.0*i

    return sum_arr

def func3(long N):
    cdef double sum_arr[3]
    cdef long i

    sum_arr[0] = 0.0
    sum_arr[1] = 0.0
    sum_arr[2] = 0.0

    for i in range(N):
        sum_arr[0] += i
        sum_arr[1] += 2.0*i
        sum_arr[2] += 3.0*i

    return sum_arr
##########################################
################## test.py ###############
import time
import test as test

N = 1000000000

for i in xrange(10):
    start = time.time()
    sum1,sum2,sum3 = test.func1(N)
    print 'Time taken = %.3f'%(time.time()-start)

print '\n'
for i in xrange(10):
    start = time.time()
    sum_arr = test.func2(N)
    print 'Time taken = %.3f'%(time.time()-start)

print '\n'
for i in xrange(10):
    start = time.time()
    sum_arr = test.func3(N)
    print 'Time taken = %.3f'%(time.time()-start)
############################################
Running python test.py I get:
Time taken = 1.445
Time taken = 1.433
Time taken = 1.434
Time taken = 1.428
Time taken = 1.449
Time taken = 1.425
Time taken = 1.421
Time taken = 1.451
Time taken = 1.483
Time taken = 1.418

Time taken = 2.623
Time taken = 2.603
Time taken = 2.977
Time taken = 3.237
Time taken = 2.748
Time taken = 2.798
Time taken = 2.811
Time taken = 2.783
Time taken = 2.585
Time taken = 2.595

Time taken = 1.503
Time taken = 1.529
Time taken = 1.509
Time taken = 1.543
Time taken = 1.427
Time taken = 1.425
Time taken = 1.423
Time taken = 1.415
Time taken = 1.414
Time taken = 1.418
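My question is: why is func2 almost 2x slower than func1 and func3?
Is there a way to improve this?
Thanks!
######## UPDATE ########
My real problem is as follows. I am calling a function that accepts a 3D array (say P[i,j,k]). The function loops over every element and computes several quantities: one depends on the value of the array at that position (say A = f(P[i,j,k])), and another depends only on the position (B = g(i,j,k)). Schematically: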
for i in xrange(N):
    corr1 = h(i,val)
    for j in xrange(N):
        corr2 = h(j,val)
        for k in xrange(N):
            corr3 = h(k,val)
            A = f(P[i,j,k])
            B = g(i,j,k)
            Arr[B] += A*corr1*corr2*corr3
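where val is a property of the 3D array, represented by a single number. This number can differ from one field to another.
Since I have to perform this operation on many 3D arrays, I thought it would be better to write a new routine that accepts many different input 3D arrays, so that the number of arrays is not known in advance. The idea is that, since B is exactly the same for all arrays, I can avoid computing it for every array and compute it only once. The problem is that corr1, corr2 and corr3 above then become arrays.
If the number of 3D arrays is num_3D_arrays, I do something like: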
for i in xrange(N):
    for p in xrange(num_3D_arrays):
        corr1[p] = h(i,val[p])
    for j in xrange(N):
        for p in xrange(num_3D_arrays):
            corr2[p] = h(j,val[p])
        for k in xrange(N):
            for p in xrange(num_3D_arrays):
                corr3[p] = h(k,val[p])
            B = g(i,j,k)
            for p in xrange(num_3D_arrays):
                A[p] = f(P[p][i,j,k])    # P[p] is the p-th input 3D array
                Arr[p,B] += A[p]*corr1[p]*corr2[p]*corr3[p]
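So the variables I changed from scalars to arrays (val, corr1, corr2, corr3 and A) are eating up the performance I expected to gain by not running the big loop once per array.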
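The restructuring described above can be sketched in plain Python/NumPy (not Cython) to make the idea concrete: B is computed once for the whole grid and reused for every input array. Everything named here is an assumption for illustration only: the bodies of `f`, `g` and `h` are invented placeholders, the inputs are assumed to be stacked into one 4D array `P_all`, and `g` is assumed to return a non-negative integer bin index smaller than `nbins`.

```python
import numpy as np

def f(x):            # hypothetical: quantity computed from the array value
    return x * x

def g(i, j, k):      # hypothetical: integer bin index computed from the position
    return i + j + k

def h(i, val):       # hypothetical: per-axis correction factor
    return 1.0 + val * i

def accumulate_loop(P_all, vals, nbins):
    """Direct transcription of the nested loops (slow reference)."""
    num, N = P_all.shape[0], P_all.shape[1]
    Arr = np.zeros((num, nbins))
    for i in range(N):
        for j in range(N):
            for k in range(N):
                B = g(i, j, k)                      # computed once per cell
                for p in range(num):
                    w = h(i, vals[p]) * h(j, vals[p]) * h(k, vals[p])
                    Arr[p, B] += f(P_all[p, i, j, k]) * w
    return Arr

def accumulate_vectorized(P_all, vals, nbins):
    """Compute B once for the whole grid, then reuse it for every array."""
    num, N = P_all.shape[0], P_all.shape[1]
    idx = np.arange(N)
    i, j, k = np.meshgrid(idx, idx, idx, indexing="ij")
    B = g(i, j, k).ravel()                          # shared by all arrays
    Arr = np.zeros((num, nbins))
    for p in range(num):
        c = h(idx, vals[p])                         # 1D correction along one axis
        w = c[:, None, None] * c[None, :, None] * c[None, None, :]
        # Sum the weighted values that fall into each bin B.
        Arr[p] = np.bincount(B, weights=(f(P_all[p]) * w).ravel(),
                             minlength=nbins)
    return Arr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P_all = rng.random((2, 4, 4, 4))                # two 4x4x4 arrays
    # nbins must exceed the largest value g can return (here 3*(4-1) = 9).
    print(accumulate_vectorized(P_all, [0.1, 0.3], 10))
```

Whether this form helps depends on whether the real f, g and h can be written as whole-array operations; if they cannot, the explicit loops have to stay in compiled code.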
#
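The code `for i in range(N): sum_arr[0] += i; sum_arr[1] += 2.0*i; sum_arr[2] += 3.0*i` ignores everything numpy is good at. Numpy is fast not because it gives you quick access to individual indices, but because it performs numerical operations quickly. Just not like that. I suggest reading up on numpy. –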
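To illustrate the point of this comment, here is a minimal sketch in plain Python/NumPy (not Cython) that computes the same three sums as func2 with whole-array operations instead of per-element indexing. The names `sums_loop` and `sums_vectorized` are made up for this illustration, and the vectorized version allocates an array of length n, so it only suits values of n that fit in memory.

```python
import numpy as np

def sums_loop(n):
    """Reference version: element-by-element indexing, as in func2."""
    sum_arr = np.zeros(3, dtype=np.float64)
    for i in range(n):
        sum_arr[0] += i
        sum_arr[1] += 2.0 * i
        sum_arr[2] += 3.0 * i
    return sum_arr

def sums_vectorized(n):
    """Same result using one reduction and one broadcast multiply."""
    total = np.arange(n, dtype=np.float64).sum()    # 0 + 1 + ... + (n-1)
    return np.array([1.0, 2.0, 3.0]) * total

if __name__ == "__main__":
    print(sums_loop(1000))
    print(sums_vectorized(1000))
```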
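I guess it is hard to make this faster. If you stubbornly stick to `numpy`, you would have to create numpy arrays inside that loop and call np.sum(), but creating the numpy arrays would probably be the slowest thing in that snippet. I would also suggest checking each line separately rather than relying on this simple timing. **[Some reading on profiling](http://stackoverflow.com/questions/582336/how-can-you-profile-a-script)** –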
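As a rough sketch of the profiling suggestion, the snippet below runs a function under the standard-library `cProfile` module and returns the report as text. `work` is a hypothetical stand-in for the function under investigation and `profile_call` is a helper invented for this example. Note that `cProfile` reports per function, not per line; line-level timing needs a separate tool.

```python
import cProfile
import io
import pstats

def work(n):
    """Hypothetical stand-in for the function being investigated."""
    total = 0.0
    for i in range(n):
        total += 2.0 * i
    return total

def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(5)   # five costliest entries
    return result, buf.getvalue()

if __name__ == "__main__":
    result, report = profile_call(work, 100000)
    print(report)
```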
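OK, thanks! In my case the problem is that I cannot define individual variables as in func1; I need to define an array whose size I do not know a priori. Is there a way to do this other than using numpy arrays? – Francisco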
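One possible direction for a run-time-sized buffer without numpy, shown here as a minimal pure-Python sketch rather than a tested Cython solution: the standard-library `array` module provides a contiguous block of C doubles whose length is chosen at run time, and it supports the buffer protocol. The function name `weighted_sums` and its arguments are invented for this example. The assumption, not verified here, is that in Cython the same buffer could be accessed through a typed memoryview (`double[:]`) or replaced by memory obtained from `malloc`.

```python
from array import array

def weighted_sums(n, weights):
    """Accumulate sum(w * i for i in range(n)) for each weight w.

    The accumulator length is taken from len(weights) at run time.
    """
    sums = array("d", [0.0]) * len(weights)   # contiguous C doubles, all zero
    view = memoryview(sums)                   # buffer-protocol view, no copy
    for i in range(n):
        for p, w in enumerate(weights):
            view[p] += w * i
    return sums.tolist()

if __name__ == "__main__":
    print(weighted_sums(4, [1.0, 2.0, 3.0]))
```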