提高重複檢查效率 - Python

我是一名研究人員，使用Python來處理氣候模型輸出，以查找某些類型的風暴。我有8個大的numpy陣列（尺寸是109574 x 52 x 57）。這些數組填充1表示當天有風暴（第一維是時間），0表示沒有風暴。另外兩個維度是經度和緯度。提高重複檢查效率 - Python

我要消除這些陣列背到迴天。例如，如果第1天和第2天有暴風雨，我只想計1次風暴。如果第一天，第二天和第三天有暴風雨，我想只計算1和3共計兩次風暴，第1-4天會有兩次風暴，依此類推。我在最後使用np.sum在時間軸上統計了陣列中的1，結果發現了風暴＃。

我運行下面的代碼來實現這一點，但我面對的問題，這是非常緩慢的。因爲我將不得不爲其他數據集重複此過程，所以我想知道是否有辦法加快此過程以提高效率。我有我的代碼在下面，我會很樂意澄清任何事情。

# If there is a storm that overlaps two two-day periods, only count it once 
print("Eliminating doubles...") 
for i in range(52): 
    for j in range(57): 
     print(i,j) 
     for k in range(109573): 
      if((storms1[k,i,j]) == 1 and (storms1[k+1,i,j] == 1)): 
       storms1[k,i,j] = 0 
      if((storms2[k,i,j]) == 1 and (storms2[k+1,i,j] == 1)): 
       storms2[k,i,j] = 0 
      if((storms3[k,i,j]) == 1 and (storms3[k+1,i,j] == 1)): 
       storms3[k,i,j] = 0 
      if((storms4[k,i,j]) == 1 and (storms4[k+1,i,j] == 1)): 
       storms4[k,i,j] = 0 
      if((storms5[k,i,j]) == 1 and (storms5[k+1,i,j] == 1)): 
       storms5[k,i,j] = 0 
      if((storms6[k,i,j]) == 1 and (storms6[k+1,i,j] == 1)): 
       storms6[k,i,j] = 0 
      if((storms7[k,i,j]) == 1 and (storms7[k+1,i,j] == 1)): 
       storms7[k,i,j] = 0 
      if((storms8[k,i,j]) == 1 and (storms8[k+1,i,j] == 1)): 
       storms8[k,i,j] = 0

在有人建議用循環遍歷數組之前，爲了提出這個問題，我改變了變量名以簡化它們。

感謝您的幫助。

來源

2017-02-12 LonelyHeartsClub

這裏是一個矢量化功能，可替換的最內層循環：

def do(KK): 
    # find stretches of ones 
    switch_points = np.where(np.diff(np.r_[0, KK, 0]))[0] 
    switch_points.shape = -1, 2 
    # isolate stretches starting on odd days and create mask 
    odd_starters = switch_points[switch_points[:, 0] % 2 == 1, :] 
    odd_mask = np.zeros((KK.shape[0] + 1,), dtype=KK.dtype) 
    odd_mask[odd_starters] = 1, -1 
    odd_mask = np.add.accumulate(odd_mask[:-1]) 
    # apply global 1,0,1,0,1,0,... mask 
    KK[1::2] = 0 
    # invert stretches starting on odd days 
    KK ^= odd_mask

呼叫它從外部對環的（i和j）中：

do(storms1[:, i, j]) 
do(storms2[:, i, j]) 
etc.

它將改變陣列到位。

這應該是比循環（兩個外循環不會有所作爲）快得多。

工作原理：

它發現的起點和那些塊的端點。我們知道在每個這樣的塊中，每一個塊都必須是零。使用全局1,0,1,0,1,0，...掩碼算法每隔一天就會清零。

產生

正確的結果在塊即開始甚至幾天
塊
與所述正確圖案的塊補上十多天啓動外沒有變化

算法的最後一步是反轉這些奇怪的起始塊。

來源

2017-02-13 01:18:28

使用一維數組，模擬的第一軸的一個例子。首先，找到1的組開始。接下來，找到每個組的長度。最後，計算出活動的基礎上你的邏輯數量：

import numpy 

a = numpy.random.randint(0,2,20) 

# Add an initial 0 
a1 = numpy.r_[0, a] 

# Mark the start of each group of 1's 
d1 = numpy.diff(a1) > 0 

# Indices of the start of groups of 1's 
w1 = numpy.arange(len(d1))[d1] 

# Length of each group 
cs = numpy.cumsum(a) 
c = numpy.diff(numpy.r_[cs[w1], cs[-1]+1]) 

# Apply the counting logic 
storms = c - c//2 

print(a) 
>>> array([0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1]) 
print(c) 
>>> array([1, 2, 4, 1, 3]) 
print(storms) 
>>> array([1, 1, 2, 1, 2])

可以爲您節省更多的內存比我在這裏展示通過重用變量名不再需要後，他們等。

來源

2017-02-13 05:09:29 Benjamin

所以我想你想：

storms_in[:,i,j] = [0,0,1,1,0,1,1,1,0,1,0,1,1,1,1,0] 
storms_out[:,i,j]= [0,0,1,0,0,1,0,1,0,1,0,1,0,0,1,0]

這是不你的代碼示例是幹什麼的，但你說你想在你的第二段做。

要做到這一點，你需要兩個步驟

def storms_disc(storms): # put the whole array here, boolean-safe 
    z = np.zeros((1,) + storms.shape[1:]) # zero-pads for the ends 
    changes = np.r_[storms.astype('int8') ,z] - np.r_[z, storms.astype('int8')] #find where the weather changes 
    changes=((changes[:-1] == 1) | (changes[1:] == -1)).astype('int8') # reduce dimension 
    return ((np.r_[changes, z] - np.r_[z, changes])[:-1] == 1).astype(storms.dtype) #find the first of successive changes

該向量化的全過程，而且你只需要調用它的8倍。該astype電話是因爲減去布爾導致一個錯誤，儘管它們的價值是1和0

測試：

storms=np.random.randint(0,2,90).reshape(10,3,3) 
storms.T 

array([[[1, 0, 0, 1, 1, 1, 1, 1, 1, 0], 
     [0, 0, 1, 1, 0, 1, 1, 0, 0, 1], 
     [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]], 

     [[0, 0, 0, 1, 0, 1, 0, 0, 0, 0], 
     [0, 1, 0, 0, 1, 1, 1, 0, 0, 0], 
     [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]], 

     [[0, 1, 0, 1, 0, 1, 1, 0, 0, 0], 
     [0, 1, 0, 1, 0, 1, 0, 0, 1, 1], 
     [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]]], dtype=int8) 

storms_disc(storms).T 

array([[[1, 0, 0, 1, 0, 0, 0, 0, 1, 0], 
     [0, 0, 1, 0, 0, 1, 0, 0, 0, 1], 
     [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]], 

     [[0, 0, 0, 1, 0, 1, 0, 0, 0, 0], 
     [0, 1, 0, 0, 1, 0, 1, 0, 0, 0], 
     [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]], 

     [[0, 1, 0, 1, 0, 1, 0, 0, 0, 0], 
     [0, 1, 0, 1, 0, 1, 0, 0, 1, 0], 
     [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]]], dtype=int8)

來源

2017-02-13 08:49:27

請注意，您可以查看（'a.view（numpy.bool）'）8位int數組作爲布爾值，因爲Numpy布爾類型也是8位。這節省了類型轉換。 – Benjamin

不知道我會在那裏做。我主要轉換爲'int8'，所以我可以減去。我想我可以在最後替換'.type（storms.dtype）'。 –

「視圖」技巧的作用也相反......但是，我讀了你的代碼太快了一點。 – Benjamin

提高重複檢查效率 - Python

回答

相關問題