Setting the values between two values in a numpy array, fast

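I have been looking for a solution to this problem for a while but can't seem to find anything.

For example, I have a numpy array

[ 0, 0, 2, 3, 2, 4, 3, 4, 0, 0, -2, -1, -4, -2, -1, -3, -4, 0, 2, 3, -2, -1, 0]

What I want to achieve is to generate another array that marks all the elements between a pair of numbers, say between 2 and -2 here. So I want to get an array like this

[ 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

Notice that any 2 or -2 between a (2, -2) pair is ignored. The simple approach is to iterate through each element with a for loop, find the first occurrence of 2, set everything after it to 1 until you hit a -2, and then start looking for the next 2 again.

But I would like this process to be faster, since I have over 1000 elements in the numpy array and the process needs to be run many times. Do you know an elegant way to solve this? Thanks in advance!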

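Source

2016-02-19 FiniteElement

Where does the range start for '[2, 2, -2]'? –

The first '2', as is clear from the example: '2,3,2,...' –

Working on something that uses vectorized operations. Haven't got it yet, but check this out: http://stackoverflow.com/questions/28563711/make-a-numpy-array-monotonic-without-a-python-loop –

Very nice question! Listed in this post is a vectorized solution (hopefully the inline comments help explain the logic behind it). I am assuming A as the input array and T1, T2 as the start and stop triggers.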

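import numpy as np

def setones_between_triggers(A,T1,T2):

    # Get start and stop indices corresponding to rising and falling triggers
    start = np.where(A==T1)[0]
    stop = np.where(A==T2)[0]

    # Without any start trigger there is nothing to set
    if start.size == 0:
        return np.zeros(A.size,dtype=int)

    # Take care of boundary conditions for np.searchsorted to work
    # (also covers the case of no stop trigger at all)
    if (stop.size == 0 or stop[-1] < start[-1]) and (start[-1] != A.size-1):
        stop = np.append(stop,A.size-1)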

    # This is where the magic happens. 
    # Validate (filter out) the triggers based on the set conditions : 
    # 1. See if there is more than one stop index between two start indices.
    # If so, use the first one and reject all the others in that in-between space.
    # 2. Repeat the same check for start, but use the validated start indices. 

    # First off, take care of out-of-bound cases for proper indexing 
    stop_valid_idx = np.unique(np.searchsorted(stop,start,'right')) 
    stop_valid_idx = stop_valid_idx[stop_valid_idx < stop.size] 

    stop_valid = stop[stop_valid_idx] 
    _,idx = np.unique(np.searchsorted(stop_valid,start,'left'),return_index=True) 
    start_valid = start[idx] 

    # Create shifts array (array filled with zeros, unless triggered by T1 and T2 
    # for which we have +1 and -1 as triggers). 
    shifts = np.zeros(A.size,dtype=int) 
    shifts[start_valid] = 1 
    shifts[stop_valid] = -1 

    # Perform a cumulative summation that almost gives us the desired output
    out = shifts.cumsum() 

    # For a worst case when we have two groups of (T1,T2) adjacent to each other, 
    # set the negative trigger position as 1 as well 
    out[stop_valid] = 1  
    return out

Sample runs

The original sample case:

In [1589]: A 
Out[1589]: 
array([ 0, 0, 2, 3, 2, 4, 3, 4, 0, 0, -2, -1, -4, -2, -1, -3, -4, 
     0, 2, 3, -2, -1, 0]) 

In [1590]: setones_between_triggers(A,2,-2) 
Out[1590]: array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

Worst case #1 (adjacent (2,-2) groups):

In [1595]: A 
Out[1595]: 
array([-2, 2, 0, 2, -2, 2, 2, 2, 4, -2, 0, -2, -2, -4, -2, -1, 2, 
     -4, 0, 2, 3, -2, -2, 0]) 

In [1596]: setones_between_triggers(A,2,-2) 
Out[1596]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 
     0], dtype=int32)

Worst case #2 (a 2 without any -2 until the end):

In [1603]: A 
Out[1603]: 
array([-2, 2, 0, 2, -2, 2, 2, 2, 4, -2, 0, -2, -2, -4, -2, -1, -2, 
     -4, 0, 2, 3, 5, 6, 0]) 

In [1604]: setones_between_triggers(A,2,-2) 
Out[1604]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 
     1], dtype=int32)
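The shift-and-cumsum step at the end of the function can be looked at in isolation. The following is a minimal sketch under stated assumptions: the trigger positions are hand-picked and already validated (each start has exactly one stop after it), and the helper name mask_from_pairs is made up for illustration, it is not part of the answer's code.

```python
import numpy as np

def mask_from_pairs(n, start_valid, stop_valid):
    """Build a 0/1 mask of length n from validated (start, stop) index pairs.

    Assumes the pairs are already validated: sorted, non-overlapping,
    and each start has exactly one stop after it.
    """
    shifts = np.zeros(n, dtype=int)
    shifts[start_valid] = 1    # step up where a range opens
    shifts[stop_valid] = -1    # step down where a range closes

    out = shifts.cumsum()      # 1 inside a range, but already back to 0 at each stop
    out[stop_valid] = 1        # make every range include its stop position
    return out

# Hypothetical validated positions for a length-12 array
print(mask_from_pairs(12, np.array([2, 8]), np.array([5, 9])))
```

The cumulative sum alone leaves each stop position at 0, which is why the stop positions are set back to 1 afterwards.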

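Source

2016-02-19 18:28:09 Divakar

Ah. I didn't know about 'searchsorted'. Awesome! –

I was actually about to implement my own ufunc to do this :-). –

@MadPhysicist Yep, searchsorted is one of the best tools :) – Divakar

I have tried a few things at this point, and the need to keep track of the state of the start/end markers has made the cleverer things I tried slower than the dumb iterative approach I was using as a check: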

import numpy as np

for _ in range(1000):
    a = np.random.choice(np.arange(-5, 6), 2000)
    found2 = False
    l = []
    for el in a:
        # a 2 opens the range, a -2 closes it (the -2 itself still gets a 1)
        if el == 2:
            found2 = True
        l.append(1 if found2 else 0)
        if el == -2:
            found2 = False
    l = np.array(l)
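The state tracking in this loop can also be expressed without a Python loop. The following is a minimal sketch and not taken from any of the answers here: it assumes the start and stop values are different, and the name between_ffill is made up for illustration. It forward-fills the index of the most recent trigger with np.maximum.accumulate and checks whether that trigger was a start. No timing claim is made for it.

```python
import numpy as np

def between_ffill(a, t1, t2):
    """Return 1 from each t1 up to and including the next t2, else 0.

    Assumes t1 != t2 (hypothetical helper, for illustration only).
    """
    a = np.asarray(a)
    is_start = a == t1
    is_stop = a == t2
    idx = np.arange(a.size)

    # Index of the most recent trigger (start or stop) at or before
    # each position; -1 where no trigger has been seen yet.
    last = np.maximum.accumulate(np.where(is_start | is_stop, idx, -1))

    # State after processing each element: "in range" exactly when the
    # most recent trigger was a start.
    after = (last >= 0) & is_start[np.maximum(last, 0)]

    # State before each element is the "after" state of the previous one.
    before = np.concatenate(([False], after[:-1]))

    # A stop that closes an open range still belongs to that range.
    return (after | (is_stop & before)).astype(int)

a = np.array([0, 0, 2, 3, 2, 4, 3, 4, 0, 0, -2, -1, -4, -2, -1, -3, -4,
              0, 2, 3, -2, -1, 0])
print(between_ffill(a, 2, -2))
```

The design relies on the loop's state depending only on the most recent trigger: a repeated 2 inside an open range and a stray -2 outside one do not change the state, so no pairing of triggers is needed.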

Source

2016-02-19 16:50:52

Is iterating through the array really too slow?

def between_vals(x, val1, val2):
    out = np.zeros(x.shape, dtype = int)
    in_range = False
    for i, v in enumerate(x):
        if v == val1 and not in_range:
            in_range = True
        if in_range:
            out[i] = 1
        if v == val2 and in_range:
            in_range = False
    return out

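I'm in the same boat as @Randy C: nothing else I tried was faster than this.

Source

2016-02-19 16:57:17

Ideally, a vectorized approach would be much faster and preferable, given the size of the data we are looking at... – FiniteElement

Assuming you have a huge dataset, I would prefer to do an initial search for the two boundary values and then validate those indices with a for loop.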

def between_pairs(x, b1, b2):
    # output vector
    out = np.zeros_like(x)

    # reversed list of indices for possible rising and trailing edges
    rise_edges = list(np.argwhere(x==b1)[::-1,0])
    trail_edges = list(np.argwhere(x==b2)[::-1,0])

    # determine the rising trailing edge pairs
    rt_pairs = []
    t = None
    # look for the next rising edge after the previous trailing edge
    while rise_edges:
        r = rise_edges.pop()
        if t is not None and r < t:
            continue

        # look for the next trailing edge after previous rising edge
        while trail_edges:
            t = trail_edges.pop()
            if t > r:
                rt_pairs.append((r, t))
                break

    # use the rising, trailing pairs for updating out
    for rt in rt_pairs:
        out[rt[0]:rt[1]+1] = 1
    return out

# Example
a = np.array([0, 0, 2, 3, 2, 4, 3, 4, 0, 0, -2, -1, -4, -2, -1, -3, -4,
              0, 2, 3, -2, -1, 0])
d = between_pairs(a , 2, -2)
print(repr(d))

# Output:
# array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

I did a speed comparison against the alternative answer given by @CactusWoman,

def between_vals(x, val1, val2):
    out = np.zeros(x.shape, dtype = int)
    in_range = False
    for i, v in enumerate(x):
        if v == val1 and not in_range:
            in_range = True
        if in_range:
            out[i] = 1
        if v == val2 and in_range:
            in_range = False
    return out

and found the following:

In [59]: a = np.random.choice(np.arange(-5, 6), 2000) 

In [60]: %timeit between_vals(a, 2, -2) 
1000 loops, best of 3: 681 µs per loop 

In [61]: %timeit between_pairs(a, 2, -2) 
1000 loops, best of 3: 182 µs per loop

And for a much smaller dataset:

In [72]: a = np.random.choice(np.arange(-5, 6), 50) 

In [73]: %timeit between_vals(a, 2, -2) 
10000 loops, best of 3: 17 µs per loop 

In [74]: %timeit between_pairs(a, 2, -2) 
10000 loops, best of 3: 34.7 µs per loop

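So it all depends on the size of your dataset.

Source

2016-02-19 18:01:55 hashmuke

This is much better than the simple for loop. Upvoted for the idea of using the bracketing edges. – FiniteElement

@FiniteElement. This answer is obsolete, since there actually is a vectorized solution. –

@Mad Physicist, the vectorized solution was posted after this one. – FiniteElement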
