的Python/numpy的 - 快速查找索引數組中的最近的某個值

我有值，T的陣列，即總是遞增次序（但不總是均勻間隔的）。我還有另一個值x。我需要在t中找到索引，使得t [index]最接近x。該函數必須爲x < t.min（）返回零，併爲x> t.max（）返回最大索引（或-1）。的Python/numpy的 - 快速查找索引數組中的最近的某個值

我已經寫了兩個函數來做到這一點。第一個，f1，在這個簡單的時間測試中更快。但我喜歡第二個只是一條線。這個計算將在一個大陣列上完成，可能每秒多次。

任何人都可以拿出來與可比的時機一些其他的功能，第一，但與清潔尋找代碼？第一個速度怎麼樣（速度是最重要的）？

謝謝！

代碼：

import numpy as np 
import timeit 

t = np.arange(10,100000)   # Not always uniform, but in increasing order 
x = np.random.uniform(10,100000) # Some value to find within t 

def f1(t, x): 
    ind = np.searchsorted(t, x) # Get index to preserve order 
    ind = min(len(t)-1, ind)  # In case x > max(t) 
    ind = max(1, ind)    # In case x < min(t) 
    if x < (t[ind-1] + t[ind])/2.0: # Closer to the smaller number 
     ind = ind-1 
    return ind 

def f2(t, x): 
    return np.abs(t-x).argmin() 

print t,   '\n', x,   '\n' 
print f1(t, x), '\n', f2(t, x), '\n' 
print t[f1(t, x)], '\n', t[f2(t, x)], '\n' 

runs = 1000 
time = timeit.Timer('f1(t, x)', 'from __main__ import f1, t, x') 
print round(time.timeit(runs), 6) 

time = timeit.Timer('f2(t, x)', 'from __main__ import f2, t, x') 
print round(time.timeit(runs), 6)

來源

2011-05-19 Scott B

由於您的數組進行排序，嘗試二進制搜索。看到這個問題的答案︰http://stackoverflow.com/questions/212358/binary-search-in-python – payne 2011-05-19 23:04:38

我只是離開工作，但想看看這個稍後。我認爲，一旦你測試了x max（t），你可能會通過短路改善你的第一個功能，但我還沒有機會測試它。 – 2011-05-19 23:05:51

這似乎更快（對我來說，Python的3.2-win32的，numpy的1.6.0慢）：

from bisect import bisect_left 
def f3(t, x): 
    i = bisect_left(t, x) 
    if t[i] - x > 0.5: 
     i-=1 
    return i

輸出：

[ 10 11 12 ..., 99997 99998 99999] 
37854.22200356027 
37844 
37844 
37844 
37854 
37854 
37854 
f1 0.332725 
f2 1.387974 
f3 0.085864

來源

2011-05-19 23:57:37 Kabie

雅，快得多。但是，它需要稍微調整，以考慮x t.max的特殊情況 – 2011-05-20 17:42:16

使用searchsorted：

t = np.arange(10,100000)   # Not always uniform, but in increasing order 
x = np.random.uniform(10,100000) 

print t.searchsorted(x)

編輯：

是啊，我看這是你在F1做什麼。也許下面的f3比f1更容易閱讀。

def f3(t, x): 
    ind = t.searchsorted(x) 
    if ind == len(t): 
     return ind - 1 # x > max(t) 
    elif ind == 0: 
     return 0 
    before = ind-1 
    if x-t[before] < t[ind]-x: 
     ind -= 1 
    return ind

來源

2011-05-19 23:15:41

這只是給出了插入數字的索引，並不一定是最接近的索引。這就是爲什麼他的F1功能需要額外的步驟。 – Dan 2011-05-19 23:18:45

啊，是的，我沒有看到在f1搜索排序。抱歉。 – 2011-05-19 23:29:02

np.searchsorted是二元搜索（每次將數組拆分成一半）。所以你必須以一種方式實現它，它返回的最後一個值小於x而不是返回零。

看看這個算法（從here）：

def binary_search(a, x): 
    lo=0 
    hi = len(a) 
    while lo < hi: 
     mid = (lo+hi)//2 
     midval = a[mid] 
     if midval < x: 
      lo = mid+1 
     elif midval > x: 
      hi = mid 
     else: 
      return mid 
    return lo-1 if lo > 0 else 0

剛剛更換的最後一行（是return -1）。也改變了論據。

由於迴路用Python編寫的，它可以是比第一個...（不基準）

來源

2011-05-19 23:42:07 JBernardo

又見[平分模塊（http://docs.python.org/library/bisect.html） – RecursivelyIronic 2011-05-19 23:52:12

的Python/numpy的 - 快速查找索引數組中的最近的某個值

回答

相關問題