爲numpy陣列中的每個點找到最接近的k個點

我有一個np數組，X的大小爲1000 x 1000，其中每個元素都是實數。我想查找np數組每行中每個點的5個最近點。這裏距離度量可以是abs（x-y）。我試圖做爲numpy陣列中的每個點找到最接近的k個點

for i in range(X.shape[0]): 
    knn = NearestNeighbors(n_neighbors=5) 
    knn.fit(X[i]) 
    for j in range(X.shape[1]) 
     d = knn.kneighbors(X[i,j], return_distance=False)

但是，這不適用於我，我不知道這是如何有效。有沒有解決的辦法？我見過很多比較矢量的方法，但沒有比較單個元素的方法。我知道我可以使用for循環並循環並找到最小的k，但這在計算上會很昂貴。 KD樹能爲此工作嗎？我曾嘗試類似的方法來

Finding index of nearest point in numpy arrays of x and y coordinates

但是，我不能得到這個工作。有沒有一些numpy功能，我不知道這可以做到這一點？

來源

2017-03-16 Mike El Jackson

你說的 '最近' 是什麼意思？按價值最近？什麼是「點」？ – DyZ

所以說row r = [1,10,11,15,18,16,19,18]。對於r中的每個元素，我想要找到r中具有最接近我們正在查看的元素的其他元素。所以我們看看1並找到與其最接近的k個數字。然後我們看看10，然後找到k最接近的數字，然後....然後找到與其最接近的k個數字。 –

是的，它是一個預測矩陣，所以行是人，列是項目 –

爲您的數據的每一行構造一個帶有scipy.spatial.cKDTree的kdtree。

import numpy as np 
import scipy.spatial 


def nearest_neighbors(arr, k): 
    k_lst = list(range(k + 2))[2:] # [2,3] 
    neighbors = [] 

    for row in arr: 
     # stack the data so each element is in its own row 
     data = np.vstack(row) 
     # construct a kd-tree 
     tree = scipy.spatial.cKDTree(data) 
     # find k nearest neighbors for each element of data, squeezing out the zero result (the first nearest neighbor is always itself) 
     dd, ii = tree.query(data, k=k_lst) 
     # apply an index filter on data to get the nearest neighbor elements 
     closest = data[ii].reshape(-1, k) 
     neighbors.append(closest) 
    return np.stack(neighbors) 


N = 1000 
k = 5 
A = np.random.random((N, N)) 
nearest_neighbors(A, k)

來源

2017-03-16 07:54:30 Crispin

幹得好，就是打敗我吧。我只是補充說，這比「np」的方法快6-7倍。argpartition'從元素和循環中的行之間的差異向量。（約3秒到約18秒）。我認爲scipy的後續版本具有用於跨CPU核心並行處理的'tree.query'函數的'n_jobs'參數。我的版本沒有這個參數，但也可能會提升性能。 –

我不確定你想要最終的結果。但是這絕對會讓你得到你所需要的。

np.random.seed([3,1415]) 
X = np.random.rand(1000, 1000)

抓鬥上三角形索引來跟蹤每行

x1, x2 = np.triu_indices(X.shape[1], 1)

點的每一種組合生成的所有距離的陣列

d = np.abs(X[:, x1] - X[:, x2])

尋找最近5的每一行

tpos = np.argpartition(d, 5)[:, :5]

然後x1[tpos]給出最近對中第一個點的行位置，而x2[tpos]給出最接近對的第二個位置。

來源

2017-03-16 07:45:54 piRSquared

這裏是一個argsort ING解決方案，致力採取簡單的度量的優勢：

def nn(A, k): 
    out = np.zeros((A.shape[0], A.shape[1] + 2*k), dtype=int) 
    out[:, k:-k] = np.argsort(A, axis=-1) 
    out[:, :k] = out[:, -k-1, None] 
    out[:, -k:] = out[:, k, None] 
    strd = stride_tricks.as_strided(
     out, strides=out.strides + (out.strides[-1],), shape=A.shape + (2*k+1,)) 
    delta = A[np.arange(A.shape[0])[:, None, None], strd] 
    delta -= delta[..., k, None] 
    delta = np.abs(delta) 
    s = np.argpartition(delta,(0, k), axis = -1)[..., 1:k+1] 
    inds = tuple(np.ogrid[:strd.shape[0], :strd.shape[1], :0][:2]) 
    res = np.empty(A.shape + (k,), dtype=int) 
    res[np.arange(strd.shape[0])[:, None, None], out[:, k:-k, None], 
     np.arange(k)[None, None, :]] = strd[inds + (s,)] 
    return res 

N = 1000 
k = 5 
r = 10 

A = np.random.random((N, N)) 
# crude test 
print(np.abs(A[np.arange(N)[:, None, None], res]-A[..., None]).mean()) 
# timings 
print(timeit(lambda: nn(A, k), number=r)/r)

輸出：

# 0.00150537172454 
# 0.4567880852999224

來源

2017-03-16 10:32:17

爲numpy陣列中的每個點找到最接近的k個點

回答

相關問題