NumPy - Building a matrix of Jaro (or Levenshtein) distances using numpy.fromfunction

I am doing some text analysis, and as part of it I need a matrix of the Jaro distances between all words in a given list (i.e. a pairwise distance matrix), like this:

       │ CHEESE  CHORES  GEESE  GLOVES
───────┼──────────────────────────────
CHEESE │ 0       0.222   0.177  0.444
CHORES │ 0.222   0       0.422  0.333
GEESE  │ 0.177   0.422   0      0.300
GLOVES │ 0.444   0.333   0.300  0

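So I tried to construct it using numpy.fromfunction. According to the documentation and its examples, it passes the coordinates to a function, takes the results, and builds the result matrix from them.

I tried the following: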

import numpy as np
from jellyfish import jaro_distance

def distance(i, j):
    return 1 - jaro_distance(feature_dict[i], feature_dict[j])

feature_dict = 'CHEESE CHORES GEESE GLOVES'.split()
distance_matrix = np.fromfunction(distance, shape=(len(feature_dict), len(feature_dict)))

Note: jaro_distance takes exactly 2 strings and returns a float.

And I got an error:

File "<pyshell#26>", line 4, in distance 
    return 1 - jaro_distance(feature_dict[i], feature_dict[j]) 
TypeError: only integer arrays with one element can be converted to an index

I added print(i) and print(j) at the start of the function and found that, instead of actual coordinates, something strange is passed in:

[[ 0. 0. 0. 0.] 
[ 1. 1. 1. 1.] 
[ 2. 2. 2. 2.] 
[ 3. 3. 3. 3.]] 
[[ 0. 1. 2. 3.] 
[ 0. 1. 2. 3.] 
[ 0. 1. 2. 3.] 
[ 0. 1. 2. 3.]]

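Why? The examples on the numpy site clearly show that just two integers are passed, nothing else.

I tried to reproduce their example exactly, using a lambda function, but I get exactly the same error: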

distance_matrix = np.fromfunction(lambda i, j: 1 - jaro_distance(feature_dict[i], feature_dict[j]), shape=(len(feature_dict),len(feature_dict)))

Any help is appreciated - I think I am misunderstanding something here.

Source

2015-04-22 Maxim Haytovich

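Can you turn this into a [complete example](http://stackoverflow.com/help/mcve)? What is 'feature_dict'? What is the call signature of 'jaro_distance()'?

This is a complete example, I believe. The feature dict is generated exactly as shown in the code: feature_dict = 'CHEESE CHORES GEESE GLOVES'.split()

jaro_distance just takes 2 strings and returns a float. It is not my function; it is provided by jellyfish.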

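As @xnx suggested, I looked into the question and found that np.fromfunction does not pass the coordinates one pair at a time; it actually passes all the indices at once. This means that if the shape of the array is (2,2), numpy will not execute f(0,0), f(0,1), f(1,0), f(1,1), but will instead execute: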

f([[0., 0.], [1., 1.]], [[0., 1.], [0., 1.]])
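
The single-call behaviour can be checked directly. The following is a minimal sketch; the function name `record` and the list `calls` are made up for illustration. It suggests why the documentation's arithmetic examples work (array arithmetic is element-wise) while indexing a Python list with the arguments does not: the callable is invoked once, with whole index arrays that are float by default.

```python
import numpy as np

calls = []  # hypothetical helper: stores the arguments of every invocation


def record(i, j):
    """Remember what np.fromfunction passed in, then do element-wise arithmetic."""
    calls.append((i, j))
    return i + j  # works because i and j are arrays of the same shape


result = np.fromfunction(record, (2, 2))

print(len(calls))         # how many times the function was called
print(calls[0][0])        # row-index array
print(calls[0][1])        # column-index array
print(calls[0][0].dtype)  # index dtype used by default
print(result)
```

Because `i` and `j` arrive as 2-D arrays, an expression such as `feature_dict[i]` tries to index a list with an array, which is what raises the TypeError shown in the question.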

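However, it looks like my particular function can be vectorized with np.vectorize, which produces the desired result. So the code that achieves what I need is: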

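from jellyfish import jaro_distance
import numpy as np

def distance(i, j):
    # np.vectorize calls this once per element, so i and j are scalars here
    return 1 - jaro_distance(feature_dict[i], feature_dict[j])

feature_dict = 'CHEESE CHORES GEESE GLOVES'.split()

funcProxy = np.vectorize(distance)

# dtype=int makes the generated indices integers (the default is float),
# so they can be used to index the list
distance_matrix = np.fromfunction(funcProxy, shape=(len(feature_dict), len(feature_dict)), dtype=int)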

It works fine.
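
Since np.vectorize is essentially a Python-level loop, an explicit loop is a reasonable alternative, and it can exploit the symmetry of the matrix to compute each pair only once. The sketch below is not the code from this answer: to stay self-contained it uses a plain-Python `levenshtein` function as a stand-in metric (an assumption for illustration; any function taking two strings and returning a number, such as the jellyfish one, could be swapped in), and `words` is just a local name for the word list.

```python
import numpy as np


def levenshtein(a, b):
    """Plain-Python edit distance, used here only as a stand-in metric."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


words = 'CHEESE CHORES GEESE GLOVES'.split()
n = len(words)

# Fill only the upper triangle and mirror it; the diagonal stays 0.
distance_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = levenshtein(words[i], words[j])
        distance_matrix[i, j] = distance_matrix[j, i] = d

# Cross-check against the np.fromfunction + np.vectorize route,
# with dtype=int so the indices can be used on a list.
via_fromfunction = np.fromfunction(
    np.vectorize(lambda i, j: levenshtein(words[i], words[j])),
    (n, n),
    dtype=int,
)

print(distance_matrix)
print(np.array_equal(distance_matrix, via_fromfunction))
```

The loop version calls the metric n(n-1)/2 times instead of n² times, which may matter when the word list is long and the metric is expensive.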

Source

2015-04-22 19:44:25
