Calculating distance with the "bag of words" approach

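My code runs, but my function always outputs 0.0. The code reads a set of .txt files and builds a matrix in which each .txt file is one row, and each word in that file gets its own column in the corresponding row.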

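I compare two rows at a time. I want to count how often each word in the union of the two rows occurs. However, although the code runs, I get the wrong result (0.0).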
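Written out, the distance the bow_distance function computes for two rows a and b is half the summed count differences over the union of their words, divided by the length of the longer row. This sketch assumes that max_num_parts in the function's return statement stands for max_words (it is not defined in the posted code); c_a(w) denotes the number of times word w occurs in row a.

```latex
d(a, b) = \frac{\tfrac{1}{2} \sum_{w \in V} \lvert c_a(w) - c_b(w) \rvert}{\max(\lvert a \rvert, \lvert b \rvert)},
\qquad V = \operatorname{set}(a) \cup \operatorname{set}(b)
```

For ["word 1", "word 2", "word 3", "word 4"] and ["word 2", "word 3", "word 4", "word 5"], only "word 1" and "word 5" differ, by one occurrence each, so d = (½ · 2) / 4 = 0.25 rather than 0.0.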

I suspected a mistake in the matrix I pass to the function, but the matrix looks fine.

Strangely, if I create two lists by hand:

a = ["a", "b", "c", "d"], 
b = ["b", "c", "d", "e"]

it works, but when I change them to:

a = ["word 1", "word 2", "word 3", "word 4"], 
b = ["word 2","word 3","word 4","word 5",]

the result is 0.0 again. I'm confused!
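A likely explanation for why one-letter words work and longer ones do not: numpy.zeros(..., dtype=str) with no length gives a fixed-width Unicode array of length 1 ('<U1'), so every string stored in bow_matrix is silently cut to its first character. The short check below uses the same call as the code in the question; the 3x2 shape is just an illustrative size.

```python
import numpy

# dtype=str without a length gives a fixed-width Unicode dtype of length 1
# ('<U1'), so every cell of the array can hold only one character.
bow_matrix = numpy.zeros(shape=(3, 2), dtype=str)
print(bow_matrix.dtype)  # <U1

bow_matrix[0, 0] = "b"       # a one-letter word fits unchanged
bow_matrix[0, 1] = "word 2"  # silently truncated to "w"
print(bow_matrix[0])         # ['b' 'w']

# The truncated string never matches a real word, so the count is 0.
a = ["word 1", "word 2", "word 3", "word 4"]
print(a.count(bow_matrix[0, 1]))  # 0
```

With one-letter words nothing is lost, which matches the observation that the "a", "b", "c", "d" lists give a sensible result.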

My code:

import numpy

def bow_distance(a, b):

    p = 0

    # length of the longer row
    if len(a) > len(b):
        max_words = len(a)
    else:
        max_words = len(b)

    # every distinct word that occurs in a or b
    list_words_ab = list(set(a) | set(b))

    # row 0: the words, row 1: their counts in a, row 2: their counts in b
    len_bow_matrix = len(list_words_ab)
    bow_matrix = numpy.zeros(shape = (3, len_bow_matrix), dtype = str)

    while p < len_bow_matrix:
        bow_matrix[0, p] = str(list_words_ab[p])
        p = p+1

    p = 0

    while p < len_bow_matrix:
        bow_matrix[1, p] = a.count(bow_matrix[0, p])
        bow_matrix[2, p] = b.count(bow_matrix[0, p])
        p = p+1

    p = 0
    overlap = 0

    # sum of the absolute count differences over all words
    while p < len_bow_matrix:
        abs_difference = abs(float(bow_matrix[1, p]) - float(bow_matrix[2, p]))
        overlap = overlap + abs_difference
        p = p+1

    return (overlap/2)/max_num_parts


# Calculate the distances between every pair of rows

i = 1
j = 1

while i < num_of_txt + 1:

    print(i)
    newfile = open("TXT_distance_" + str(i)+".txt", "w")

    while j < num_of_txt + 1:
        newfile.write(str(bow_distance(text_word_matrix[i-1], text_word_matrix[j-1])) + " ")
        j = j+1

    newfile.close()
    j = 1
    i = i+1
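One possible fix, sketched here, is to drop the NumPy string matrix and count words with collections.Counter, which keeps whole words and integer counts. (numpy.zeros(..., dtype=str) creates a one-character '<U1' array, so multi-character words stored in bow_matrix are truncated and never match.) Counter is a swapped-in technique, not the original matrix approach, and the sketch assumes max_num_parts, which the posted code does not define, was meant to be max_words, the length of the longer row. The function keeps the same signature, so the calling loop can stay as it is.

```python
from collections import Counter


def bow_distance(a, b):
    """Bag-of-words distance between two word lists.

    Half the sum of absolute count differences over the union of words,
    divided by the length of the longer list. Returns 0.0 for identical
    bags and 1.0 for two equal-length lists with no word in common.
    """
    counts_a = Counter(a)  # word -> number of occurrences in a
    counts_b = Counter(b)  # word -> number of occurrences in b
    # Assumption: normalise by the longer list (max_words in the question),
    # since max_num_parts is not defined in the posted snippet.
    max_words = max(len(a), len(b))
    if max_words == 0:
        return 0.0  # two empty rows: treat them as identical
    vocabulary = set(counts_a) | set(counts_b)  # union of both rows
    # A Counter returns 0 for words it has not seen, so no KeyError here.
    overlap = sum(abs(counts_a[w] - counts_b[w]) for w in vocabulary)
    return (overlap / 2) / max_words


a = ["word 1", "word 2", "word 3", "word 4"]
b = ["word 2", "word 3", "word 4", "word 5"]
print(bow_distance(a, b))  # 0.25
print(bow_distance(["a", "b", "c", "d"], ["b", "c", "d", "e"]))  # 0.25
```

Both pairs of lists now give the same distance, since each pair differs in exactly one word on each side.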


2016-08-24 Philipp

At first glance I see two mistakes here:

a = ["a", "b", "c", "d"], <----- comma here 
b = ["b", "c", "d", "e"] 
it works, but when I change to: 

a = ["word 1", "word 2", "word 3", "word 4"], <----- and here 
b = ["word 2","word 3","word 4","word 5",] <----- and here inside the list


2016-08-24 14:37:21 turkus

There is also an extra comma after 'word 5' that needs to be removed. – Harrison

True, thanks. – turkus

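The comma after 'word 5' doesn't matter, since a list is allowed to end with a trailing comma. However, the comma after the list definition (where 'a' is defined) makes 'a' a tuple with a single value (the list itself), and that could throw off your logic. – Riaz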
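A quick check of this point in plain Python: the comma after the closing bracket turns the list into a one-element tuple, while a trailing comma inside the brackets is harmless. With the tuple, the set(a) call in bow_distance would raise an error, because the tuple's only element is a list and lists are unhashable. The name a_with_comma is illustrative.

```python
# A comma after the closing bracket wraps the list in a one-element tuple.
a_with_comma = ["word 1", "word 2", "word 3", "word 4"],
print(type(a_with_comma).__name__, len(a_with_comma))  # tuple 1

# A trailing comma inside the brackets is allowed and changes nothing.
b = ["word 2", "word 3", "word 4", "word 5",]
print(type(b).__name__, len(b))  # list 4

# bow_distance calls set(a); with the tuple that fails, because the
# tuple's only element is a list and lists are unhashable.
try:
    set(a_with_comma)
except TypeError as err:
    print("TypeError:", err)  # TypeError: unhashable type: 'list'
```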
