2016-08-24 44 views

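My code runs, but my function always returns 0.0. The code reads .txt files and builds a matrix in which each .txt file is one row and each word of that file has its own column in that row. I am calculating a distance with a "bag of words" approach.

I compare two rows with each other. For every word in the union of the two rows, I want to count how often it occurs in each row. However, although the code runs, I get the wrong result (0.0).

I thought there might be an error in the matrix I pass to the function, but the matrix looks fine.

Strangely, if I create two lists by hand: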

a = ["a", "b", "c", "d"], 
b = ["b", "c", "d", "e"] 

it works, but when I change to:

a = ["word 1", "word 2", "word 3", "word 4"], 
b = ["word 2","word 3","word 4","word 5",] 

the result is 0.0 again. I am confused!

My code:

def bow_distance(a, b):

    p = 0

    if len(a) > len(b):
        max_words = len(a)
    else:
        max_words = len(b)

    list_words_ab = list(set(a) | set(b))

    len_bow_matrix = len(list_words_ab)
    bow_matrix = numpy.zeros(shape = (3, len_bow_matrix), dtype = str)

    while p < len_bow_matrix:
        bow_matrix[0, p] = str(list_words_ab[p])
        p = p+1

    p = 0

    while p < len_bow_matrix:
        bow_matrix[1, p] = a.count(bow_matrix[0, p])
        bow_matrix[2, p] = b.count(bow_matrix[0, p])
        p = p+1

    p = 0
    overlap = 0

    while p < len_bow_matrix:
        abs_difference = abs(float(bow_matrix[1, p]) - float(bow_matrix[2, p]))
        overlap = overlap + abs_difference
        p = p+1

    return (overlap/2)/max_num_parts


# Calculate the distances

i = 1
j = 1

while i < num_of_txt + 1:

    print(i)
    newfile = open("TXT_distance_" + str(i)+".txt", "w")

    while j < num_of_txt + 1:
        newfile.write(str(bow_distance(text_word_matrix[i-1], text_word_matrix[j-1])) + " ")
        j = j+1

    newfile.close()
    j = 1
    i = i+1
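
For comparison, here is a minimal sketch of the same distance written with `collections.Counter` instead of a NumPy string matrix. One thing that may be worth checking in the code above: `numpy.zeros(..., dtype=str)` allocates one-character strings, so a value such as "word 1" would be stored as "w", while single-letter words survive intact. The sketch also assumes the intended normalizer is the length of the longer list (what `max_words` holds), since `max_num_parts` is not defined in the posted snippet.

```python
from collections import Counter


def bow_distance(a, b):
    """Bag-of-words distance between two word lists (0.0 means equal counts)."""
    if not a and not b:
        return 0.0  # assumption: two empty texts count as identical

    counts_a = Counter(a)
    counts_b = Counter(b)

    # Sum the absolute count differences over the union of both vocabularies.
    # A Counter returns 0 for a missing word, so no matrix is needed.
    difference = sum(abs(counts_a[w] - counts_b[w]) for w in set(a) | set(b))

    # Assumption: normalize by the longer list, as max_words suggests.
    return (difference / 2.0) / max(len(a), len(b))


a = ["word 1", "word 2", "word 3", "word 4"]
b = ["word 2", "word 3", "word 4", "word 5"]
print(bow_distance(a, b))  # 0.25
```

Because whole strings are used as dictionary keys here, multi-character words are compared in full rather than by their first character.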

Answers


At first glance I see two problems here:

a = ["a", "b", "c", "d"], <----- comma here 
b = ["b", "c", "d", "e"] 
it works, but when I change to: 

a = ["word 1", "word 2", "word 3", "word 4"], <----- and here 
b = ["word 2","word 3","word 4","word 5",] <----- and here inside the list 

There is also an extra comma after "word 5" that needs to be removed. – Harrison


True, thank you. – turkus


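The comma after "word 5" doesn't matter, because a list may end with a trailing comma. However, the comma after the list definition (where `a` is defined) makes `a` a tuple with a single value (the list itself), and that could throw off your logic. – Riaz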
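
The difference between the two kinds of trailing comma can be checked with a short sketch. The variable names mirror the question; the `try`/`except` is only there to show what `set(a)` does when `a` is a tuple that wraps a list.

```python
a = ["a", "b", "c", "d"],   # comma after the list: builds a 1-element tuple
b = ["b", "c", "d", "e",]   # comma inside the list: harmless

print(type(a).__name__, len(a))  # tuple 1
print(type(b).__name__, len(b))  # list 4

# The tuple's only element is the list itself, and lists are not hashable,
# so building a set from it fails instead of yielding the words.
try:
    set(a)
    set_failed = False
except TypeError:
    set_failed = True

print(set_failed)  # True
```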