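Behavior of get_matching_blocks() in difflib.SequenceMatcher
I am experimenting with fuzzywuzzy and have run into wrong results in quite a few cases. While debugging, I came across a scenario with get_matching_blocks() that I find hard to explain.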
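My understanding of get_matching_blocks() is that it should return triples (i, j, n), where the substring of length n at index i in the first string exactly matches the substring of length n at index j in the second string.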
>>> hay = """"Find longest matching block in a[alo:ahi] and b[blo:bhi]. If isjunk was omitted or None, find_longest_match() returns (i, j, k) such that a[i:i+k] is equal to b[j:j+k], where alo <= i <= i+k <= ahi and blo <= j <= j+k <= bhi. For all (i', j', k') meeting those conditions, the additional conditions k >= k', i <= i', and if i == i', j <= j' are also met. In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b."""
>>> needle = "meeting those conditions"
>>> needle in hay
True
>>> import difflib
>>> sm = difflib.SequenceMatcher(None, needle, hay)
>>> sm.get_matching_blocks()
[Match(a=5, b=8, size=2), Match(a=24, b=550, size=0)]
>>>
So why does the code above fail to find the matching block, even though needle is a substring of hay?
Thanks for pointing that out. I have updated the question. And yes, I am aware that the last element of the returned list is a dummy. – Abhijit
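OK, my bad, running it in the interpreter gives the same result. However, swapping the two arguments produces the correct output; maybe that will point someone toward an answer. – wasyl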
OK, another update: this is related to the 'autojunk' parameter. Passing 'False' as the fourth argument to 'SequenceMatcher' makes the output correct. – wasyl
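To make the two workarounds from the comments concrete, here is a minimal sketch. It assumes Python 3, where `autojunk` is the fourth parameter of `difflib.SequenceMatcher`, and it uses a made-up haystack (not the docstring from the question) that is longer than 200 characters, which is the length at which the documented autojunk heuristic starts treating very frequent elements of the second sequence as junk.

```python
import difflib

needle = "meeting those conditions"
# Illustrative stand-in for the long text in the question (an assumption):
# any second sequence of 200+ characters is subject to the autojunk heuristic.
hay = "the quick brown fox jumps over the lazy dog. " * 10 + needle + " end"

# Workaround 1: disable the heuristic (fourth argument, autojunk=False).
sm = difflib.SequenceMatcher(None, needle, hay, autojunk=False)
blocks = sm.get_matching_blocks()
print(blocks)

# Workaround 2: swap the arguments so the short string is the second
# sequence; it is under 200 characters, so autojunk does not apply to it.
sm_swapped = difflib.SequenceMatcher(None, hay, needle)
swapped = sm_swapped.get_matching_blocks()
print(swapped)
```

In both cases the first block should cover the whole needle, followed by the usual zero-length dummy block at the end of the list. Which workaround fits better likely depends on whether the caller (fuzzywuzzy, for instance) lets you control how the matcher is constructed.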