Python: comparing two strings

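I'd like to know whether there is a library that will tell me approximately how similar two strings are.

I'm not looking for anything specific, but in this case: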

a = 'alex is a buff dude' 
b = 'a;exx is a buff dud'

We could say that b and a are roughly 90% similar.

Is there a library that can do this?

Source

2010-08-23 l--' ' ' ' ' ' ---------' ' ' ' ' ' ' ' ' ' ' '

Possible duplicate of [Text difference algorithm](http://stackoverflow.com/questions/145607/text-difference-algorithm) – tzot 2010-09-20 14:09:48

>>> import difflib
>>> a = 'alex is a buff dude'
>>> b = 'a;exx is a buff dud'
>>> difflib.SequenceMatcher(None, a, b).ratio()
0.89473684210526316
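
To see where that number comes from: difflib documents `ratio()` as 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. Below is a minimal sketch that recomputes the value from `get_matching_blocks()`. The helper name `explain_ratio` is made up for this illustration and is not part of difflib.

```python
import difflib

def explain_ratio(a, b):
    """Recompute SequenceMatcher.ratio() from the matching blocks.

    explain_ratio is a hypothetical helper name, not a difflib function.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    # Each block is (start_in_a, start_in_b, size). The final block is a
    # zero-length sentinel, so it adds nothing to the sum.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Two empty strings are treated as identical, as difflib does.
    return 2.0 * matched / total if total else 1.0

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(explain_ratio(a, b))
```

For the two sample strings, 17 characters match out of 38 in total, which is where the roughly 0.89 comes from.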

Source

2010-08-23 21:06:18 killown

Look up the Levenshtein algorithm for comparing strings. Here is a random implementation found via Google: http://hetland.org/coding/python/levenshtein.py
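
For readers who just want the idea, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance. It is not the code behind the link above. The `similarity` helper and its scaling by the longer string's length are assumptions added for illustration; other normalisations are possible.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale the distance to 0..1 (hypothetical helper; one possible scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - float(levenshtein(a, b)) / longest

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(levenshtein(a, b))  # 3: one substitution, one insertion, one deletion
print(similarity(a, b))
```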

Source

2010-08-23 20:34:24 viraptor

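http://en.wikipedia.org/wiki/Levenshtein_distance

There are some libraries for it, but be aware that it is expensive, especially for longer strings: the usual dynamic-programming algorithm takes time proportional to the product of the two string lengths.

You may also want to take a look at Python's difflib: http://docs.python.org/library/difflib.html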
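
If cost is a concern with difflib, its documentation notes that `real_quick_ratio()` and `quick_ratio()` return upper bounds on `ratio()` and are cheaper to compute. Here is a minimal sketch of using them as a pre-filter before the full comparison. The function name `is_similar` and the 0.8 threshold are made up for this example.

```python
import difflib

def is_similar(a, b, threshold=0.8):
    """Return True if ratio() reaches the threshold, trying cheap bounds first.

    is_similar and the default threshold are hypothetical choices for this sketch.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    # Both quick ratios are upper bounds on ratio(), so if either falls
    # below the threshold the full ratio() call can be skipped.
    return (matcher.real_quick_ratio() >= threshold
            and matcher.quick_ratio() >= threshold
            and matcher.ratio() >= threshold)

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(is_similar(a, b))
```

The short-circuiting `and` means the more expensive checks only run when the cheaper ones pass, which helps most when comparing one string against many candidates.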

Source

2010-08-23 20:35:39

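Expensive? difflib is a monster compared with a half-decent Levenshtein implementation. – 2010-08-23 23:26:34

I didn't mean to imply that difflib is cheaper - it just does a similar thing, albeit a bit differently. – 2010-08-24 08:37:04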

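Another approach is to use the longest common subsequence (LCS). Here is my LCS implementation, also posted on Daniweb (difflib defines something similar).

Here is the simple length-only version, using a plain list as the data structure: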

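def longest_common_sequence(a, b):
    """Return the similarity of a and b as a percentage, based on LCS length."""
    n1 = len(a)
    n2 = len(b)

    # previous holds the LCS lengths from the row above in the DP table
    previous = [0] * n2

    over = 0
    for ch1 in a:
        left = corner = 0
        for ch2 in b:
            over = previous.pop(0)   # value directly above the current cell
            if ch1 == ch2:
                this = corner + 1    # extend the match from the diagonal
            else:
                this = over if over >= left else left
            previous.append(this)
            left, corner = this, over
    # 2 * LCS length / combined length, scaled to a percentage
    return 200.0 * previous.pop() / (n1 + n2)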

Here is my second version, which actually returns the common string, using a deque as the data structure (with a usage example on the sample data):

from collections import deque

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'

def lcs_tuple(a, b):
    """Return (similarity percentage, one longest common subsequence)."""
    n1 = len(a)
    n2 = len(b)

    # Each cell is (LCS length, LCS string); previous is the row above
    previous = deque([(0, '')] * n2)

    over = (0, '')
    for i in range(n1):
        left = corner = (0, '')
        for j in range(n2):
            over = previous.popleft()
            if a[i] == b[j]:
                this = corner[0] + 1, corner[1] + a[i]
            else:
                # tuples compare by length first, then by string
                this = max(over, left)
            previous.append(this)
            left, corner = this, over
    return 200.0 * this[0] / (n1 + n2), this[1]

print(lcs_tuple(a, b))

""" Output:
(89.47368421052632, 'aex is a buff dud')
"""

Source

2010-08-23 21:12:42
