Diffing binary files in Python

I have two binary files. They look something like this, although the data is more random:

File A:

FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF ...

File B:

41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37 ...

What I would like is to call something like:

>>> someDiffLib.diff(file_a_data, file_b_data)

and receive something like:

[Match(pos=4, length=4)]

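indicating that, starting at position 4, the two files hold the same 4 bytes. The sequence 44 43 42 41 would not count as a match, because it is not at the same position in each file.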

Is there a library that will do this diff for me? Or should I just write the loops to do the comparison myself?


2013-04-03 omghai2u

http://docs.python.org/2/library/difflib.html - first result on Google for "python diff" – Andrey 2013-04-03 21:26:51

Possible duplicate of "Difference between two strings in Python/PHP" (http://stackoverflow.com/questions/1209800/difference-between-two-strings-in-python-php) – Andrey 2013-04-03 21:27:45

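@Andrey Thanks, I tried that, but it appears that 'get_matching_blocks()' does not check that the bytes sit at the same position in each file, only that the sequence exists in both files. Otherwise, yes, it is exactly what I want. – omghai2u 2013-04-03 21:28:12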
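
The behaviour described in that comment can be reproduced with a short sketch. It assumes Python 3 and uses the two sample byte strings from the question; it only illustrates how difflib aligns sequences and is not a proposed solution.

```python
from difflib import SequenceMatcher

# Sample data from the question (Python 3 bytes objects)
f1 = bytes.fromhex('FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF')
f2 = bytes.fromhex('41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37')

# Each block is a Match(a, b, size): f1[a:a+size] == f2[b:b+size]
blocks = SequenceMatcher(None, f1, f2).get_matching_blocks()
for m in blocks:
    if m.size:  # the last block is always a zero-length sentinel
        print(m.a, m.b, m.size)
```

Blocks whose a and b offsets differ are matches at different positions in the two files, which is exactly what the question wants to exclude.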

You can use itertools.groupby() for this. Here is an example:

from itertools import groupby 

# this just sets up some byte strings to use, Python 2.x version is below 
# instead of this you would use f1 = open('some_file', 'rb').read() 
f1 = bytes(int(b, 16) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split()) 
f2 = bytes(int(b, 16) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split()) 

matches = [] 
for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i]):
    if k:
        pos = next(g)
        length = len(list(g)) + 1
        matches.append((pos, length))

Or the same thing as above, written as a list comprehension:

matches = [(next(g), len(list(g)) + 1)
           for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i])
           if k]

Here is the setup for the example if you are using Python 2.x:

f1 = ''.join(chr(int(b, 16)) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split()) 
f2 = ''.join(chr(int(b, 16)) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
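
For comparison, the plain loop mentioned in the question could be sketched roughly as follows. This is only an illustration for Python 3; the function name matching_runs is made up for this example and does not come from any library.

```python
def matching_runs(a, b):
    """Return (pos, length) for each run of bytes equal at the same offset."""
    runs = []
    start = None  # offset where the current run began, or None outside a run
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] == b[i]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # a run that reaches the end of the shorter input
        runs.append((start, n - start))
    return runs


f1 = bytes.fromhex('FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF')
f2 = bytes.fromhex('41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37')
print(matching_runs(f1, f2))  # [(4, 4)]
```

For the sample data this gives the single match at position 4 with length 4 that the question asks for.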


2013-04-03 21:43:19

Very nice. I like what you did there. I was hoping there would be a beautiful answer like this. – omghai2u 2013-04-03 21:47:58

The itertools.groupby solution provided above works fine, but it is slow.

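I wrote a pretty naive attempt using numpy and tested it against the other solution on a particular 16MB file I happened to have. On my machine it was about 42 times faster. Someone familiar with numpy could probably improve on this significantly.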

import numpy as np 

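def compare(path1, path2):
    # Read both files as arrays of bytes and truncate to the shorter length
    x, y = np.fromfile(path1, np.int8), np.fromfile(path2, np.int8)
    length = min(x.size, y.size)
    x, y = x[:length], y[:length]

    # Offsets at which both files hold the same byte
    z = np.where(x == y)[0]
    if z.size == 0:
        return z

    # A new run starts wherever consecutive matching offsets are not adjacent
    borders = np.append(np.insert(np.where(np.diff(z) != 1)[0] + 1, 0, 0), len(z))
    lengths = borders[1:] - borders[:-1]
    starts = z[borders[:-1]]
    # One (start, length) row per run of matching bytes
    return np.array([starts, lengths]).T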
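
The same idea can also be tried on in-memory data without going through files. The sketch below is an assumption-laden variant, not the code that was benchmarked above: the name matching_runs_np is made up, and it finds run boundaries by looking for edges in the boolean equality mask. No claim is made about its speed.

```python
import numpy as np


def matching_runs_np(a, b):
    """Return an (n, 2) array of (start, length) rows for same-offset matches."""
    x = np.frombuffer(a, dtype=np.uint8)
    y = np.frombuffer(b, dtype=np.uint8)
    n = min(x.size, y.size)
    equal = x[:n] == y[:n]
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], equal, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]  # ends are exclusive
    return np.column_stack((starts, ends - starts))


f1 = bytes.fromhex('FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF')
f2 = bytes.fromhex('41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37')
print(matching_runs_np(f1, f2).tolist())  # [[4, 4]]
```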


2015-06-16 19:10:47 Kevin
