In Python, is there a concise way to compare whether the contents of two text files are the same?

49

The low-level way:

from __future__ import with_statement
with open(filename1) as f1:
    with open(filename2) as f2:
        if f1.read() == f2.read():
            ...

The high-level way:

import filecmp 
if filecmp.cmp(filename1, filename2, shallow=False): 
    ...
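
As a rough illustration of why shallow=False matters here, the sketch below writes two temporary files with the same size and the same modification time but different contents. The helper name compare_both_ways and the fixed timestamp are made up for this example; only filecmp.cmp, os.utime and tempfile come from the standard library.

```python
import filecmp
import os
import tempfile

def compare_both_ways(content1, content2):
    """Return (shallow_result, deep_result) for two temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        path1 = os.path.join(tmp, "a.txt")
        path2 = os.path.join(tmp, "b.txt")
        for path, content in ((path1, content1), (path2, content2)):
            with open(path, "wb") as f:
                f.write(content)
            # Force identical timestamps so the os.stat() signatures match.
            os.utime(path, (1_000_000_000, 1_000_000_000))
        shallow = filecmp.cmp(path1, path2)  # shallow=True is the default
        deep = filecmp.cmp(path1, path2, shallow=False)
        return shallow, deep

print(compare_both_ways(b"abc", b"xyz"))
```

With the default shallow comparison, matching os.stat() signatures (file type, size, modification time) are enough for the files to be reported as equal, so the first value should be True even though the contents differ. Only the shallow=False call actually reads the files.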

Source

2008-10-31 17:50:04

+9

I corrected your filecmp.cmp call, because without a non-true shallow argument it doesn't do what the question asks for. – tzot 2008-10-31 23:11:49

+2

You're right. http://www.python.org/doc/2.5.2/lib/module-filecmp.html. Thank you very much. – 2008-11-01 03:21:44

+1

By the way, the files should be opened in binary mode to be sure, since the files can differ in their line separators. – newtover 2013-04-29 10:30:52
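
The point about binary mode can be seen in a small sketch, assuming Python 3, where text mode translates "\r\n" to "\n" on read by default. The helper same_contents and the file names are invented for the illustration.

```python
import os
import tempfile

def same_contents(path1, path2, mode):
    """Compare two files after reading each one whole in the given mode."""
    with open(path1, mode) as f1, open(path2, mode) as f2:
        return f1.read() == f2.read()

with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, "unix.txt")
    dos_path = os.path.join(tmp, "dos.txt")
    # Same lines, different line separators.
    with open(unix_path, "wb") as f:
        f.write(b"one\ntwo\n")
    with open(dos_path, "wb") as f:
        f.write(b"one\r\ntwo\r\n")
    text_equal = same_contents(unix_path, dos_path, "r")
    binary_equal = same_contents(unix_path, dos_path, "rb")

print(text_equal, binary_equal)
```

In text mode the two files compare equal because the separators are normalized. In binary mode they do not, which is what a byte-for-byte "same contents" check usually wants.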

3

 

# Read both files fully and compare; "with" closes them afterwards.
with open(filename1, "r") as f, open(filename2, "r") as f2:
    print(f.read() == f2.read())

Source

2008-10-31 17:52:16 mmattax

+5

"Well, I have this 8 GiB file and that 32 GiB file that I want to compare..." – tzot 2008-10-31 23:19:37

22

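If you're going for even basic efficiency, you probably want to check the file sizes first:

import os

if os.path.getsize(filename1) == os.path.getsize(filename2):
    with open(filename1, 'r') as f1, open(filename2, 'r') as f2:
        if f1.read() == f2.read():
            pass  # Files are the same.

This saves you from reading every line of two files that aren't even the same size, and thus can't be the same.

(Going even further, you could call out to a fast md5sum of each file and compare those, but that's not "in Python", so I'll stop here.)

Source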

2008-10-31 17:56:15 Rich

1

For large files you could compute an MD5 or SHA hash of each file.
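
A minimal sketch of that idea, assuming SHA-256 and a 64 KiB chunk size (both arbitrary choices), with the hypothetical helper names file_sha256 and same_by_hash:

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it never sits in memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_by_hash(path1, path2):
    return file_sha256(path1) == file_sha256(path2)

with tempfile.TemporaryDirectory() as tmp:
    data = b"x" * 200000  # spans several chunks
    paths = [os.path.join(tmp, name) for name in ("a.bin", "b.bin", "c.bin")]
    for path, content in zip(paths, (data, data, data[:-1] + b"y")):
        with open(path, "wb") as f:
            f.write(content)
    same = same_by_hash(paths[0], paths[1])
    different = same_by_hash(paths[0], paths[2])
    digest_ok = file_sha256(paths[0]) == hashlib.sha256(data).hexdigest()

print(same, different)
```

Hashing still reads both files from start to end, so for a one-off comparison of two files a direct chunk-by-chunk comparison can stop earlier. Digests tend to pay off when they can be stored or when many files are compared against each other.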

Source

2008-10-31 17:56:33 ConcernedOfTunbridgeWells

1

I would use an MD5 hash of the files' contents.

import hashlib

def checksum(f):
    md5 = hashlib.md5()
    # Read as bytes: hashlib works on bytes, not on decoded text.
    with open(f, 'rb') as fh:
        md5.update(fh.read())
    return md5.hexdigest()

def is_contents_same(f1, f2):
    return checksum(f1) == checksum(f2)

if not is_contents_same('foo.txt', 'bar.txt'):
    print('The contents are not the same!')

Source

2008-10-31 18:53:52

5

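Since I can't comment on other people's answers, I'll write my own.

If you use md5, you definitely shouldn't just call md5.update(f.read()), since you'll use too much memory.

import hashlib

def get_file_md5(f, chunk_size=8192):
    # f is a file object opened in binary mode ('rb').
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()

Source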

2008-10-31 19:06:03 user32141

7

Here is a functional-style file comparison function. It returns False immediately if the files have different sizes; otherwise, it reads 4 KiB blocks and returns False immediately at the first difference:

from __future__ import with_statement
import os
import itertools, functools, operator

def filecmp(filename1, filename2):
    "Do the two files have exactly the same contents?"
    with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
        if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
            return False # different sizes ∴ not equal
        fp1_reader = functools.partial(fp1.read, 4096)
        fp2_reader = functools.partial(fp2.read, 4096)
        # b'' is the end-of-file sentinel for files opened in binary mode
        cmp_pairs = zip(iter(fp1_reader, b''), iter(fp2_reader, b''))
        inequalities = itertools.starmap(operator.ne, cmp_pairs)
        return not any(inequalities)

if __name__ == "__main__":
    import sys
    print(filecmp(sys.argv[1], sys.argv[2]))

Just a different take :)
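
For readers less used to the functional style, the same size check and 4 KiB chunked comparison can be sketched as an explicit loop. This is an illustrative rewrite, not the answerer's code; the name files_equal and the chunk_size parameter are invented here.

```python
import os
import tempfile

def files_equal(filename1, filename2, chunk_size=4096):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
        if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
            return False  # different sizes, so the contents cannot match
        while True:
            chunk1 = fp1.read(chunk_size)
            chunk2 = fp2.read(chunk_size)
            if chunk1 != chunk2:
                return False
            if not chunk1:  # both files reached end-of-file together
                return True

with tempfile.TemporaryDirectory() as tmp:
    data = b"line\n" * 5000  # larger than one chunk
    paths = [os.path.join(tmp, name) for name in ("a.txt", "b.txt", "c.txt")]
    for path, content in zip(paths, (data, data, data[:-1] + b"!")):
        with open(path, "wb") as f:
            f.write(content)
    equal_result = files_equal(paths[0], paths[1])
    unequal_result = files_equal(paths[0], paths[2])

print(equal_result, unequal_result)
```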

Source

2008-10-31 23:03:01 tzot

0

from __future__ import with_statement

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

with open(filename1) as f1:
    with open(filename2) as f2:
        file1list = f1.read().splitlines()
        file2list = f2.read().splitlines()
        list1length = len(file1list)
        list2length = len(file2list)
        if list1length == list2length:
            # Same number of lines: report each pair of lines.
            for index in range(len(file1list)):
                if file1list[index] == file2list[index]:
                    print(file1list[index] + "==" + file2list[index])
                else:
                    print(file1list[index] + "!=" + file2list[index] + " Not-Equal")
        else:
            print("difference in the size of the file and number of lines")

Source

2016-12-15 17:10:53
