2016-08-15 111 views

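Comparing the contents of two CSV files

I have two CSV files. Book1.csv has more data than similarities.csv, so I want to pull out the rows in Book1.csv that do not occur in similarities.csv. Here is what I have so far: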

import csv

with open('Book1.csv', 'rb') as csvMasterForDiff:
    with open('similarities.csv', 'rb') as csvSlaveForDiff:
        masterReaderDiff = csv.reader(csvMasterForDiff)
        slaveReaderDiff = csv.reader(csvSlaveForDiff)

        testNotInCount = 0
        testInCount = 0
        for row in masterReaderDiff:
            if row not in slaveReaderDiff:
                testNotInCount = testNotInCount + 1
            else:
                testInCount = testInCount + 1

print('Not in file: ' + str(testNotInCount))
print('Exists in file: ' + str(testInCount))

However, the output is:

Not in file: 2093 
Exists in file: 0 

I know this is not correct: at least the first 16 entries in Book1.csv do not exist in similarities.csv, but not all of them are missing. What am I doing wrong?

Answers


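A csv.reader object is an iterator, which means you can only iterate over it once. You should use a list or a set for the containment check, for example: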

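# csv.reader yields lists, which are unhashable, so store each row as a tuple.
slave_rows = set(tuple(row) for row in slaveReaderDiff)

for row in masterReaderDiff:
    if tuple(row) not in slave_rows:
        testNotInCount += 1
    else:
        testInCount += 1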
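
To make the one-pass behaviour concrete, here is a minimal sketch assuming Python 3. The in-memory data and the names `slave_file`, `first_check`, and `second_check` are made up for illustration and stand in for similarities.csv. It shows that a membership test on the reader consumes its rows, which is probably why every row after the first was reported as missing in the question.

```python
import csv
import io

# Hypothetical in-memory data standing in for similarities.csv.
slave_file = io.StringIO("a,1\nb,2\nc,3\n")
reader = csv.reader(slave_file)

# A membership test on an iterator scans forward until it finds a match
# or runs out of rows, consuming every row it passes.
first_check = ["c", "3"] in reader    # reads all three rows to find the match
second_check = ["a", "1"] in reader   # reader is exhausted, nothing left to scan

print(first_check, second_check)      # True False

# Materializing the rows once (as hashable tuples) allows repeated lookups.
slave_file.seek(0)
slave_rows = {tuple(row) for row in csv.reader(slave_file)}
print(("a", "1") in slave_rows, ("c", "3") in slave_rows)   # True True
```
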

Once you convert both readers to sets, you can do a lot of useful set-related work without writing much code.

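# Rows from csv.reader are lists (unhashable), so convert each one to a tuple.
slave_rows = set(tuple(row) for row in slaveReaderDiff)
master_rows = set(tuple(row) for row in masterReaderDiff)

master_minus_slave_rows = master_rows - slave_rows  # rows only in the master file
common_rows = master_rows & slave_rows              # rows present in both files

# Note: sets drop duplicates, so these counts are of unique rows.
print('Not in file: ' + str(len(master_minus_slave_rows)))
print('Exists in file: ' + str(len(common_rows)))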

Python's set type supports various other set operations that you can use as well.
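
Since the question asks to pull out the rows themselves and not only count them, a rough sketch along these lines may help. It assumes Python 3, and the helper name `rows_missing_from` and the sample data are hypothetical. Unlike a plain set difference, it keeps the master file's row order and any duplicate rows.

```python
import csv
import io

def rows_missing_from(master_file, slave_file):
    """Return the rows of master_file that do not appear in slave_file, in order."""
    # Read the smaller file once into a set of tuples for fast lookups.
    slave_rows = {tuple(row) for row in csv.reader(slave_file)}
    return [row for row in csv.reader(master_file) if tuple(row) not in slave_rows]

# Hypothetical sample data. With real files, pass file objects opened with
# open('Book1.csv', newline='') and open('similarities.csv', newline='').
master = io.StringIO("x,1\ny,2\nx,1\nz,3\n")
slave = io.StringIO("y,2\n")

missing = rows_missing_from(master, slave)
print(missing)   # [['x', '1'], ['x', '1'], ['z', '3']]
```

Only the slave file is held in memory here, which suits the case in the question where similarities.csv is the smaller of the two files.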