2017-09-25

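Python: compare 3 columns in 2 CSV files and output if equal

I have two CSV files that I am trying to compare in order to find the matching items. The first file, hosts.csv, looks like this: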

Path Filename Size Signature 
C:\  a.txt  14kb
D:\  b.txt  99kb 678910 
C:\  c.txt  44kb 111213 

The second file, masterlist.csv, looks like this:

Filename Signature 
b.txt  678910 
x.txt  111213 
b.txt  777777 
c.txt  999999 

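As you can see, the rows do not line up, and masterlist.csv is always larger than hosts.csv. The only part I want to search on is the Signature column. I know the comparison would look something like this:

hosts[3] == masterlist[1]

I am looking for a solution that gives me something like the following (basically the hosts.csv file with a new RESULTS column):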

Path Filename Size Signature RESULTS 
C:\  a.txt  14kb     NOT FOUND in masterlist 
D:\  b.txt  99kb 678910  FOUND in masterlist (row 1) 
C:\  c.txt  44kb 111213  FOUND in masterlist (row 2) 

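I searched the existing posts and found something similar here, but I do not quite understand it because I am still learning Python.

Edit: I am using Python 3.5.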

Answers

0

You can try this:

import csv

# Assumes comma-separated files whose first row is a header.
with open('masterlist.csv', newline='') as f:
    masterlist = list(csv.reader(f))[1:]
with open('hosts.csv', newline='') as f:
    hosts = list(csv.reader(f))[1:]

# Map each signature to the first masterlist data row (1-based) that contains it.
signature_rows = {}
for row_number, (filename, signature) in enumerate(masterlist, start=1):
    signature_rows.setdefault(signature, row_number)

final_result = [["Path", "Filename", "Size", "Signature", "RESULTS"]]
for path, filename, size, signature in hosts:
    if signature in signature_rows:
        result = "FOUND in masterlist (row {})".format(signature_rows[signature])
    else:
        result = "NOT FOUND in masterlist"
    final_result.append([path, filename, size, signature, result])

with open('new_host.csv', 'w', newline='') as f:
    csv.writer(f).writerows(final_result)
0

A solution using csv.DictReader and csv.DictWriter objects:

import csv 

with open('hosts.csv', 'r') as hosts, open('masterlist.csv', 'r') as mlist, \ 
    open('result.csv', 'w', newline='') as res: 

    host_reader = csv.DictReader(hosts, delimiter=' ', skipinitialspace=True) 
    mlist_reader = csv.DictReader(mlist, delimiter=' ', skipinitialspace=True) 
    writer = csv.DictWriter(res, fieldnames=host_reader.fieldnames + ['Result'], delimiter='\t') 

    mlist_data = {r['Signature']: mlist_reader.line_num-1 for r in mlist_reader} 
    fmt = '{0}FOUND in masterlist{1}' # preparing the output format for the `Result` field 
    writer.writeheader()    # writing header 

    for r in host_reader:
        if r['Signature'] in mlist_data:
            r['Result'] = fmt.format("", " (row " + str(mlist_data[r['Signature']]) + ")")
        else:
            r['Result'] = fmt.format("NOT ", "")
        writer.writerow(r)

Contents of result.csv:

Path Filename Size Signature Result 
C:\ a.txt 14kb  NOT FOUND in masterlist 
D:\ b.txt 99kb 678910 FOUND in masterlist (row 1) 
C:\ c.txt 44kb 111213 FOUND in masterlist (row 2) 
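
Note that the dict comprehension above keeps only one row number per signature: if a signature appears more than once in masterlist.csv, the later row overwrites the earlier one. A minimal sketch that records every matching row instead is shown here. The function name `annotate_hosts` and the exact wording of the `Result` text are illustrative choices, and it assumes the same space-delimited layout and `Signature` header as the answer above.

```python
import csv
from collections import defaultdict


def annotate_hosts(hosts_file, masterlist_file):
    """Return host rows as dicts, each with a 'Result' entry.

    Both arguments are open text files (or any iterable of lines) in the
    space-delimited layout assumed above, with a 'Signature' header.
    """
    # signature -> list of 1-based masterlist data rows containing it
    rows_by_signature = defaultdict(list)
    master_reader = csv.DictReader(masterlist_file, delimiter=' ', skipinitialspace=True)
    for row_number, row in enumerate(master_reader, start=1):
        if row['Signature']:
            rows_by_signature[row['Signature']].append(row_number)

    annotated = []
    host_reader = csv.DictReader(hosts_file, delimiter=' ', skipinitialspace=True)
    for row in host_reader:
        # .get() avoids creating empty entries for unknown or missing signatures
        matches = rows_by_signature.get(row['Signature'])
        if matches:
            rows = ", ".join(str(n) for n in matches)
            row['Result'] = "FOUND in masterlist (row {})".format(rows)
        else:
            row['Result'] = "NOT FOUND in masterlist"
        annotated.append(row)
    return annotated
```

The returned dicts can be passed to `csv.DictWriter.writerows` in the same way as in the answer above; a host row with no signature is simply reported as not found.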
0

I always prefer a pandas DataFrame for this kind of task, because it provides a range of functions for saving and editing .csv files.

import pandas as pd

# DataFrame.from_csv has been removed from pandas; read_csv with index_col=0
# reproduces its default of using the first column as the index.
df = pd.read_csv('1.csv', index_col=0)
df2 = pd.read_csv('2.csv', index_col=0)
df['result'] = '0'  # placeholder for rows with no match
for i in range(len(df)):
    # Row labels in 2.csv whose signature equals the signature in row i of 1.csv
    matches = df2.index[df2['signature'] == df['signature'][i]].tolist()
    if matches:
        df.loc[i, 'result'] = "found in '2.csv' at row " + str(matches)
df.to_csv('out.csv')

Here 1.csv = hosts.csv and 2.csv = masterlist.csv, and the whole output is saved as out.csv. The output looks like this:

path filename signature       result 
0 C:\ a.txt  12345        0 
1 D:\ b.txt  678910  found in '2.csv' at row [0] 
2 C:\ c.txt  111213 found in '2.csv' at row [1, 4] 

My .csv files look like this.

First, 1.csv:

path filename signature 
0 C:\ a.txt  12345 
1 D:\ b.txt  678910 
2 C:\ c.txt  111213 

Second, 2.csv:

filename signature 
0 b.txt  678910 
1 x.txt  111213 
2 b.txt  777777 
3 c.txt  999999 
4 b.txt  111213 

This way I can see whether a signature occurs more than once in 2.csv, and record where each occurrence is found.
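
For larger files, the same "all occurrences" result can be built with a single pass over each file rather than by comparing every row against every other row. The following is only a sketch under assumptions: the helper name `annotate_matches`, its `column` parameter, and the "not found" text are illustrative and not part of the answer above; it assumes both frames have a `signature` column as in the samples.

```python
import pandas as pd


def annotate_matches(hosts, master, column="signature"):
    """Return a copy of hosts with a 'result' column listing matching row labels in master."""
    # Build signature -> list of row labels in one pass over master.
    rows_by_signature = {}
    for label, signature in zip(master.index.tolist(), master[column].tolist()):
        rows_by_signature.setdefault(signature, []).append(label)

    out = hosts.copy()  # leave the caller's frame untouched
    out["result"] = [
        "found at row {}".format(rows_by_signature[s]) if s in rows_by_signature else "not found"
        for s in out[column].tolist()
    ]
    return out
```

Called as `annotate_matches(df, df2)` on the two frames loaded above, this yields one `result` entry per host row, and the result can be written out with `to_csv` as before.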
