Removing duplicate rows from a CSV

2016-08-04 45 views
1

I have a CSV file that looks like this:

red,75,right 
red,344,right 
green,3,center 
yellow,3222,right 
blue,9,center 
black,123,left 
white,68,right 
green,47,left 
purple,48,left 
purple,988,right 
pink,2677,left 
white,34,right 

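I am using Python and trying to remove rows that have a duplicate value in cell 1 (the first column). I know I could do this with something like pandas, but I am trying to use the standard Python csv library.

The expected result is...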

red,75,right 
green,3,center 
yellow,3222,right 
blue,9,center 
black,123,left 
white,68,right 
purple,988,right 
pink,2677,left 

Does anyone have an example?

+3

I removed the pandas tag, since you don't want a pandas solution. – ayhan

+0

Expected output added to the original post. – fightstarr20

Answers

1

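You can simply use a dictionary in which the color is the key and the row is the value. If the color is already in the dictionary, skip that row; otherwise add it to the dictionary and write the row to the new CSV file.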

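import csv

file_in = 'input_file.csv'
file_out = 'output_file.csv'
# Python 2: the csv module expects files opened in binary mode.
with open(file_in, 'rb') as fin, open(file_out, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    d = {}  # maps each color to the first row seen with that color
    for row in reader:
        color = row[0]
        if color not in d:
            # First time this color appears: remember it and keep the row.
            d[color] = row
            writer.writerow(row)
result = d.values()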

result 
# Output: 
# [['blue', '9', 'center'], 
# ['pink', '2677', 'left'], 
# ['purple', '48', 'left'], 
# ['yellow', '3222', 'right'], 
# ['black', '123', 'left'], 
# ['green', '3', 'center'], 
# ['white', '68', 'right'], 
# ['red', '75', 'right']] 

And the contents of the output CSV file:

!cat output_file.csv 
# Output: 
# red,75,right 
# green,3,center 
# yellow,3222,right 
# blue,9,center 
# black,123,left 
# white,68,right 
# purple,48,left 
# pink,2677,left 
+0

My original question wasn't very clear; I have updated it with the expected output. – fightstarr20

+0

This works great! How do I output the result as CSV? – fightstarr20

+0

I get "iterator should return strings, not bytes". – fightstarr20
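
That error is what the csv module raises on Python 3 when it is given a file opened in binary mode. Below is a minimal sketch of the same first-occurrence approach for Python 3, assuming the files are opened in text mode with newline=''. The function name dedupe_csv and its key_index parameter are hypothetical and used only for illustration.

```python
import csv


def dedupe_csv(file_in, file_out, key_index=0):
    """Copy file_in to file_out, keeping only the first row seen for each key.

    dedupe_csv and key_index are illustrative names, not part of the csv module.
    """
    seen = set()
    # Python 3: open in text mode; newline='' lets the csv module
    # handle line endings itself.
    with open(file_in, newline='') as fin, \
            open(file_out, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            if not row:
                continue  # skip completely blank lines
            key = row[key_index]
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```

Calling dedupe_csv('input_file.csv', 'output_file.csv') on the sample data should keep the first row for each color, so purple,48,left is kept rather than purple,988,right.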

0

You can try this:

import fileinput

def main():
    seen = set()  # set for fast O(1) amortized lookup

    # inplace=1 rewrites '1.csv' in place (Python 2 syntax below).
    for line in fileinput.FileInput('1.csv', inplace=1):
        cell_1 = line.split(',')[0]
        if cell_1 not in seen:
            seen.add(cell_1)
            # Standard output is now redirected to the file; the trailing
            # comma stops print from adding a second newline.
            print line,

if __name__ == '__main__':
    main()
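
The print statement above is Python 2 syntax. Below is a sketch of the same in-place approach for Python 3; the function name dedupe_in_place is hypothetical. It splits on the first comma only, so it assumes the first cell never contains a quoted comma. If it might, parsing with the csv module would be safer.

```python
import fileinput


def dedupe_in_place(path):
    """Rewrite path in place, keeping the first line seen for each first cell.

    dedupe_in_place is an illustrative name, not part of fileinput.
    """
    seen = set()
    # inplace=True redirects print() output into the file being read.
    with fileinput.input(files=(path,), inplace=True) as lines:
        for line in lines:
            first_cell = line.split(',', 1)[0]
            if first_cell not in seen:
                seen.add(first_cell)
                print(line, end='')  # line already ends with a newline
```

Because the file is overwritten, it may be worth testing on a copy first. fileinput.input also accepts a backup argument (for example backup='.bak') that keeps the original alongside.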