Filter out duplicate table entries

I want to read in T1 and write it out as T2 (note that both are .csv files). T1 contains duplicate rows; I do not want to write duplicates to T2.

+------+------+---------+---------+---------+ 
| Type | Year | Value 1 | Value 2 | Value 3 | 
+------+------+---------+---------+---------+ 
| a | 8 | x  | y  | z  | 
| b | 10 | q  | r  | s  | 
+------+------+---------+---------+---------+

+------+------+---------+-------+ 
| Type | Year | Value # | Value | 
+------+------+---------+-------+ 
| a | 8 | 1  | x  | 
| a | 8 | 2  | y  | 
| a | 8 | 3  | z  | 
| b | 10 | 1  | q  | 
| ... | ... | ...  | ... | 
+------+------+---------+-------+

Currently, I have this painfully slow code to filter out the duplicates:

no_dupes = []

for row in reader:
    type = row[0]
    year = row[1]
    index = type, year  # (type, year) identifies a row of T1
    values_list = row[2:]

    if index not in no_dupes:  # linear scan of the list for every row
        for i, j in enumerate(values_list):
            line = [type, year, str(i+1), str(j)]
            writer.writerow(line)  # using csv module
            no_dupes.append(index)

I cannot exaggerate how slow this code becomes as T1 grows.

Is there a faster way to filter out the duplicates from T1 as I write to T2?

Source

2013-04-09 ABM

At a minimum, you are appending 'index' to the 'no_dupes' list on every pass through the inner loop. So: (1) change 'no_dupes' to a 'set' and (2) add 'index' to 'no_dupes' only once per row. – hughdbrown 2013-04-09 19:45:12
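The two changes suggested in this comment could look roughly like the sketch below. It is only an illustration under some assumptions: the function name `unpivot_unique` and the in-memory sample data are made up here, T1 is assumed to have no header row, and a row counts as a duplicate when its (type, year) pair has already been seen, as in the question's code.

```python
import csv
import io


def unpivot_unique(reader, writer):
    """Write one (type, year, n, value) row per value, skipping repeated (type, year) keys."""
    seen = set()  # membership tests on a set are O(1) on average, unlike a list
    for row in reader:
        if not row:  # skip blank lines
            continue
        key = (row[0], row[1])
        if key in seen:
            continue
        seen.add(key)  # added once per input row, not once per value
        for n, value in enumerate(row[2:], start=1):
            writer.writerow([row[0], row[1], n, value])


# Hypothetical in-memory data standing in for the T1 and T2 files
t1 = io.StringIO("a,8,x,y,z\nb,10,q,r,s\na,8,x,y,z\n")
t2 = io.StringIO()
unpivot_unique(csv.reader(t1), csv.writer(t2))
print(t2.getvalue())
```

Because rows are written as they are read, the output keeps the input order and only the set of keys is held in memory.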

I think you want something like this:

no_dupes = set()

for row in reader:
    type, year = row[0], row[1]
    values_list = row[2:]

    for index, value in enumerate(values_list, start=1):
        line = (type, year, index, value)
        no_dupes.add(line)

for t in no_dupes:
    writer.writerow(t)

Source

2013-04-09 19:46:01 hughdbrown

Thanks! This is substantially faster. – ABM 2013-04-09 20:14:49

If possible, convert the reader to a set and iterate over the set instead; then there is no possibility of duplicates.

Source

2013-04-09 19:26:56 ennuikiller

I can't do that with csv.reader(); each row in the reader is a list. – ABM 2013-04-09 19:40:28
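The obstacle mentioned here is that lists are unhashable, so they cannot be stored in a set directly. One possible workaround, shown as a sketch with made-up sample data, is to convert each row to a tuple first. Note that this removes rows that are identical in every column, which is not quite the same as the (type, year) check in the question, and that it keeps all unique rows in memory.

```python
import csv
import io

# Hypothetical in-memory data standing in for the T1 file
t1 = io.StringIO("a,8,x,y,z\nb,10,q,r,s\na,8,x,y,z\n")

# Tuples are hashable, so they can serve as dict keys or set members.
# dict.fromkeys keeps the first occurrence of each row in input order
# (dicts preserve insertion order in Python 3.7+).
unique_rows = list(dict.fromkeys(tuple(row) for row in csv.reader(t1)))

print(unique_rows)
```

A plain `set(tuple(row) for row in reader)` would also remove the duplicates, but the order of the rows would no longer be predictable.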
