Ignore duplicate rows in a CSV

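I am trying to read a CSV file and write its rows to another CSV file. My input file has duplicate rows, and in the output I only want a single row. In my sample script you can see that I created a list called `readers`, which holds all the rows of the input CSV. Then, in the for loop, I use `writer.writerow(readers[1] + ....)`, which basically takes the first row after the header. The problem is that this first row gets written repeatedly. How can I adjust my script so that it is written only once?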

import csv
import glob

# Assumed: the writer setup was not shown in the question; out1.csv is the output file.
with open("out1.csv", "w", newline="") as out_fh:
    writer = csv.writer(out_fh)
    for path in glob.glob("out.csv"):
        if path == "out1.csv":
            continue
        with open(path, newline="") as fh:
            readers = list(csv.reader(fh))  # all rows of the input CSV

            for row in readers:
                if row[8] == 'READ' and row[10] == '1110':
                    writer.writerow(readers[1] + [row[2]])
                elif row[8] == 'READ' and row[10] == '1011':
                    writer.writerow(readers[1] + [" ", " ", " "] + [row[2]])
                elif row[8] == 'READ' and row[10] not in ('1101', '0111'):
                    writer.writerow(readers[1] + [" "] + [row[2]])
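A minimal sketch of one way to make a loop like the one above write each output row only once, assuming the goal is to suppress identical output rows: remember every row already written in a set and skip repeats. Rows are lists, which are unhashable, so each one is stored as a tuple. The helper name `write_unique` and the in-memory `io.StringIO` demo are hypothetical, for illustration only.

```python
import csv
import io


def write_unique(rows, writer):
    """Write each row once; later identical rows are skipped.

    csv rows are lists, which are unhashable, so each row is
    converted to a tuple before it is stored in the `written` set.
    Returns the number of distinct rows written.
    """
    written = set()
    for row in rows:
        key = tuple(row)
        if key in written:
            continue  # identical row already written, skip it
        written.add(key)
        writer.writerow(row)
    return len(written)


# Demo: an in-memory buffer stands in for the output file.
out = io.StringIO()
count = write_unique(
    [["28", "Jason", "56789", "Fail"]] * 4,  # the repeated sample row
    csv.writer(out, lineterminator="\n"),
)
print(count)           # 1
print(out.getvalue())  # 28,Jason,56789,Fail
```

In the question's script, the same check would wrap each `writer.writerow(...)` call, with the key built from the row actually being written (for example `tuple(readers[1] + [row[2]])`).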

Sample input

ID No. Name Value RESULTS 
     28 Jason 56789 Fail 
     28 Jason 56789 Fail 
     28 Jason 56789 Fail 
     28 Jason 56789 Fail


2017-08-14 Muscles

Are the rows sorted (i.e., can we expect duplicates to appear next to each other), or does the script need to handle that too? – Dan

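Sorry, could you elaborate on what you mean by "sorted" here? I want to change my script so that identical rows are written only once. Currently it writes the same row repeatedly. – Muscles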
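To illustrate what Dan is asking: if identical rows are guaranteed to sit next to each other (for example because the file is sorted), comparing each row with the previous one is enough, and `itertools.groupby` does exactly that. Duplicates that are not adjacent survive. A minimal sketch; the function name is hypothetical and the `Maria` row is made up to show the non-adjacent case.

```python
import itertools


def drop_adjacent_duplicates(rows):
    """Collapse runs of consecutive identical rows into a single row.

    groupby() only groups *consecutive* equal items, so this is full
    de-duplication only when duplicates are known to be adjacent.
    """
    return [key for key, _group in itertools.groupby(rows)]


rows = [
    ["28", "Jason", "56789", "Fail"],
    ["28", "Jason", "56789", "Fail"],
    ["31", "Maria", "11111", "Pass"],  # made-up row
    ["28", "Jason", "56789", "Fail"],  # not adjacent to the first two
]
result = drop_adjacent_duplicates(rows)
print(result)
# [['28', 'Jason', '56789', 'Fail'], ['31', 'Maria', '11111', 'Pass'], ['28', 'Jason', '56789', 'Fail']]
```

This needs no extra memory for already-seen rows, which is why the sorted/unsorted distinction matters; for unsorted input a set of seen rows is needed instead.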

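You can use the set type to remove duplicates. Rows from `csv.reader` are lists, which are unhashable, so convert each row to a tuple first. Note that a set does not preserve the original row order.

readers_unique = list(set(map(tuple, readers)))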
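If the original row order matters, a sketch that keeps the first occurrence of each row: `dict.fromkeys` keeps unique keys in insertion order (guaranteed since Python 3.7). The helper name `unique_rows` is hypothetical, and the `Maria` row is made up to show that order is kept.

```python
def unique_rows(rows):
    """Remove duplicate rows, keeping first occurrences in their original order.

    Rows are converted to tuples because lists cannot be dict keys.
    """
    return [list(t) for t in dict.fromkeys(tuple(row) for row in rows)]


readers = [
    ["28", "Jason", "56789", "Fail"],
    ["28", "Jason", "56789", "Fail"],
    ["31", "Maria", "11111", "Pass"],  # made-up row
    ["28", "Jason", "56789", "Fail"],
]
readers_unique = unique_rows(readers)
print(readers_unique)
# [['28', 'Jason', '56789', 'Fail'], ['31', 'Maria', '11111', 'Pass']]
```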


2017-08-14 15:04:13

You can use the pandas package. It would look like this:

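import pandas as pd
# Read the file (the first row is treated as the header by default) and save it in a variable:
table = pd.read_csv("out.csv")
# Drop the duplicate rows (the first occurrence is kept by default):
clean_table = table.drop_duplicates()
# Save the clean data without writing the DataFrame index as an extra column:
clean_table.to_csv("data_without_duplicates.csv", index=False)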

You can check the documentation here and here.


2017-08-14 15:25:07 RZRKAL

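While the answer above using pandas is basically correct, it seems like overkill to me. Just use a list that holds the ID column values you have already seen during processing (assuming the ID column lives up to its name; otherwise you have to use a composite key). Then check whether you have already seen the current value and, presto: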

import csv
import glob

ID_COL = 1
id_seen = []
with open("out1.csv", "w", newline="") as out_fh:
    writer = csv.writer(out_fh)
    for path in glob.glob("out.csv"):
        if path == "out1.csv":
            continue
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                if row[ID_COL] not in id_seen:
                    id_seen.append(row[ID_COL])
                    # write out whatever columns you need
                    writer.writerow(row)
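For the composite-key case mentioned above, a minimal sketch: build a tuple from several columns and keep it in a set, which also makes the membership test constant-time instead of a scan through a list. `KEY_COLS` and `dedupe_by_key` are hypothetical names; the demo feeds in the question's sample rows written as comma-separated values, treating "ID No." as a single header.

```python
import csv
import io

KEY_COLS = (1, 2)  # hypothetical: the columns that together identify a row


def dedupe_by_key(in_fh, out_fh, key_cols=KEY_COLS):
    """Copy CSV rows from in_fh to out_fh, skipping rows whose key was already seen."""
    seen = set()
    writer = csv.writer(out_fh, lineterminator="\n")
    for row in csv.reader(in_fh):
        key = tuple(row[i] for i in key_cols)  # composite key
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


src = io.StringIO(
    "ID No.,Name,Value,RESULTS\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
)
dst = io.StringIO()
dedupe_by_key(src, dst)
print(dst.getvalue())
```

With real files instead of `io.StringIO`, open both with `newline=""`, as the `csv` module documentation recommends.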


2017-08-19 15:52:59 Arminius
