我想合併兩個CSV文件並保留重複記錄。每個文件可能有匹配的記錄,一個文件可能有重複的記錄(具有不同測試分數的同一個學生ID),並且一個文件可能在另一個文件中沒有匹配的學生記錄。下面的代碼按預期方式工作,但如果存在重複記錄,則只有第二條記錄正在寫入合併文件。我查看了很多線程和所有地址刪除重複,但我需要保留重複的記錄。合併兩個CSV文件保留重複
import csv
from collections import OrderedDict
import os
cd = os.path.dirname(os.path.abspath(__file__))
fafile = os.path.join(cd, 'MJB_FAScores.csv')
testscores = os.path.join(cd, 'MJB_TestScores.csv')
filenames = fafile, testscores
data = OrderedDict()
fieldnames = []
for filename in filenames:
with open(filename, 'r') as fp:
reader = csv.DictReader(fp)
fieldnames.extend(reader.fieldnames)
for row in reader:
data.setdefault(row['Student_Number'], {}).update(row)
fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open('merged.csv', 'w', newline='') as fp:
writer = csv.writer(fp)
writer.writerow(fieldnames)
for row in data.values():
writer.writerow([row.get(fields, '') for fields in fieldnames])
fafile:
Student_Number Name Grade Teacher FA1 FA2 FA3
65731 Ball, Peggy 4 Bauman, Edyn 56 45 98
65731 Ball, Peggy 4 Bauman, Edyn 32 323 232
85250 Ball, Jonathan 3 Clarke, Mary 65 77 45
981235 Ball, David 5 Longo, Noel 56 89 23
91851 Ball, Jeff 0 Delaney, Mary 83 45 42
543 MAX 2 Phil 77 77 77
543 MAX 2 Annie 88 888 88
9844 Lisa 1 Smith, Jennifer 43 44 55
testscores:
Student_Number Name Grade Teacher MAP Reading MAP Math FP Level DSA LN DSA WW DSA SJ DSA DC
65731 Ball, Peggy 4 Bauman, Edyn 175 221 A 54 23 78 99
72941 Ball, Amanda 4 Bauman, Edyn 201 235 J 65 34 65
85250 Ball, Jonathan 3 Clarke, Mary 189 201 L 34 54
981235 Ball, David 5 Longo, Noel 225 231 D 23 55
91851 Ball, Jeff 0 Delaney, Mary 198 175 C
65731 Ball, Peggy 4 Bauman, Edyn 200 76 Y 54 23 78 99
543 MAX 2 Phil 111 111 Z 33 44 55 66
543 MAX 2 Annie 222 222 A 44 55 66 77
電流輸出:
Student_Number Name Grade Teacher FA1 FA2 FA3 MAP Reading MAP Math FP Level DSA LN DSA WW DSA SJ DSA DC
65731 Ball, Peggy 4 Bauman, Edyn 32 323 232 200 76 Y 54 23 78 99
85250 Ball, Jonathan 3 Clarke, Mary 65 77 45 189 201 L 34 54
981235 Ball, David 5 Longo, Noel 56 89 23 225 231 D 23 55
91851 Ball, Jeff 0 Delaney, Mary 83 45 42 198 175 C
543 MAX 2 Annie 88 888 88 222 222 A 44 55 66 77
72941 Ball, Amanda 4 Bauman, Edyn 201 235 J 65 34 65
9844 Lisa 1 Smith, Jennifer 43 44 55
所需的輸出:
Student_Number Name Grade Teacher FA1 FA2 FA3 MAP Reading MAP Math FP Level DSA LN DSA WW DSA SJ DSA DC
65731 Ball, Peggy 4 Bauman, Edyn 32 323 232 200 76 Y 54 23 78 99
65731 Ball, Peggy 4 Bauman, Edyn 56 45 98 175 221 A 54 23 78 99
85250 Ball, Jonathan 3 Clarke, Mary 65 77 45 189 201 L 34 54
981235 Ball, David 5 Longo, Noel 56 89 23 225 231 D 23 55
91851 Ball, Jeff 0 Delaney, Mary 83 45 42 198 175 C
543 MAX 2 Annie 88 888 88 222 222 A 44 55 66 77
543 MAX 2 Phil 77 77 77 111 111 Z 33 44 55 66
72941 Ball, Amanda 4 Bauman, Edyn 201 235 J 65 34 65
9844 Lisa 1 Smith, Jennifer 43 44 55
您的代碼產生'KeyError異常: 'Student_Number''就行'data.setdefault(行[' Student_Number'],{})更新(行)'。 – martineau
請說明您希望如何進行合併。例如,每個文件中有兩個記錄,學生ID爲65731,總共有四條記錄,但在所需的輸出中只有兩條記錄。爲什麼不是所有的四個都保存着 – martineau
基本上,來自testscores文件的數據應該水平地附加到Student_Number上的fafile匹配。每個文件都包含不同的數據集,並嘗試將每個學生的所有考試分數放在一起。有些學生會參加兩次考試,或者根本沒有考試,所以每個檔案中可能會有0個,1個或更多的記錄。最後的文件應該有這些列:Student_Number,Name,Grade,Teacher,FA1,FA2,FA3,MAP Reading,Map Math,FP Level,DSA LN,DSA WW,DSA SJ,DSA DC。如果學生在fafile中沒有記錄,但在測試中有2個記錄,他們應該有2行數據,其中「FAx」字段爲空 – PBall