2016-07-26 170 views
0

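Merge two CSV files on a common column in Python

I want to merge two CSV files on a common id column and write the merged result to a new file. I tried the following, but it gives me an error: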

import csv
from collections import OrderedDict

filenames = "stops.csv", "stops2.csv"
data = OrderedDict()
fieldnames = []
for filename in filenames:
    with open(filename, "rb") as fp:  # python 2
        reader = csv.DictReader(fp)
        fieldnames.extend(reader.fieldnames)
        for row in reader:
            data.setdefault(row["stop_id"], {}).update(row)

fieldnames = list(OrderedDict.fromkeys(fieldnames))
with open("merged.csv", "wb") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for row in data.itervalues():
        writer.writerow([row.get(field, '') for field in fieldnames])

Both files have a "stop_id" column, but I get this error back: KeyError: 'stop_id'

Any help is much appreciated.

Thanks
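
A KeyError on row["stop_id"] means csv.DictReader did not produce a key spelled exactly stop_id for at least one of the files. The files themselves are not shown, so the actual cause is unknown; common culprits are a UTF-8 byte order mark in front of the first header, stray whitespace around a header name, or a delimiter other than a comma. Below is a minimal Python 3 sketch for checking and normalizing the header, using made-up in-memory data and a hypothetical helper read_rows (not part of the csv module):

```python
import csv
import io

def read_rows(fp):
    """Return (fieldnames, rows) with BOM and surrounding whitespace removed from headers.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(fp)
    # Accessing reader.fieldnames reads the header line from the file.
    cleaned = [name.lstrip("\ufeff").strip() for name in reader.fieldnames]
    reader.fieldnames = cleaned  # later rows are keyed by the cleaned names
    return cleaned, list(reader)

# Made-up sample: the header carries a BOM and a trailing space.
sample = io.StringIO("\ufeffstop_id ,stop_name\n1,Main St\n")
print(repr(csv.DictReader(sample).fieldnames))  # shows the raw, unclean header names

sample.seek(0)
fieldnames, rows = read_rows(sample)
print(fieldnames)
```

Printing repr(reader.fieldnames) for each real file is usually enough to see what the key actually looks like. If a BOM turns out to be the problem, opening the file with open(filename, newline="", encoding="utf-8-sig") in Python 3 removes it at read time.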

'data.setdefault(row["stop_id"], {}).update(row)' - why so complicated? – Alleo

Also, merging two tables on a column is done with 'pandas.merge', see http://pandas.pydata.org/pandas-docs/stable/merging.html#brief-primer-on-merge-methods-relational-algebra – Alleo

I used another Stack Overflow example as input. Can you suggest an alternative? Thanks – sgpbyrne
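
The pandas.merge suggestion in the comments could look roughly like the sketch below (Python 3). The sample data and the column names other than stop_id are made up, and how="outer" is an assumption about the desired result: it keeps every stop_id that appears in either file, similar to what the original loop builds.

```python
import io
import pandas as pd

# Made-up sample data standing in for stops.csv and stops2.csv.
stops = pd.read_csv(io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n"))
stops2 = pd.read_csv(io.StringIO("stop_id,zone\n2,B\n3,C\n"))

# how="outer" keeps rows whose stop_id is present in only one of the inputs.
merged = pd.merge(stops, stops2, on="stop_id", how="outer")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

With real files, pd.read_csv("stops.csv") and merged.to_csv("merged.csv", index=False) would replace the in-memory pieces.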

Answers

0

Thanks for the examples.

This is what worked for me; it merges the two files on the first column of each CSV.

import csv
from collections import OrderedDict

# Python 2: map the first column of each file to the remaining columns
with open('stops.csv', 'rb') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}

with open('stops2.csv', 'rb') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)

# Concatenate the values that share a key, stops2.csv first
result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)

with open('ab_combined.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
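
One thing to watch with the approach above: when a key appears in only one of the files, its output row is shorter, so values from one file can land under the other file's columns. The following is a minimal Python 3 sketch that pads the missing cells with empty strings. The helper merge_on_first_column and the sample data are made up for illustration.

```python
import csv
import io

def merge_on_first_column(fp_a, fp_b):
    """Merge two CSV streams on their first column, padding missing cells with ''."""
    rows_a = {row[0]: row[1:] for row in csv.reader(fp_a) if row}
    rows_b = {row[0]: row[1:] for row in csv.reader(fp_b) if row}
    width_a = max((len(v) for v in rows_a.values()), default=0)
    width_b = max((len(v) for v in rows_b.values()), default=0)
    merged = []
    # Keys of the first file come first, then keys found only in the second file.
    for key in list(rows_a) + [k for k in rows_b if k not in rows_a]:
        left = rows_a.get(key, [])
        right = rows_b.get(key, [])
        left = left + [""] * (width_a - len(left))
        right = right + [""] * (width_b - len(right))
        merged.append([key] + left + right)
    return merged

# Made-up sample data; the header row is merged like any other row.
a = io.StringIO("stop_id,stop_name\n1,Main St\n2,Oak Ave\n")
b = io.StringIO("stop_id,zone\n2,B\n3,C\n")
rows = merge_on_first_column(a, b)

out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

With real files, the two streams would come from open(path, newline="") and the writer would target an output file opened the same way in "w" mode.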
1

Here is an approach using pandas:

from StringIO import StringIO  # Python 2
import pandas as pd

TESTDATA = StringIO("""DOB;First;Last
2016-07-26;John;smith
2016-07-27;Mathew;George
2016-07-28;Aryan;Singh
2016-07-29;Ella;Gayau
""")

list1 = pd.read_csv(TESTDATA, sep=";")

TESTDATA = StringIO("""Date of Birth;Patient First Name;Patient Last Name
2016-07-26;John;smith
2016-07-27;Mathew;XXX
2016-07-28;Aryan;Singh
2016-07-20;Ella;Gayau
""")

list2 = pd.read_csv(TESTDATA, sep=";")

print list2
print list1

# Left merge on the three key columns, then drop the rows without a match
common = pd.merge(list1, list2, how='left',
                  left_on=['Last', 'First', 'DOB'],
                  right_on=['Patient Last Name', 'Patient First Name', 'Date of Birth']).dropna()
print common
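
In the snippet above, the left merge followed by dropna() keeps only the rows that found a match in both tables. Assuming the inputs contain no missing values of their own, how='inner' should select the same rows directly. Here is a Python 3 sketch of that comparison, using the same sample data (io.StringIO replaces the Python 2 StringIO module; the variable names keys1, keys2, left_then_drop and inner are made up):

```python
import io
import pandas as pd

list1 = pd.read_csv(io.StringIO(
    "DOB;First;Last\n"
    "2016-07-26;John;smith\n"
    "2016-07-27;Mathew;George\n"
    "2016-07-28;Aryan;Singh\n"
    "2016-07-29;Ella;Gayau\n"
), sep=";")

list2 = pd.read_csv(io.StringIO(
    "Date of Birth;Patient First Name;Patient Last Name\n"
    "2016-07-26;John;smith\n"
    "2016-07-27;Mathew;XXX\n"
    "2016-07-28;Aryan;Singh\n"
    "2016-07-20;Ella;Gayau\n"
), sep=";")

keys1 = ["Last", "First", "DOB"]
keys2 = ["Patient Last Name", "Patient First Name", "Date of Birth"]

# Variant from the answer: left merge, then drop rows without a match.
left_then_drop = pd.merge(list1, list2, how="left",
                          left_on=keys1, right_on=keys2).dropna()

# Inner merge keeps only rows whose keys appear in both tables.
inner = pd.merge(list1, list2, how="inner", left_on=keys1, right_on=keys2)

print(inner)
```

Only the John/smith and Aryan/Singh rows agree on all three key columns, so both variants keep those two rows.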