How can I merge two CSV files based on information common to both files?

-4

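Evans, thank you very much. This is almost the expected result, but we need one small change; please see the image. In the current result, each record in fileOne is looked up in fileTwo by adv_id and user_id, and the search stops as soon as one match is found. However, fileTwo may contain several matching records, so we need all of them. Every record in fileOne must be present in fileTwo at least once, and possibly several times. The result should therefore include all records from fileOne together with all matching records from fileTwo. I think a row-by-row search might help: take the adv_id and user_id of the first record in fileOne and search all records in fileTwo for matches, then take the second record in fileOne and search all records in fileTwo again, and so on.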

Revised Image For Expected Result

Source

2015-10-14 Tofazzal

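Hello. This is a very good statement of what you need, but I don't see any evidence that you have tried anything. Would you let us know what problems you ran into when writing this, or, if you have not tried yet, have a go first and then, if necessary, revise the question to explain what difficulty you are having? – halfer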

Possible duplicate of [How two merge several .csv files horizontally with python?](http://stackoverflow.com/questions/3986353/how-two-merge-several-csv-files-horizontally-with-python) – Jimilian

How does 'conv_id' affect the merge? When matching entries are found in both files, which one needs to be kept? –

The following script creates result.csv based on your original sample data (see the past edits to the question):

import csv
from collections import defaultdict

# Group the fileTwo rows by their (adv_id, user_id) key.
d_entries = defaultdict(list)

with open('fileTwo.csv', 'r') as f_fileTwo:
    csv_fileTwo = csv.reader(f_fileTwo)
    header_fileTwo = next(csv_fileTwo)  # skip the header row
    for cols in csv_fileTwo:
        # Insert an empty second column so the row lines up with fileOne.
        d_entries[(cols[0], cols[1])].append([cols[0], ''] + cols[1:])

with open('fileOne.csv', 'r') as f_fileOne, open('result.csv', 'w', newline='') as f_result:
    csv_fileOne = csv.reader(f_fileOne)
    csv_result = csv.writer(f_result)
    header_fileOne = next(csv_fileOne)
    csv_result.writerow(header_fileOne)

    for cols in csv_fileOne:
        # Only fileOne rows that have at least one match in fileTwo are written.
        if (cols[0], cols[2]) in d_entries:
            csv_result.writerow(cols)
            # pop() removes the group, so each group is written at most once.
            csv_result.writerows(d_entries.pop((cols[0], cols[2])))

result.csv then contains the following data when opened in Excel:

Tested using Python 3.4.3

To match on the adv_id column only, and to keep all entries:

import csv
from collections import defaultdict

# Group the fileTwo rows by adv_id only.
d_entries = defaultdict(list)

with open('fileTwo.csv', 'r') as f_fileTwo:
    csv_fileTwo = csv.reader(f_fileTwo)
    header_fileTwo = next(csv_fileTwo)  # skip the header row
    for cols in csv_fileTwo:
        # Insert an empty second column so the row lines up with fileOne.
        d_entries[cols[0]].append([cols[0], ''] + cols[1:])

with open('fileOne.csv', 'r') as f_fileOne, open('result.csv', 'w', newline='') as f_result:
    csv_fileOne = csv.reader(f_fileOne)
    csv_result = csv.writer(f_result)
    header_fileOne = next(csv_fileOne)
    csv_result.writerow(header_fileOne)

    for cols in csv_fileOne:
        # Matching fileTwo rows, if any, are written before the fileOne row.
        if cols[0] in d_entries:
            csv_result.writerows(d_entries.pop(cols[0]))
        # Every fileOne row is written, whether or not it had a match.
        csv_result.writerow(cols)
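As a rough illustration of the requirement stated in the question (every fileOne record, each followed by all fileTwo records with the same adv_id and user_id), the grouping idea can also be written so that a lookup does not consume the group. This is only a sketch: the function name `merge_rows`, the sample data, and the column layout (fileOne as adv_id, conv_id, user_id, ... and fileTwo as adv_id, user_id, ...) are assumptions inferred from the column indexes used in the scripts above, not something the question confirms.

```python
import csv
import io
from collections import defaultdict


def merge_rows(rows_one, rows_two):
    """Return every fileOne row followed by all fileTwo rows sharing its key.

    Assumed layout (hypothetical, inferred from the scripts above):
    fileOne rows are [adv_id, conv_id, user_id, ...] and
    fileTwo rows are [adv_id, user_id, ...].
    """
    # Group fileTwo rows by (adv_id, user_id), padding the missing conv_id.
    matches = defaultdict(list)
    for cols in rows_two:
        matches[(cols[0], cols[1])].append([cols[0], ''] + cols[1:])

    merged = []
    for cols in rows_one:
        merged.append(cols)
        # .get() leaves the group in place, so a repeated key matches again.
        merged.extend(matches.get((cols[0], cols[2]), []))
    return merged


# Made-up sample data, used only to exercise the function.
file_one = io.StringIO("adv_id,conv_id,user_id,cvflg\n1,c1,u1,1\n2,c2,u2,1\n")
file_two = io.StringIO("adv_id,user_id,cvflg\n1,u1,0\n1,u1,0\n3,u3,0\n")

reader_one = csv.reader(file_one)
reader_two = csv.reader(file_two)
header = next(reader_one)
next(reader_two)  # skip the fileTwo header

result = merge_rows(list(reader_one), list(reader_two))
for row in [header] + result:
    print(','.join(row))
```

Using `.get()` instead of `.pop()` is the main design difference here: a fileOne key that occurs more than once still receives its fileTwo matches each time, at the cost of possibly writing the same fileTwo rows repeatedly.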

Source

2015-10-14 17:33:46

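Hello Evans, many thanks. 1. conv_id should not affect the merge process for now. It can be used for sorting after the merge. The all_cv file contains information on converted users, hence cvflg = 1. The non_cv file contains information on non-converted users, hence cvflg = 0. Keep all entries from both files, and they should not overlap. I got a runtime error: File "join_cv_noncv.py", line 9, in header_noncv = next(csv_noncv) _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) Please take a look and modify accordingly. Regards, Tofa – Tofazzal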

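I guess you are using Python 3, and the script was written for Python 2. It should now work in Python 3. Note: if the output is not what you need, please edit the question to include two sample input files and the expected output. –

The current version is for Python 3 only. –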
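For anyone hitting the same error, the sketch below shows what is probably going on: opening the file in binary mode (the usual Python 2 habit with the csv module) makes `csv.reader` fail in Python 3, while text mode with `newline=''` works. The temporary file and its contents are invented for the demonstration; the original join_cv_noncv.py is not shown in this thread, so this is an assumption about its cause rather than a diagnosis of that script.

```python
import csv
import os
import tempfile

# Write a tiny throwaway CSV file (hypothetical data) to read back.
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='', delete=False) as tmp:
    csv.writer(tmp).writerows([['adv_id', 'user_id'], ['1', 'u1']])
    path = tmp.name

# Binary mode was the Python 2 convention; in Python 3 csv.reader rejects bytes.
binary_mode_failed = False
with open(path, 'rb') as f:
    try:
        next(csv.reader(f))
    except csv.Error:
        binary_mode_failed = True

# Text mode with newline='' is what the csv module documentation recommends.
with open(path, 'r', newline='') as f:
    rows = list(csv.reader(f))

os.remove(path)
print(binary_mode_failed, rows)
```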
