2011-03-18 68 views
0

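Searching and comparing values in CSV files in Python

I have two CSV files: a master file and an update file. I want to take specific columns from the update file and check their values against the master.

Both files will have the same columns, which should look roughly like this: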

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd) 
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31, 
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01, 
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14, 

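Basically, I want to iterate through the update file, take each Stock Code value from it, and check whether that value exists in the master file.

Then, for each matching Stock Code, I need to check for differences in the Director name values and keep track of the ones that do not match.

I have been following this example, but it does not seem to do exactly what I need (or I do not fully understand it...): Python: Comparing two CSV files and searching for similar items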

f1 = file(csvHKX, 'rU') 
f2 = file(csvWRHK, 'rU') 
f3 = file('results.csv', 'w') 

csv1 = csv.reader(f1) 
csv2 = csv.reader(f2) 
csv3 = csv.writer(f3) 

scode = [row for row in csv2] 

for hkx_row in csv1: 
    for wrhk_row in scode: 
        if hkx_row[2] != wrhk_row[2]: 
            print 'HKX:', hkx_row 
        continue 

f1.close() 
f2.close() 
f3.close() 

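The update file contains the following stock codes: '00121' and '01003' (for testing).

It looks as though the code iterates through the list, compares row against row, and prints a row whenever the stock codes of the two rows being compared differ. So when the first row read has '00121', it also prints the rows containing '01003'.

But I only want a row printed when hkx_row[2] cannot be found anywhere among the wrhk_row[2] values.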
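One way to get that behaviour is sketched below, in Python 3 rather than the Python 2 used above. It collects every stock code from the second file into a set first, then tests each row of the first file against that set once, instead of comparing row against row. The function name `rows_missing_from` and the in-memory sample data are made up for illustration, and column index 2 is assumed to hold the Stock Code, as in the sample rows.

```python
import csv
import io


def rows_missing_from(rows, other_rows, key_index=2):
    """Return the rows whose key column value appears in no row of other_rows."""
    # Build the lookup set once so each membership test is a hash lookup.
    known = {row[key_index] for row in other_rows}
    return [row for row in rows if row[key_index] not in known]


# In-memory stand-ins for the two files (hypothetical, abbreviated rows).
hkx_text = "A,x,00121,Current\nB,x,00122,Current\nC,x,01003,Current\n"
wrhk_text = "A,x,00121,Current\nC,x,01003,Current\n"

hkx_rows = list(csv.reader(io.StringIO(hkx_text)))
wrhk_rows = list(csv.reader(io.StringIO(wrhk_text)))

missing = rows_missing_from(hkx_rows, wrhk_rows)
for row in missing:
    print("HKX:", row)
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline='')`.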

+0

What do you need to do if they differ? Update the master? – theheadofabroom 2011-03-18 10:44:58

+1

It might be easier for us to answer if you specified what is missing from the linked example or what is not to your liking. – atzz 2011-03-18 11:07:03

+1

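"But it does not seem to do exactly what I need"? Why not? What is the actual problem you are having? Please post **your** code and **your** error or problem. – 2011-03-18 11:12:43

Answers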

0

Does this help you?

File master.csv

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd) 
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31, 
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01, 
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14, 
C.P. Lotus Corporation,________,00123,Current,DEANINO James,___,Pilot,,2009-06-25, 
C.P. Lotus Corporation,________,00129,Current,GINGE Ivy,___,Dental Technician,,2010-07-27, 
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04, 
C.P. Lotus Corporation,________,00119,Current,FIELD Mary,___,Pastrycook,,2009-06-25, 

File update.csv

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd) 
C.P. Lotus Corporation,________,00133,Current,THOMPSON Sarah,___,Cosmonaut,,2004-01-20, 
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31, 
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01, 
C.P. Lotus Corporation,________,00121,Current,BEARD Sophia,___,Executive Director,CEO,2010-04-26, 
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04, 
C.P. Lotus Corporation,________,00129,Current,MISTOUKI Hassan,___,Folk Singer,,2010-07-27, 

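Code

import csv 

# Python 2: csv files are opened in binary mode
mas = csv.reader(open('master.csv','rb')) 
upd = csv.reader(open('update.csv','rb')) 

# (Stock Code, Director's English Name) pairs found in the master file;
# the header pair is included too, which also filters the update
# file's header row out below
set24 = set((row[2],row[4]) for row in mas) 
print set24 
print 

# keep only the update rows whose pair is absent from the master
updkept = [ row for row in upd if (row[2],row[4]) not in set24] 
print '\n'.join(map(str,updkept)) 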

Result

set([('00127', 'ERATOR Jane'), ('00121', 'CHEARAVANONT Soopakij'), ('00121', 'CHEARAVANON Narong'), ('00119', 'FIELD Mary'), ('00122', 'CHEARAVANONT Dhanin'), ('Stock Code', "Director's English Name"), ('00129', 'GINGE Ivy'), ('00123', 'DEANINO James')]) 

['C.P. Lotus Corporation', '________', '00133', 'Current', 'THOMPSON Sarah', '___', 'Cosmonaut', '', '2004-01-20', ''] 
['C.P. Lotus Corporation', '________', '00121', 'Current', 'BEARD Sophia', '___', 'Executive Director', 'CEO', '2010-04-26', ' '] 
['C.P. Lotus Corporation', '________', '00129', 'Current', 'MISTOUKI Hassan', '___', 'Folk Singer', '', '2010-07-27', ''] 
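
The question also asks for the comparison to be made per stock code. A rough sketch of that variation, in Python 3 and under the same column assumptions (index 2 is the Stock Code, index 4 the director's English name), groups the master's directors by stock code in a dictionary of sets. The helper names `directors_by_code`, `compare` and `read_rows` are invented for this example, and the sample rows are shortened.

```python
import csv
import io
from collections import defaultdict


def directors_by_code(rows, code_index=2, name_index=4):
    """Map each stock code to the set of director names listed under it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[code_index]].add(row[name_index])
    return grouped


def compare(master_rows, update_rows, code_index=2, name_index=4):
    """Split update rows into unknown stock codes and unmatched directors."""
    master = directors_by_code(master_rows, code_index, name_index)
    unknown_codes, unmatched_directors = [], []
    for row in update_rows:
        code, name = row[code_index], row[name_index]
        if code not in master:
            unknown_codes.append(row)        # stock code absent from master
        elif name not in master[code]:
            unmatched_directors.append(row)  # code known, director differs
    return unknown_codes, unmatched_directors


def read_rows(text):
    """Parse CSV text and drop the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header
    return list(reader)


master_text = """Company,Chinese,Stock Code,Status,Director
C.P. Lotus Corporation,_,00122,Current,CHEARAVANONT Dhanin
C.P. Lotus Corporation,_,00121,Current,CHEARAVANON Narong
C.P. Lotus Corporation,_,00129,Current,GINGE Ivy
"""
update_text = """Company,Chinese,Stock Code,Status,Director
C.P. Lotus Corporation,_,00133,Current,THOMPSON Sarah
C.P. Lotus Corporation,_,00121,Current,BEARD Sophia
C.P. Lotus Corporation,_,00122,Current,CHEARAVANONT Dhanin
"""

unknown, unmatched = compare(read_rows(master_text), read_rows(update_text))
print("Stock codes not in master:", [r[2] for r in unknown])
print("Directors not in master:", [(r[2], r[4]) for r in unmatched])
```

Keeping the two lists separate makes it possible to treat a brand-new stock code differently from a changed director under a known code.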
+0

Yes. That seems to do the trick! I am not familiar with set (total n00b here...). Thanks! – Nathan 2011-03-21 07:31:43

+0

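@Nathan ``li24 = [(row[2],row[4]) for row in mas]`` is also possible, but a set is better because its elements are stored by hash, which makes searching faster, as far as I know (I think my wording is bad English, please excuse and correct me) – eyquem 2011-03-21 09:04:08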
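
To illustrate the point made in this comment: a list and a set give the same answers for `in`, but the list is scanned element by element while the set uses a hash lookup. A small Python 3 sketch with made-up pairs:

```python
pairs = [("00121", "CHEARAVANON Narong"), ("00122", "CHEARAVANONT Dhanin")]

li24 = list(pairs)   # membership test scans the list: O(n) per lookup
set24 = set(pairs)   # membership test hashes the tuple: O(1) on average

probe_hit = ("00121", "CHEARAVANON Narong")
probe_miss = ("00121", "BEARD Sophia")

# Both containers agree on the answer; only the lookup cost differs.
results = [(p in li24, p in set24) for p in (probe_hit, probe_miss)]
print(results)
```

For a handful of rows the difference is negligible; it starts to matter when the master file is large and every update row has to be looked up.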