2017-03-05 73 views
1

Python for loop and data

I have data that looks like this:

The columns are name, ID, machine number, and date.

('Anthony', '1', '10', '4/3/2017') 
('Anthony', '1', '11', '5/2/2017') 
('Anthony', '1', '13', '12/30/2017')
('Anthony', '1', '15', '8/20/2017')
('Anthony', '4', '17', '2/3/2018') 
('Anthony', '4', '18', '3/28/2017')
('Bob', '1', '111', '4/3/2017') 
('Bob', '1', '200', '5/2/2017') 
('Bob', '1', '113', '12/30/2017') 
('Bob', '1', '115', '8/20/2017') 
('Bob', '4', '117', '2/3/2018') 
('Bob', '4', '118', '3/28/2017') 

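I am trying to find each unique combination of name and ID, then compare the dates within that combination and return only the date furthest in the future.

Ideally I would like output that looks like: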

('Anthony', '1', '12/30/2017') 
('Anthony', '4', '2/3/2018') 
('Bob', '1', '12/30/2017') 
('Bob', '4', '2/3/2018') 

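I am struggling because I have multiple keys, and I can't figure out how to make it work. Any ideas?

Edit: this is just a sample; I have 30 names and 10 unique IDs, so I am looking for a for loop to solve this.

Answers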

0

You can use itertools.groupby combined with max to get output similar to what you are looking for.

import itertools 
from datetime import datetime 

data = [('Anthony', '1', '10', '4/3/2017'), 
     ('Anthony', '1', '11', '5/2/2017'), 
     ('Anthony', '1', '13', '12/30/2017'), 
     ('Anthony', '1', '15', '8/20/2017'), 
     ('Anthony', '4', '17', '2/3/2018'), 
     ('Anthony', '4', '18', '3/28/2017'), 
     ('Bob', '1', '111', '4/3/2017'), 
     ('Bob', '1', '200', '5/2/2017'), 
     ('Bob', '1', '113', '12/30/2017'), 
     ('Bob', '1', '115', '8/20/2017'), 
     ('Bob', '4', '117', '2/3/2018'), 
     ('Bob', '4', '118', '3/28/2017')] 

groups_with_max_date = [] 
for key, group in itertools.groupby(data, lambda d: (d[0], d[1])): 
    # convert to datetime and get max of group 
    group_max = max(group, key=lambda q: datetime.strptime(q[3], '%m/%d/%Y')) 
    groups_with_max_date.append(group_max) 

groups_with_max_date 

This gives:

[('Anthony', '1', '13', '12/30/2017'), 
('Anthony', '4', '17', '2/3/2018'), 
('Bob', '1', '113', '12/30/2017'), 
('Bob', '4', '117', '2/3/2018')] 
+1

This only works if the data is already sorted by name and ID. –

+0

You can use [`operator.itemgetter(0, 1)`](https://docs.python.org/3/library/operator.html#operator.itemgetter) as the key for both sorting and grouping the data. – wwii
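
The two comments above can be combined into a minimal sketch: sort by (name, ID) first, then group on the same key, since itertools.groupby only merges adjacent rows with equal keys. The rows below are a shuffled subset of the question's sample, and the names `name_and_id`, `parse_date`, and `latest_rows` are illustrative, not from the original answer.

```python
import itertools
from datetime import datetime
from operator import itemgetter

# Rows are (name, id, machine number, date), deliberately out of order
data = [
    ('Bob', '4', '117', '2/3/2018'),
    ('Anthony', '1', '13', '12/30/2017'),
    ('Bob', '1', '111', '4/3/2017'),
    ('Anthony', '4', '18', '3/28/2017'),
    ('Anthony', '1', '10', '4/3/2017'),
    ('Bob', '1', '113', '12/30/2017'),
    ('Anthony', '4', '17', '2/3/2018'),
    ('Bob', '4', '118', '3/28/2017'),
]

# One key function shared by sorted() and groupby(): returns (name, id)
name_and_id = itemgetter(0, 1)


def parse_date(row):
    """Parse the month/day/year string in the fourth field of a row."""
    return datetime.strptime(row[3], '%m/%d/%Y')


latest_rows = []
# Sorting first guarantees that equal keys are adjacent for groupby
for key, group in itertools.groupby(sorted(data, key=name_and_id), key=name_and_id):
    latest_rows.append(max(group, key=parse_date))

print(latest_rows)
```

Because the IDs are strings, the sort is lexicographic ('10' comes before '4'); that does not affect the grouping, only the order of the results.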

0
A solution using datetime objects and the dict.setdefault(), max, and datetime.strptime functions:

import datetime 

l = [('Anthony', '1', '10', '4/3/2017'),('Anthony', '1', '11', '5/2/2017'),('Anthony', '1', '13', '12/30/2017'),('Anthony', '1', '15', '8/20/2017'), 
('Anthony', '4', '17', '2/3/2018'),('Anthony', '4', '18', '3/28/2017'),('Bob', '1', '111', '4/3/2017'),('Bob', '1', '200', '5/2/2017'), 
('Bob', '1', '113', '12/30/2017'),('Bob', '1', '115', '8/20/2017'),('Bob', '4', '117', '2/3/2018'),('Bob', '4', '118', '3/28/2017')] 

d = {} 
for t in l: 
    # grouping items by first two values of each tuple(accumulating `date` strings) 
    d.setdefault(t[0] +'-'+ t[1], []).append(t[3]) # first two values of a tuple are combined to be a "hash" key 

# getting the latest date string in each group (each string is parsed to a `datetime` only for comparison)
result = [(*k.split('-'), max(v, key=lambda dt: datetime.datetime.strptime(dt, '%m/%d/%Y'))) for k,v in sorted(d.items())] 

print(result) 

Output:

[('Anthony', '1', '12/30/2017'), ('Anthony', '4', '2/3/2018'), ('Bob', '1', '12/30/2017'), ('Bob', '4', '2/3/2018')]
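
Since the question asks for a for loop, here is a sketch of the same idea as a single pass that keeps a running maximum per group. It assumes the four-field row layout from the question. It uses the (name, ID) tuple itself as the dictionary key, so it does not depend on names being free of '-' characters. The names `rows`, `latest`, and `result` are illustrative.

```python
from datetime import datetime

rows = [
    ('Anthony', '1', '10', '4/3/2017'),
    ('Anthony', '1', '11', '5/2/2017'),
    ('Anthony', '1', '13', '12/30/2017'),
    ('Anthony', '1', '15', '8/20/2017'),
    ('Anthony', '4', '17', '2/3/2018'),
    ('Anthony', '4', '18', '3/28/2017'),
    ('Bob', '1', '111', '4/3/2017'),
    ('Bob', '1', '200', '5/2/2017'),
    ('Bob', '1', '113', '12/30/2017'),
    ('Bob', '1', '115', '8/20/2017'),
    ('Bob', '4', '117', '2/3/2018'),
    ('Bob', '4', '118', '3/28/2017'),
]

# Maps (name, id) -> (parsed date, original date string) of the latest row seen
latest = {}
for name, id_, _machine, date_str in rows:
    parsed = datetime.strptime(date_str, '%m/%d/%Y')
    key = (name, id_)
    # Keep this row if the group is new or its date is later than the stored one
    if key not in latest or parsed > latest[key][0]:
        latest[key] = (parsed, date_str)

# Rebuild (name, id, date) tuples, ordered by key for stable output
result = [(name, id_, date_str)
          for (name, id_), (_parsed, date_str) in sorted(latest.items())]

print(result)
```

This version does not require the input to be sorted, and it parses each date only once.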