2017-09-01

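Duplicates in a pandas DataFrame in Python: taking information from each group of duplicate rows

I'm working with a DataFrame that has this structure: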

id,date,id_client,optionin,optionout 
1,09/01/2017,123456,11,12 
2,09/01/2017,123456,12,14 
3,09/02/2017,1111111,85,45 
4,09/02/2017,1111111,45,35 
5,09/02/2017,1111111,35,58 
6,09/01/2017,528585,1,2 
7,09/01/2017,548123,37,12 
8,09/01/2017,123588,117,512 
9,09/01/2017,981358,116,152 

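I want to get rid of duplicate entries for the same client on the same day. I only want the first optionin and the last optionout of each group, combined in a single row.

Like this: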

id,id_end,date,id_client,optionin,optionout 
1,2,09/01/2017,123456,11,14 
3,5,09/02/2017,1111111,85,58 
6,6,09/01/2017,528585,1,2 
7,7,09/01/2017,548123,37,12 
8,8,09/01/2017,123588,117,512 
9,9,09/01/2017,981358,116,152 

How can I do this, with new columns for the ids? Is it possible?

Answers


You can use agg():

df.groupby(['id_client', 'date']).agg({'optionin': 'first','optionout': 'last'}).reset_index() 

   id_client        date  optionin  optionout
0     123456  09/01/2017        11         14
1     123588  09/01/2017       117        512
2     528585  09/01/2017         1          2
3     548123  09/01/2017        37         12
4     981358  09/01/2017       116        152
5    1111111  09/02/2017        85         58
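
To try this on the sample data, the rows from the question can be loaded from a string. This is a minimal sketch: the names `raw` and `result` are introduced here for illustration, and it assumes the rows are already in the order in which 'first' and 'last' should be taken within each client and date.

```python
import io

import pandas as pd

# Sample data copied from the question.
raw = """id,date,id_client,optionin,optionout
1,09/01/2017,123456,11,12
2,09/01/2017,123456,12,14
3,09/02/2017,1111111,85,45
4,09/02/2017,1111111,45,35
5,09/02/2017,1111111,35,58
6,09/01/2017,528585,1,2
7,09/01/2017,548123,37,12
8,09/01/2017,123588,117,512
9,09/01/2017,981358,116,152
"""
df = pd.read_csv(io.StringIO(raw))  # 'date' stays a plain string column

# One row per (client, day): first optionin and last optionout of the group.
result = (
    df.groupby(['id_client', 'date'])
      .agg({'optionin': 'first', 'optionout': 'last'})
      .reset_index()
)
print(result)
```

Because 'first' and 'last' follow row order, sort the frame beforehand (for example by id) if the input is not guaranteed to be ordered.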

For the id part:

df1 = df.groupby(['id_client', 'date']).agg({'optionin': 'first', 'optionout': 'last', 'id': ['first', 'last']}).reset_index() 
df1.columns = df1.columns.map('_'.join) 

   id_client_       date_  optionin_first  optionout_last  id_first  id_last
0      123456  09/01/2017              11              14         1        2
1      123588  09/01/2017             117             512         8        8
2      528585  09/01/2017               1               2         6        6
3      548123  09/01/2017              37              12         7        7
4      981358  09/01/2017             116             152         9        9
5     1111111  09/02/2017              85              58         3        5
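
If the exact layout from the question is wanted (columns id, id_end, date, id_client, optionin, optionout, in the original row order), named aggregation is one option. It is available in pandas 0.25 and later and avoids flattening the two-level column names. The following is a sketch rather than the answerer's method: `raw` and `out` are names chosen for illustration, and `sort=False` is used on the assumption that groups should appear in order of first occurrence.

```python
import io

import pandas as pd

# Sample data copied from the question.
raw = """id,date,id_client,optionin,optionout
1,09/01/2017,123456,11,12
2,09/01/2017,123456,12,14
3,09/02/2017,1111111,85,45
4,09/02/2017,1111111,45,35
5,09/02/2017,1111111,35,58
6,09/01/2017,528585,1,2
7,09/01/2017,548123,37,12
8,09/01/2017,123588,117,512
9,09/01/2017,981358,116,152
"""
df = pd.read_csv(io.StringIO(raw))

out = (
    # sort=False keeps groups in order of first appearance;
    # as_index=False keeps the group keys as ordinary columns.
    df.groupby(['id_client', 'date'], as_index=False, sort=False)
      .agg(
          id=('id', 'first'),          # id of the first row in the group
          id_end=('id', 'last'),       # id of the last row in the group
          optionin=('optionin', 'first'),
          optionout=('optionout', 'last'),
      )
)

# Reorder the columns to match the layout asked for in the question.
out = out[['id', 'id_end', 'date', 'id_client', 'optionin', 'optionout']]
print(out.to_csv(index=False))
```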

You can save a step: df.groupby(['date', 'id_client'], as_index=False).agg({'optionin': 'first', 'optionout': 'last'}) :) – Wen