2014-10-12

Input dataset (loaded with pandas):

Var1  Var2  Var3  Var4
101   XXX   yyyy  12/10/2014
101   XYZ   YTRT  13/10/2014
102   TTY   UUUU  9/9/2014
102   YTY   IUYY  10/10/2014

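Expected output dataset, with the multiple rows for each value of the key variable (Var1) collapsed into a single row: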

Var1  Var2     Var3       Var4
101   XXX,XYZ  yyyy,YTRT  12/10/2014, 13/10/2014
102   TTY,YTY  UUUU,IUYY  9/9/2014, 10/10/2014

How can I produce the expected dataset with pandas?

Answer


One approach is:

import pandas as pd

# Rebuild the sample data from the question.
data = {'Var1': {0: 101, 1: 101, 2: 102, 3: 102},
        'Var2': {0: 'XXX', 1: 'XYZ', 2: 'TTY', 3: 'YTY'},
        'Var3': {0: 'yyyy', 1: 'YTRT', 2: 'UUUU', 3: 'IUYY'},
        'Var4': {0: '12/10/2014', 1: '13/10/2014', 2: '9/9/2014', 3: '10/10/2014'}}

df = pd.DataFrame(data)
# Use the key column as the index so it can be grouped on by level.
df.set_index('Var1', inplace=True)
print(df)

      Var2  Var3        Var4
Var1
101    XXX  yyyy  12/10/2014
101    XYZ  YTRT  13/10/2014
102    TTY  UUUU    9/9/2014
102    YTY  IUYY  10/10/2014

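# Join all values of a column within one group into a comma-separated string.
f = lambda x: ','.join(x)
# transform() repeats the joined string on every row of its group;
# drop_duplicates() then keeps a single row per group.
print(df.groupby(level='Var1', as_index=True).transform(f).drop_duplicates().reset_index())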

   Var1     Var2       Var3                   Var4
0   101  XXX,XYZ  yyyy,YTRT  12/10/2014,13/10/2014
1   102  TTY,YTY  UUUU,IUYY    9/9/2014,10/10/2014
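
A possible alternative, offered only as a sketch and not as part of the original answer: aggregating with agg instead of transform gives one row per group directly, so no de-duplication step is needed. This also sidesteps a corner case of the approach above, since drop_duplicates compares column values only (not the index), so two different Var1 groups whose joined strings happened to be identical would be merged into one row. The sketch assumes every non-key column holds strings; numeric columns would need converting with astype(str) first.

```python
import pandas as pd

# Same sample data as in the question, with Var1 kept as a regular column.
df = pd.DataFrame({
    'Var1': [101, 101, 102, 102],
    'Var2': ['XXX', 'XYZ', 'TTY', 'YTY'],
    'Var3': ['yyyy', 'YTRT', 'UUUU', 'IUYY'],
    'Var4': ['12/10/2014', '13/10/2014', '9/9/2014', '10/10/2014'],
})

# Collapse each Var1 group to one row: every other column becomes a
# comma-separated string of that group's values, in original row order.
result = df.groupby('Var1').agg(','.join).reset_index()
print(result)
```

Here result has one row per distinct Var1 value, with Var1 restored as an ordinary column by reset_index().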