2017-10-20 57 views
1

Get the same column grouped by date, filtered by 2 conditions

I have a dataset like this:

 TransactionId UserId transaction_date transaction_status amount_USD 
0  3996625673 1298122  2015-08-11   CHARGED  10,96 
1  5797849338 1125916  2015-08-11   DECLINED  14,7 
2  9535361884 8009005  2015-08-11   CHARGED  10,61 
3  8410989235 1123856  2015-07-29   DECLINED  10,96 

I need to group by transaction_date and get the sum of the amount_USD column for each transaction_status:

 
transaction_date CHARGED DECLINED 
2015-07-29    0  10,96 
2015-08-11   21,57 14,7 

I tried doing it like this:

 
df[df['transaction_status']=='DECLINED']['amount_USD'].groupby('transaction_date').sum() 

Answer

3

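Use replace to turn the decimal commas into dots and convert the column to float, then groupby with the aggregate sum, and finally reshape with unstack: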

#or use parameter decimal=',' to read_csv 
df['amount_USD'] = df['amount_USD'].replace(',','.', regex=True).astype(float) 

df = (df.groupby(['transaction_date','transaction_status'])['amount_USD'] 
        .sum() 
        .unstack(fill_value=0)) 
print (df) 
transaction_status CHARGED DECLINED 
transaction_date      
2015-07-29    0.00  10.96 
2015-08-11   21.57  14.70 
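
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the four sample rows from the question and that amount_USD arrives as strings with decimal commas. The variable name `result` is mine, not from the answer.

```python
import pandas as pd

# Sample rows copied from the question; amounts are strings with decimal commas.
df = pd.DataFrame({
    "TransactionId": [3996625673, 5797849338, 9535361884, 8410989235],
    "UserId": [1298122, 1125916, 8009005, 1123856],
    "transaction_date": ["2015-08-11", "2015-08-11", "2015-08-11", "2015-07-29"],
    "transaction_status": ["CHARGED", "DECLINED", "CHARGED", "DECLINED"],
    "amount_USD": ["10,96", "14,7", "10,61", "10,96"],
})

# Turn "10,96" into 10.96 so the column can be summed numerically.
df["amount_USD"] = df["amount_USD"].replace(",", ".", regex=True).astype(float)

# One row per date, one column per status; missing combinations become 0.
result = (
    df.groupby(["transaction_date", "transaction_status"])["amount_USD"]
      .sum()
      .unstack(fill_value=0)
)
print(result)
```

Wrapping the chain in parentheses lets the method calls span several lines without backslashes.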

Alternative with pivot_table, thanks to Bharath shetty:

df = df.pivot_table(index='transaction_date', 
        columns='transaction_status', 
        values='amount_USD', 
        aggfunc='sum', 
        fill_value=0) 
print (df) 

transaction_status CHARGED DECLINED 
transaction_date      
2015-07-29    0.00  10.96 
2015-08-11   21.57  14.70 
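
The pivot_table route can also be combined with the `decimal=','` option of read_csv mentioned in the code comment of the first snippet, so no replace step is needed. The sketch below assumes a semicolon-separated file, which is only a guess at how the data is stored. The inline `raw` text stands in for the real file, and `wide` is a name I made up.

```python
import io
import pandas as pd

# Hypothetical file contents: semicolon-separated, decimal commas in amount_USD.
raw = """TransactionId;UserId;transaction_date;transaction_status;amount_USD
3996625673;1298122;2015-08-11;CHARGED;10,96
5797849338;1125916;2015-08-11;DECLINED;14,7
9535361884;8009005;2015-08-11;CHARGED;10,61
8410989235;1123856;2015-07-29;DECLINED;10,96
"""

# decimal=',' makes pandas parse "10,96" as the float 10.96 while reading.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# Same reshaping as groupby + sum + unstack, expressed as a pivot table.
wide = df.pivot_table(index="transaction_date",
                      columns="transaction_status",
                      values="amount_USD",
                      aggfunc="sum",
                      fill_value=0)
print(wide)
```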

Finally, to turn the index into a column, use reset_index and rename_axis:

df = df.reset_index().rename_axis(None, axis=1) 
print (df) 
    transaction_date CHARGED DECLINED 
0  2015-07-29  0.00  10.96 
1  2015-08-11 21.57  14.70 
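
To see what each call in that last step does, here is a small sketch that starts from a hand-built copy of the pivoted frame (the names `wide` and `flat` are only for illustration). reset_index moves transaction_date back into a regular column, and rename_axis(None, axis=1) drops the leftover transaction_status label from the column axis.

```python
import pandas as pd

# Hand-built copy of the pivoted result shown above.
wide = pd.DataFrame(
    {"CHARGED": [0.0, 21.57], "DECLINED": [10.96, 14.70]},
    index=pd.Index(["2015-07-29", "2015-08-11"], name="transaction_date"),
)
wide.columns.name = "transaction_status"

# reset_index: index -> column; rename_axis: remove the columns' axis name.
flat = wide.reset_index().rename_axis(None, axis=1)

print(flat.columns.tolist())  # ['transaction_date', 'CHARGED', 'DECLINED']
print(flat.columns.name)      # None
```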
+0

You can also use a pivot table, could you add that? – Dark

+0

df.pivot_table(index=['transaction_date'], columns=['transaction_status'], values='amount_USD', aggfunc='sum').fillna(0) – Dark