pandas - DataFrame groupby - how to get the sum of multiple columns

This should be an easy one, but somehow I can't find a solution that works.

I have a pandas DataFrame that looks like this:

index col1 col2 col3 col4 col5 
0  a  c  1  2  f 
1  a  c  1  2  f 
2  a  d  1  2  f 
3  b  d  1  2  g 
4  b  e  1  2  g 
5  b  e  1  2  g

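I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.

Below is what the output should look like. I want both col3 and col4 in the resulting DataFrame. It doesn't matter whether col1 and col2 are part of the index or not.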

index col1 col2 col3 col4 
0  a  c  2  4   
1  a  d  1  2  
2  b  d  1  2  
3  b  e  2  4
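
For anyone who wants to reproduce this, here is a minimal sketch of the two frames above. It assumes col3 and col4 hold integers, which the question does not actually state, and the name expected is only illustrative.

```python
import pandas as pd

# Sample input from the question; integer dtypes for col3/col4 are an assumption.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Desired result: one row per (col1, col2) pair, with col5 dropped.
expected = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["c", "d", "d", "e"],
    "col3": [2, 1, 1, 2],
    "col4": [4, 2, 2, 4],
})
```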

Here is what I tried:

df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()

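However, this only returns the aggregated result for col4.

I'm lost here. Every example I found aggregates only one column, where the problem obviously doesn't occur.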

Source

2017-09-26 Axel

The problem is probably that 'df.col3.dtype' is not an 'int' or other numeric dtype. Try 'df.col3 = df.col3.astype(int)' before doing the 'groupby'.

By using apply:

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

Or by using agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
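
A self-contained sketch of the same idea, assuming the sample data from the question with integer col3 and col4 (the variable names result and result_agg are illustrative). Passing as_index=False keeps col1 and col2 as ordinary columns, which matches the flat layout the question asks for. Note the double brackets: the value columns are selected with a list, since recent pandas versions reject the tuple form ["col3", "col4"].

```python
import pandas as pd

# Sample data from the question; integer dtypes are an assumption.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Select the value columns with a list (double brackets), then sum per group.
result = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()

# Same aggregation written with agg and an explicit column -> function mapping.
result_agg = df.groupby(["col1", "col2"], as_index=False).agg(
    {"col3": "sum", "col4": "sum"}
)

print(result)
```

Both forms leave col5 out of the result, because it is neither a grouping key nor one of the aggregated columns.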

Source

2017-09-26 16:14:53 Wen

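The problem is most likely that df.col3.dtype is not an int or other numeric dtype. Try df.col3 = df.col3.astype(int) before doing your groupby.

Additionally, select the columns after the groupby to see whether the columns are even being aggregated: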

df_new = df.groupby(['col1', 'col2']).sum()[["col3", "col4"]]
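
To test the dtype hypothesis, here is a sketch that assumes col3 and col4 were loaded as strings. That is a hypothetical situation chosen for illustration, since the question does not show the dtypes. It inspects the dtypes first, then converts with pd.to_numeric before grouping.

```python
import pandas as pd

# Hypothetical frame where the numeric columns arrived as strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1"] * 6,
    "col4": ["2"] * 6,
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# A non-numeric dtype on col3/col4 here would explain a failed or partial sum.
print(df.dtypes)
was_numeric = pd.api.types.is_numeric_dtype(df["col3"])

# Convert each value column; to_numeric raises on values it cannot parse.
for col in ["col3", "col4"]:
    df[col] = pd.to_numeric(df[col])

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```

Unlike astype(int), pd.to_numeric also handles decimal strings, so it may be the safer choice when the column contents are not known in advance.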

Source

2017-09-26 16:17:45
