熊貓：我如何在熊貓數據框中聚合*列中的* *

我想在一定條件下將熊貓數據框中的列聚合爲一個。這個想法是在DF中節省空間，並將某些列聚合成一個，只要它們回答了一定的條件。一個例子可能會更容易解釋：熊貓：我如何在熊貓數據框中聚合*列中的* *

import pandas as pd 
import seaborn as sns  # for sample data set 

# load some sample data 
titanic = sns.load_dataset('titanic') 

# round the age to an integer for convenience 
titanic['age_round'] = titanic['age'].round(0) 

# crosstabulate 
crtb = pd.crosstab(titanic['embark_town'], titanic['age_round'], margins=True) 
crtb

產量：

我想要做什麼，例如，是聚集所有的都是> = 20列（例如）添加到名爲'20 +'的一列中，並且這些值將是聚合列的每行所有值的總和。當列標題爲< 20時，它們將保持分離且不變。解決這個問題的一種方法是在原始DF中創建另一個列，如果這個列的原始值是< 20和'20 +'else，或者使用.cut並在其上旋轉，則會給出age的原始值。

想知道是否有辦法更聰明地做到這一點，而不需要創建新列。謝謝！

來源

2016-05-10 Optimesh

對於這個具體的例子，我不認爲你需要添加一列，只需更新該列中的值，你已經有：

import pandas as pd 
import seaborn as sns  # for sample data set 

# load some sample data 
titanic = sns.load_dataset('titanic') 

# round the age to an integer for convenience 
titanic['age_round'] = titanic['age'].round(0) 
titanic.loc[titanic['age_round']>=20, 'age_round'] = '20+' 

# crosstabulate 
crtb = pd.crosstab(titanic['embark_town'], titanic['age_round'], margins=True)

是你的問題如何在一般做到這一點？在熊貓中有多種不同的數據聚合方式，最爲標準的是使用.groupby（）結構。交叉表基本上是這兩個變量分組的快捷方式，然後調用.unstack（）。

來源

2016-05-10 18:10:36 flyingmeatball

感謝您的回覆。是的，我正在尋找一個通用的解決方案。您建議的解決方案與我的類似，因爲它修改了實際的列值。我希望有一個解決方案可以應用於列標題（隨後會將聚合函數應用於實際值）。 – Optimesh

這樣做沒什麼意義 - 你的價值觀就是你的列標題。如果你等到他們是標題，你只需要做更多的工作。 – flyingmeatball

熊貓：我如何在熊貓數據框中聚合*列中的* *

回答

相關問題

熊貓：我如何在熊貓數據框中聚合列中的 *