2017-08-02

How can I add a 'Sum' column to a pandas groupby DataFrame? I want to compute a 'Sum' of the inner 'Bearish' and 'Bullish' columns of the groupby DataFrame below.

Then I want to add two more columns:

%Bearish = Bearish / Sum * 100

%Bullish = Bullish / Sum * 100

# pd.Grouper(freq='h') replaces pd.TimeGrouper, which was removed in pandas 1.0
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
group_df = group_df.unstack() 

                    message
sentiment           Bearish Bullish
created
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0
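As a minimal sketch of the formulas above, assuming each row's Sum should be Bearish + Bullish for that hour and that NaN simply means "no messages in that hour" (treated as 0), the unstacked counts printed above can be rebuilt by hand (the raw data is not shown, so only these printed values are used) and extended with the three requested columns. The name `counts` is illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the unstacked counts printed in the question: outer column level
# 'message', inner level 'sentiment', one row per hour.
idx = pd.DatetimeIndex(['2017-08-01 23:00:00', '2017-08-02 00:00:00',
                        '2017-08-02 01:00:00'], name='created')
cols = pd.MultiIndex.from_product([['message'], ['Bearish', 'Bullish']],
                                  names=[None, 'sentiment'])
group_df = pd.DataFrame([[2.0, 2.0], [1.0, 3.0], [np.nan, 4.0]],
                        index=idx, columns=cols)

# Select the outer 'message' level so Bearish/Bullish become plain columns;
# assumption: NaN means zero messages in that hour, so fill it with 0.
counts = group_df['message'].fillna(0)

# Row-wise total per hour, then each sentiment's share of that total.
counts['Sum'] = counts['Bearish'] + counts['Bullish']
counts['%Bearish'] = counts['Bearish'] / counts['Sum'] * 100
counts['%Bullish'] = counts['Bullish'] / counts['Sum'] * 100

print(counts)
```

Filling NaN with 0 first makes the 01:00 row report 0% Bearish rather than NaN; drop the `fillna(0)` if missing hours should stay missing.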

Could you provide the head of your original DataFrame?

Answer


You can use concat with a new DataFrame:

import pandas as pd

idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3], 
        'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx) 
print (df) 
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish

# pd.Grouper(freq='h') replaces pd.TimeGrouper, which was removed in pandas 1.0
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
# select ['message'] to remove the MultiIndex in the columns
group_df = group_df['message'].unstack() 

# divide each column by its column total (sum over all hours) and scale to percent
# add prefix - https://stackoverflow.com/q/45453508/2901002
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1) 
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

# store the result in a new variable so df still holds the raw data
out = pd.concat([group_df, df1], axis=1)
print (out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
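Note that `group_df.sum()` sums each column over all hours, so the percentages above are each hour's share of that sentiment's overall total (for example, 4 of the 5 Bearish messages fall in the 23:00 hour). If the percentages should instead be relative to the per-hour Sum described in the question, one option is to divide by the row totals with `sum(axis=1)` and `div(..., axis=0)`. A small sketch comparing both on the hourly counts from this example (`col_share`, `row_share` and `out` are illustrative names):

```python
import numpy as np
import pandas as pd

# Hourly counts from the example above (rows = hours, columns = sentiment).
idx = pd.to_datetime(['2017-08-01 23:00:00', '2017-08-02 00:00:00',
                      '2017-08-02 01:00:00'])
group_df = pd.DataFrame({'Bearish': [4.0, 1.0, np.nan],
                         'Bullish': [np.nan, 4.0, 3.0]}, index=idx)

# Column-wise share, as computed above: group_df.sum() is Bearish 5.0, Bullish 7.0.
col_share = group_df.div(group_df.sum()).mul(100).add_prefix('%')

# Row-wise share: each sentiment relative to the hour's Bearish + Bullish total.
# sum(axis=1) skips NaN; div(..., axis=0) aligns the row totals on the index.
row_sum = group_df.sum(axis=1)
row_share = group_df.div(row_sum, axis=0).mul(100).add_prefix('%')

# Counts, the per-hour Sum column and the row-wise percentages side by side.
out = pd.concat([group_df.assign(Sum=row_sum), row_share], axis=1)
print(col_share)
print(out)
```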

If you need GroupBy.size instead of GroupBy.count:

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).size()
group_df = group_df.unstack() 

df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1) 
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

out = pd.concat([group_df, df1], axis=1)
print (out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
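The two variants differ only in `count` versus `size`: `count` counts the non-NaN values in each column per group, while `size` counts the rows in each group regardless of NaN. With the sample data above both give the same numbers because `message` has no missing values. A minimal sketch on hypothetical data with one missing value shows where they diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Bullish row is missing its 'message' value.
df = pd.DataFrame({'sentiment': ['Bearish', 'Bullish', 'Bullish'],
                   'message': [1.0, np.nan, 3.0]})

g = df.groupby('sentiment')
sizes = g.size()               # rows per group, NaN included
counts = g['message'].count()  # non-NaN 'message' values per group
print(sizes)
print(counts)
```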

See also: What is the difference between size and count in pandas?