2016-11-23 117 views
import pandas as pd

df = pd.DataFrame({
    'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
    'B': [5,5,6,7,5,6,6,7,7,6,7,7],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1],
    'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
})

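# Each chain is wrapped in parentheses so it can span two lines.
# df is rebound last, because the other two statements still need columns B and C.
df10 = ((df.B * df.C).groupby([df.A, df.S]).agg(['sum', 'size'])
        .unstack(fill_value=0))
df20 = ((df.B - df.C).groupby([df.A, df.S]).agg(['sum', 'size'])
        .unstack(fill_value=0))
df = ((df.B + df.C).groupby([df.A, df.S]).agg(['sum', 'size'])
      .unstack(fill_value=0))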

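Can I run the code below once for all of df, df10 and df20? In my real data I will run this same code on 80 DataFrames, so is it possible to create multiple DataFrames in one go?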

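# Sum over the years within each top-level column ('sum' and 'size')
df1 = df.groupby(level=0, axis=1).sum()
# Label the summed columns as (name, 'total') so they fit the two-level layout
new_cols = list(zip(df1.columns.get_level_values(0), ['total'] * len(df.columns)))
df1.columns = pd.MultiIndex.from_tuples(new_cols)
# Put the totals next to the per-year columns, then flatten the column names
df2 = pd.concat([df1, df], axis=1).sort_index(axis=1).sort_index(axis=1, level=1)
df2.columns = ['_'.join((col[0], str(col[1]))) for col in df2.columns]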

Answers

import numpy as np

# Pull B and C into one numeric array (assumes df is still the original frame)
a = df[['B', 'C']].values

df['B+C'] = a.sum(1)
df['B*C'] = a.prod(1)
# np.diff gives C - B with shape (n, 1); negate and flatten to get B - C
df['B-C'] = -np.diff(a, axis=1).ravel()
cols = ['B+C', 'B*C', 'B-C']

df.groupby(['A', 'S'])[cols].agg(['sum', 'size'])
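
If separate frames are still wanted (for example the 80 mentioned in the question), one possible approach is to build them in a single dict comprehension rather than as 80 named variables. The following is only a sketch: the `ops` mapping and its keys (`add`, `mul`, `sub`) are hypothetical names chosen for illustration, and it assumes `df` is the original frame from the question.

```python
import operator
import pandas as pd

df = pd.DataFrame({
    'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
    'B': [5,5,6,7,5,6,6,7,7,6,7,7],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1],
    'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
})

# Hypothetical mapping from a result name to an element-wise operation
ops = {'add': operator.add, 'mul': operator.mul, 'sub': operator.sub}

# One aggregated, unstacked frame per operation, all built in one pass
frames = {
    name: (op(df.B, df.C)
           .groupby([df.A, df.S])
           .agg(['sum', 'size'])
           .unstack(fill_value=0))
    for name, op in ops.items()
}
```

Each result is then available by key, for example `frames['mul']`, and any follow-up processing can loop over `frames.items()` instead of being repeated per variable.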


Thanks. Is it possible to have the years as columns? Also, how would I modify the attached code to return totals for sum and size? – Zanshin


`reset_index('S')` gets you the years as a column. Assign the result to `df_`, then `df_.append(df_.sum().rename(('Total', '')))` – piRSquared
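
The suggestion in this comment could be sketched roughly as follows. Note the substitution: `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` instead, and it computes the totals before `reset_index` so that the year values are not summed. The names `res`, `total` and `out` are illustrative only, and a single derived column is used to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
    'B': [5,5,6,7,5,6,6,7,7,6,7,7],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1],
    'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
})
df['B+C'] = df.B + df.C

# Grouped result indexed by (A, S), with (column, aggregate) column labels
res = df.groupby(['A', 'S'])[['B+C']].agg(['sum', 'size'])

# Move S out of the index so the years become an ordinary column
df_ = res.reset_index('S')

# Column-wise totals as a one-row frame labelled 'Total'
total = res.sum().to_frame('Total').T

# Stack the totals row under the data; S is left empty (NaN) for that row
out = pd.concat([df_, total])
```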
