2017-08-10 47 views

Pandas - Sum series of columns 1-N

I have a set of spend columns for weeks 1 through 52, and I want to sum the first 26 and the last 26 separately.

I have the following:

column_names = [x for x in df.columns.values.tolist() 
       if x.startswith("spend_") 
       ] 

This gives me all the columns I'm interested in:

['spend_1', 'spend_2', 'spend_3', 'spend_4', 'spend_5', ...]

I can then sum them as follows:

df['pre_spend'] = df[column_names].sum(axis=1) 

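This gives me the total across all 52 weeks.

Is there a simple way to select 1_26 and 27_52 and sum each group separately?

In SAS I would do it like this: pre_spend = sum(of spend_1-spend_26);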

Can you produce a sample data set? – Travis

You can use [slicing of columns](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#indexing-selection) to sum parts of them. You should take the time to watch this [Pandas From The Ground Up](http://pandas.pydata.org/talks.html#pycon-us-2015) talk. – wwii

Answers

I think you need DataFrame.loc to select the columns by label:

a = df.loc[:, 'spend_1':'spend_26'].sum(axis=1) 

b = df.loc[:, 'spend_27':'spend_52'].sum(axis=1) 

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,6))).add_prefix('spend_') 
print (df) 
   spend_0  spend_1  spend_2  spend_3  spend_4  spend_5
0        8        8        3        7        7        0
1        4        2        5        2        2        2
2        1        0        8        4        0        9
3        6        2        4        1        5        3
4        4        4        3        7        1        1

print (df.loc[:, 'spend_0':'spend_2']) 
   spend_0  spend_1  spend_2
0        8        8        3
1        4        2        5
2        1        0        8
3        6        2        4
4        4        4        3

a = df.loc[:, 'spend_0':'spend_2'].sum(axis=1) 
print (a) 
0    19
1    11
2     9
3    12
4    11
dtype: int64

print (df.loc[:, 'spend_3':'spend_5']) 
   spend_3  spend_4  spend_5
0        7        7        0
1        2        2        2
2        4        0        9
3        1        5        3
4        7        1        1

b = df.loc[:, 'spend_3':'spend_5'].sum(axis=1) 
print (b) 
0    14
1     6
2    13
3     9
4     9
dtype: int64
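
One thing worth keeping in mind with the label slices above: `.loc` slices include both endpoints, and they follow the order of the columns in the frame, not the numeric order of the week suffix. Below is a minimal sketch, assuming the columns are named spend_1 through spend_52 as in the question (the data itself is made up for illustration), that builds the two lists of names explicitly so the result does not depend on how the columns happen to be ordered:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 52 weekly columns named spend_1..spend_52.
rng = np.random.default_rng(0)
weeks = [f"spend_{i}" for i in range(1, 53)]
df = pd.DataFrame(rng.integers(0, 10, size=(4, 52)), columns=weeks)

# Build each half by name, so the column order in df does not matter.
first_half = [f"spend_{i}" for i in range(1, 27)]    # weeks 1-26
second_half = [f"spend_{i}" for i in range(27, 53)]  # weeks 27-52

pre_spend = df[first_half].sum(axis=1)
post_spend = df[second_half].sum(axis=1)

# When the columns are in week order, the label slice selects the same
# columns, because .loc slices include both endpoints.
same = pre_spend.equals(df.loc[:, "spend_1":"spend_26"].sum(axis=1))
```

If the frame's columns are already in week order, the shorter `.loc` slice is fine; the explicit lists are just a safer fallback when you are not sure.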

Thanks, Jezrael. That works better than what I came up with here:

column_names = [x for x in df.columns.values.tolist() 
       if x.startswith("spend_") 
       ] 

pre = df.loc[:,column_names[:26]] 
pre = pre.sum(axis=1) 
post = df.loc[:,column_names[26:]] 
post = post.sum(axis=1)
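
A possible caveat with this version: `column_names` keeps whatever order the columns have in `df`, so `[:26]` only means "weeks 1 to 26" if the frame is already in week order. Here is a rough sketch, assuming every spend column is named `spend_<integer>` (the frame and the `customer_id` column below are hypothetical), that sorts the names by week number before splitting:

```python
import pandas as pd

# Hypothetical frame whose spend columns are in lexicographic order
# (spend_1, spend_10, spend_11, ...), plus one unrelated column.
# Each cell holds its own week number so the sums are easy to check.
names = sorted(f"spend_{i}" for i in range(1, 53))
df = pd.DataFrame([[int(n.split("_")[1]) for n in names]], columns=names)
df["customer_id"] = 1

# Pick the spend columns, then sort them by the integer after the prefix.
column_names = sorted(
    (c for c in df.columns if c.startswith("spend_")),
    key=lambda c: int(c.split("_")[1]),
)

pre = df[column_names[:26]].sum(axis=1)   # weeks 1-26
post = df[column_names[26:]].sum(axis=1)  # weeks 27-52
```

With the week numbers as values, `pre` comes out as 351 (1 + ... + 26) and `post` as 1027 (27 + ... + 52), which would not be the case if the lexicographic order were split at position 26.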