2016-03-04 115 views
2

Sum column values in pandas and append or merge the total to the DataFrame?

I've got this function:

import pandas as pd

def source_revenue(self):
    items = self.data.items()
    df = pd.DataFrame(
        {'SOURCE OF BUSINESS': [i[0] for i in items], 'INCOME': [i[1] for i in items]})
    pivoting = pd.pivot_table(df, index=['SOURCE OF BUSINESS'], values=['INCOME'])
    # Sum each column of the pivot table; this returns a Series, not a DataFrame
    summing = pivoting.sum()
    return summing

This function gives me this:

INCOME 216424.9 
dtype: float64 

Without the sum, it returns the full DataFrame like this:

                                 INCOME
SOURCE OF BUSINESS
BYD - Other                       500.0
BYD - Retail                     1584.0
BYD - Transport                 42498.0
BYD Beverage - A La Carte       39401.5
BYD Food - A La Carte 瓦廠食品-零點   68365.0
BYD Food - Catering Banquet     53796.0
BYD Rooms 瓦廠房間                   5148.0
GS - Retail                       386.0
GS Food - A La Carte               48.0
Orchard Retail                    130.0
SCH - Food - A La Carte            96.0
SCH - Retail                      375.4
SCH - Transport                   888.0
SCH Beverage - A La Carte         119.0
Spa                              3052.0
XLM Beverage - A La Carte          38.0

The reason I'm doing this is that I'm trying to take the sum of all the returned rows and append that total to the DataFrame.

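At first I tried margins=True (from what I had read, it sums the values and appends the total to the DataFrame, but it didn't really do that for me).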

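So I'm wondering whether there is a way to return the DataFrame itself, sum the values, and append the total to the end of the DataFrame, just like margins=True is supposed to do.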

Answers

1

I think you can use groupby instead of pivot_table, because groupby is faster here.

You can use pivot_table, but its default aggfunc is np.mean, which is easy to forget:

import numpy as np
import pandas as pd

pivoting = pd.pivot_table(df,
                          index=['SOURCE OF BUSINESS'],
                          values=['INCOME'],
                          aggfunc=np.mean)  # same as leaving aggfunc out
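To make the difference concrete, here is a minimal sketch on a hypothetical toy frame (the SOURCE and INCOME values are made up, not the question's data) in which one key appears twice. It passes aggfunc as the string "sum", which pandas accepts in place of np.sum and which gives the same totals for plain numeric data like this.

```python
import pandas as pd

# Hypothetical toy data: "Retail" appears twice, so the aggregation choice matters
df = pd.DataFrame({"SOURCE": ["Retail", "Retail", "Spa"],
                   "INCOME": [100.0, 300.0, 50.0]})

# No aggfunc: pivot_table falls back to its default, the mean
by_mean = pd.pivot_table(df, index=["SOURCE"], values=["INCOME"])

# aggfunc="sum": rows with the same key are added together instead
by_sum = pd.pivot_table(df, index=["SOURCE"], values=["INCOME"], aggfunc="sum")

print(by_mean.loc["Retail", "INCOME"])  # 200.0
print(by_sum.loc["Retail", "INCOME"])   # 400.0
```

When every key occurs only once, as with the keys of a dict like the question's self.data, the mean and the sum of each group are the same number, which is why the default is easy to overlook.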

You probably want aggfunc=np.sum instead:

print(df)
     A    B      C  D
0  zoo  one  small  1
1  zoo  one  large  2
2  zoo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

print(pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum))
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64

df1 = df.groupby('A')['D'].sum()
print(df1)
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64
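The equivalence of the two calls can be checked directly. This sketch rebuilds the sample frame printed above by hand and assumes a recent pandas, where pivot_table with a single values column returns a one-column DataFrame (hence the ["D"] selection); it again uses the "sum" string in place of np.sum.

```python
import pandas as pd

# The sample frame printed above, rebuilt by hand
df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# pivot_table result: a DataFrame with column "D", so pick that column out
pivoted = pd.pivot_table(df, values="D", index=["A"], aggfunc="sum")["D"]

# groupby result: already a Series indexed by "A"
grouped = df.groupby("A")["D"].sum()

# Both should map each group to the same total (bar 22, foo 6, zoo 5)
print(pivoted.to_dict() == grouped.to_dict())
```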

If you need to add a Total to the Series, use loc and sum:

print(df1.sum())
33

df1.loc['Total'] = df1.sum()
print(df1)
A
bar      22
foo       6
zoo       5
Total    33
Name: D, dtype: int64
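The same setting-with-enlargement step works on its own for any Series. This minimal sketch starts from the per-group totals shown above, typed in by hand. One caveat: df1.sum() includes every existing row, so if the assignment is run a second time, the old Total is counted as well.

```python
import pandas as pd

# Per-group totals, typed in from the groupby output above
df1 = pd.Series({"bar": 22, "foo": 6, "zoo": 5}, name="D")
df1.index.name = "A"

# Assigning to a label that does not exist yet appends a new row
df1.loc["Total"] = df1.sum()
print(df1)
```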

Timings

In [111]: %timeit df.groupby('A')['D'].sum() 
1000 loops, best of 3: 581 µs per loop 

In [112]: %timeit pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum) 
100 loops, best of 3: 2.28 ms per loop 
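Outside IPython, the same comparison can be made with the standard-library timeit module. This is a sketch only: the frame keeps just the A and D columns of the sample (the only ones either call uses), and absolute numbers will depend on the machine and the pandas version, so none are claimed here.

```python
import timeit

import pandas as pd

# Columns A and D of the sample frame above
df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Average seconds per call over n runs of each approach
n = 200
t_groupby = timeit.timeit(lambda: df.groupby("A")["D"].sum(), number=n) / n
t_pivot = timeit.timeit(
    lambda: pd.pivot_table(df, values="D", index=["A"], aggfunc="sum"),
    number=n) / n

print(f"groupby:     {t_groupby * 1e6:.0f} µs per call")
print(f"pivot_table: {t_pivot * 1e6:.0f} µs per call")
```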

To add a Total row to df, use setting with enlargement:

print(df)
                                 INCOME
SOURCE OF BUSINESS
BYD - Other                       500.0
BYD - Retail                     1584.0
BYD - Transport                 42498.0
BYD Beverage - A La Carte       39401.5
BYD Food - A La Carte           68365.0
BYD Food - Catering Banquet     53796.0
BYD Rooms                        5148.0
GS - Retail                       386.0
GS Food - A La Carte               48.0
Orchard Retail                    130.0
SCH - Food - A La Carte            96.0
SCH - Retail                      375.4
SCH - Transport                   888.0
SCH Beverage - A La Carte         119.0
Spa                              3052.0
XLM Beverage - A La Carte          38.0
df.loc['Total', 'INCOME'] = df['INCOME'].sum()
print(df)
                                 INCOME
SOURCE OF BUSINESS
BYD - Other                       500.0
BYD - Retail                     1584.0
BYD - Transport                 42498.0
BYD Beverage - A La Carte       39401.5
BYD Food - A La Carte           68365.0
BYD Food - Catering Banquet     53796.0
BYD Rooms                        5148.0
GS - Retail                       386.0
GS Food - A La Carte               48.0
Orchard Retail                    130.0
SCH - Food - A La Carte            96.0
SCH - Retail                      375.4
SCH - Transport                   888.0
SCH Beverage - A La Carte         119.0
Spa                              3052.0
XLM Beverage - A La Carte          38.0
Total                          216424.9
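The steps in this answer can be folded back into the question's function. The sketch below is a hypothetical standalone version: source_revenue_with_total and its data argument (a plain dict standing in for self.data) are illustrative names, not from the original code, and it uses groupby as suggested above rather than pivot_table.

```python
import pandas as pd


def source_revenue_with_total(data):
    """Hypothetical standalone variant of the question's source_revenue.

    `data` is assumed to be a dict mapping source of business -> income,
    standing in for self.data.
    """
    df = pd.DataFrame({"SOURCE OF BUSINESS": list(data.keys()),
                       "INCOME": list(data.values())})
    # Total income per source; double brackets keep the result a DataFrame
    totals = df.groupby("SOURCE OF BUSINESS")[["INCOME"]].sum()
    # Setting with enlargement: a new index label appends a "Total" row
    totals.loc["Total", "INCOME"] = totals["INCOME"].sum()
    return totals


# A few of the question's rows as sample input
result = source_revenue_with_total(
    {"Spa": 3052.0, "GS - Retail": 386.0, "Orchard Retail": 130.0})
print(result)
```

Because groupby sorts the keys by default, the Total row always ends up last, after the alphabetically ordered sources.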

Thanks for the thorough answer and the performance tests. I get NameError: name 'np' is not defined when trying to run np.sum... maybe a missing import? – xavier


OK, I had to import numpy, and the actual attribute is numpy.sum – xavier


Yes, it works great. Thanks again! – xavier

1

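With the default integer index, df.loc[len(df)] = ... adds a row to the end of your DataFrame; the new values must match the number of columns. That said, I wouldn't recommend adding this to your data, because any subsequent analysis would then treat the total as just another row. It's probably better to create a new Series and concat it only when you need it for display.

df.loc[len(df)] = ['Total', df['INCOME'].sum()]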

>>> df 
   SOURCE OF BUSINESS               INCOME
0  BYD - Other                         500
1  BYD - Retail                       1584
2  BYD - Transport                   42498
3  BYD Beverage - A La Carte       39401.5
4  BYD Food - A La Carte 瓦廠食品-零點     68365
5  BYD Food - Catering Banquet       53796
6  BYD Rooms 瓦廠房間                     5148
7  GS - Retail                         386
8  GS Food - A La Carte                 48
9  Orchard Retail                      130
10 SCH - Food - A La Carte              96
11 SCH - Retail                      375.4
12 SCH - Transport                     888
13 SCH Beverage - A La Carte           119
14 Spa                                3052
15 XLM Beverage - A La Carte            38
16 Total                            216425
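The concat alternative suggested in this answer could look roughly like the sketch below. It is a close variant of that suggestion, using a one-row DataFrame rather than a bare Series so the columns line up, and the three rows are a sample of the question's data. The point of the design is that df itself never contains the Total row, so later calculations on it stay correct.

```python
import pandas as pd

# A sample of the question's rows, with SOURCE OF BUSINESS as a plain column
df = pd.DataFrame({"SOURCE OF BUSINESS": ["Spa", "GS - Retail", "Orchard Retail"],
                   "INCOME": [3052.0, 386.0, 130.0]})

# Build the total as its own one-row frame instead of writing it into df
total_row = pd.DataFrame({"SOURCE OF BUSINESS": ["Total"],
                          "INCOME": [df["INCOME"].sum()]})

# Concatenate only for display; df itself keeps just the real rows
display_df = pd.concat([df, total_row], ignore_index=True)
print(display_df)
```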