groupby datediff in pandas

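I am trying to get the difference, in months, between the earliest and latest dates on which each product was sold, and put it in a new column. However, when I apply the function inside a groupby, I get an unexpected return value.

Any help is much appreciated.

My steps are:

Data: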

       pch_date      day product  qty  unit_price  total_price  year_month
421  2013-01-07  tuesday      p3   13        4.58        59.54           1
141  2015-09-13   monday      p8    3        3.77        11.31           9
249  2015-02-02   monday      p5    3        1.80         5.40           2
826  2015-10-09  tuesday      p5    6        1.80        10.80          10
427  2014-04-18   friday      p7    6        4.21        25.26           4

Function definition:

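def diff_date(x):
    # x is a Series of purchase dates (datetime64)
    max_date = x.max()
    min_date = x.min()
    # Year gap converted to months, plus the month number of the latest date, plus one
    diff_month = (max_date.year - min_date.year) * 12 + max_date.month + 1
    return diff_month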

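When I test it on the whole column:

print(diff_date(prod_df['pch_date']))

I get 49, which is correct.

But here is the problem:

print(prod_df[['product','pch_date']].groupby(['product']).agg({'pch_date': diff_date}).reset_index()[:5])

The result comes back as a date instead of a number: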

  product                      pch_date
0      p1 1970-01-01 00:00:00.000000049
1     p10 1970-01-01 00:00:00.000000048
2     p11 1970-01-01 00:00:00.000000045
3     p12 1970-01-01 00:00:00.000000049
4     p13 1970-01-01 00:00:00.000000045

How can I get the difference as an integer?

Source

2016-09-17 Debojyoti Dey

You can use Groupby.apply instead, which returns integers rather than datetime objects.

df.groupby(['product'])['pch_date'].apply(diff_date).reset_index()
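To make the line above easy to try, here is a minimal sketch that rebuilds only the five sample rows shown in the question (the frame named df here is an assumption standing in for the real prod_df) and applies the question's diff_date unchanged. The numbers it produces come from those five rows alone, so they will not match the 49 reported for the full data.

```python
import pandas as pd

# Five sample rows copied from the question; the real data set is larger.
df = pd.DataFrame({
    'product': ['p3', 'p8', 'p5', 'p5', 'p7'],
    'pch_date': pd.to_datetime(['2013-01-07', '2015-09-13', '2015-02-02',
                                '2015-10-09', '2014-04-18']),
})

def diff_date(x):
    # Same formula as in the question, left unchanged.
    max_date = x.max()
    min_date = x.min()
    return (max_date.year - min_date.year) * 12 + max_date.month + 1

# apply() keeps whatever the function returns, so the column holds integers.
result = df.groupby(['product'])['pch_date'].apply(diff_date).reset_index()
print(result)
```

If a different column name is wanted for the result, reset_index(name='diff_month') should do it; diff_month is just an illustrative name.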

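As a workaround to stop the integer values from being converted into DatetimeIndex values, you can change the last line of your function to return str(diff_month), and then keep using Groupby.agg as shown: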

df.groupby(['product']).agg({'pch_date': diff_date}).reset_index()
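The 1970-01-01 values in the question's output are consistent with the returned integers being read back as nanoseconds since the Unix epoch. A small sketch of that interpretation, using only pd.Timestamp and not the groupby itself:

```python
import pandas as pd

# An integer passed to Timestamp is interpreted as nanoseconds since 1970-01-01.
ts = pd.Timestamp(49)
print(ts)

# The original integer is still available as the nanosecond count.
months = ts.value
print(months)
```

This suggests the month count was never lost in the question's output, only displayed as a timestamp.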

Source

2016-09-17 11:49:01

Thanks, Nikhil. You made my day.
