2015-02-23 132 views
1

I want to create a new column that shows the timedelta in days between two dates, in a pandas DataFrame like the one below:

>>> hg[['not inc','date']] 
    not inc    date 
0 False 2012-02-29 00:00:00 
1 False 2012-03-16 00:00:00 
2 False 2012-04-04 00:00:00 
3  True 2012-05-08 00:00:00 
4 False 2012-05-12 00:00:00 
5 False 2012-05-26 00:00:00 
6 False 2012-06-09 00:00:00 
7 False 2012-10-13 00:00:00 
8 False 2012-11-10 00:00:00 
9  True 2013-03-19 00:00:00 
10 False 2013-04-01 00:00:00 
11 False 2013-04-25 00:00:00 
12 False 2013-05-04 00:00:00 
13 False 2013-05-18 00:00:00 
14 False 2013-06-01 00:00:00 
15 True 2013-08-20 00:00:00 
16 False 2013-08-31 00:00:00 
17 False 2013-09-21 00:00:00 
18 False 2013-10-05 00:00:00 
19 False 2013-10-19 00:00:00 
20 False 2013-11-09 00:00:00 
21 True 2014-01-21 00:00:00 
22 False 2014-02-08 00:00:00 
23 False 2014-02-22 00:00:00 
24 False 2014-03-08 00:00:00 
25 False 2014-03-29 00:00:00 
26 False 2014-04-19 00:00:00 
27 True 2014-07-21 00:00:00 
28 True 2014-08-01 00:00:00 
29 False 2014-08-09 00:00:00 
30 False 2014-08-30 00:00:00 
31 False 2014-09-13 00:00:00 
32 True 2014-09-26 00:00:00 
33 False 2014-10-04 00:00:00 
34 True 2015-01-08 00:00:00 
35 True 2015-01-20 00:00:00 
36 False 2015-01-31 00:00:00 
37 False 2015-02-14 00:00:00 

I want the date differences to start by subtracting 2012-01-02, and I want each one to be an integer.

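This is what I tried. It does not work, because prevdate is never updated to the new row's date and always refers to the original starting value of datetime(2012,01,02). I am going through the rows of the DataFrame with iterrows.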

>>>for index, row in hg.iterrows(): 
    prevdate = datetime(2012,01,02) 
    dsince = row['date']-prevdate 
    prevdate = row['date'] 
    print dsince 

The result (also, I don't know how to convert the values to int):

58 days, 0:00:00 
74 days, 0:00:00 
93 days, 0:00:00 
127 days, 0:00:00 
131 days, 0:00:00 
145 days, 0:00:00 
159 days, 0:00:00 
285 days, 0:00:00 
313 days, 0:00:00 
442 days, 0:00:00 
455 days, 0:00:00 
479 days, 0:00:00 
488 days, 0:00:00 
502 days, 0:00:00 
516 days, 0:00:00 
596 days, 0:00:00 
607 days, 0:00:00 
628 days, 0:00:00 
642 days, 0:00:00 
656 days, 0:00:00 
677 days, 0:00:00 
750 days, 0:00:00 
768 days, 0:00:00 
782 days, 0:00:00 
796 days, 0:00:00 
817 days, 0:00:00 
838 days, 0:00:00 
931 days, 0:00:00 
942 days, 0:00:00 
950 days, 0:00:00 
971 days, 0:00:00 
985 days, 0:00:00 
998 days, 0:00:00 
1006 days, 0:00:00 
1102 days, 0:00:00 
1114 days, 0:00:00 
1125 days, 0:00:00 
1139 days, 0:00:00 

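To make things a bit more complicated, I also want to create another column that holds the date differences only between rows that have False in the 'not inc' column.

Thanks.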

+0

Have you tried "dsince = (row['date'] - prevdate).days"? – Uri 2015-02-23 14:20:48

+0

That helped a bit, thanks – user3374113 2015-02-23 14:44:58

Answers

1

Assuming your date column has already been cast to datetime64:

In [61]: hg = pd.DataFrame({"not inc":[False , False, False, True, False],"date":pd.to_datetime(pd.Series(["2012-02-29", "2012-03-16", "2012-04-04", "2012-05-08", "2012-05-12"]))}) 

In [63]: hg.dtypes 
Out[63]: 
date  datetime64[ns] 
not inc    bool 
dtype: object 

Temporarily filter out the rows you don't want to include:

In [64]: included = hg[hg["not inc"] == False] 

Use shift to get a Series of the dates you want to subtract, filling the first entry with your start date:

In [66]: prev_dates = included.date.shift().fillna(pd.Timestamp("2012-01-02")) 

In [67]: prev_dates 
Out[67]: 
0 2012-01-02 
1 2012-02-29 
2 2012-03-16 
4 2012-04-04 
Name: date, dtype: datetime64[ns] 

Subtract the dates and recast the timedelta as an int:

In [68]: delta = included.date - prev_dates 

In [69]: delta = delta.astype("timedelta64[D]") 

In [70]: delta 
Out[70]: 
0 58 
1 16 
2 19 
4 38 
Name: date, dtype: float64 
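The astype("timedelta64[D]") step depends on the pandas version and can raise a TypeError on some of them. A minimal sketch of an alternative, assuming the differences are held in a timedelta64 Series: the .dt.days accessor returns whole days as integers. The variable names below are illustrative only and the dates are the included rows from the example above.

```python
import pandas as pd

# Included dates from the example above (the "not inc" == True row is left out).
dates = pd.Series(pd.to_datetime(["2012-02-29", "2012-03-16", "2012-04-04", "2012-05-12"]))

# Previous date for each row; the first row falls back to the start date.
prev_dates = dates.shift().fillna(pd.Timestamp("2012-01-02"))

delta = dates - prev_dates   # timedelta64 Series
days = delta.dt.days         # whole days as integers

# Dividing by a one-day Timedelta is another option; it yields floats.
days_float = delta / pd.Timedelta(days=1)

print(days.tolist())
```

Because .dt.days produces integers directly, no further cast is needed as long as the Series has no missing values.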

Then concat the new Series onto the original DataFrame.

In [71]: delta.name = "delta" 

In [72]: hg = pd.concat((hg, delta), axis=1) 

In [73]: hg 
Out[73]: 
     date not inc delta 
0 2012-02-29 False  58 
1 2012-03-16 False  16 
2 2012-04-04 False  19 
3 2012-05-08 True NaN 
4 2012-05-12 False  38 
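The question asks for two columns: the gap in days for every row, and the gap counted only between rows where 'not inc' is False. The following is a sketch of how both could be built with the same shift-and-subtract idea. The helper days_since_previous and the column names days_all and days_included are hypothetical names chosen for illustration, not part of the original answer.

```python
import pandas as pd

hg = pd.DataFrame({
    "not inc": [False, False, False, True, False],
    "date": pd.to_datetime(
        ["2012-02-29", "2012-03-16", "2012-04-04", "2012-05-08", "2012-05-12"]
    ),
})
start = pd.Timestamp("2012-01-02")


def days_since_previous(dates, start):
    """Days between each date and the one before it (hypothetical helper).

    The first date is measured against `start`.
    """
    prev = dates.shift().fillna(start)
    return (dates - prev).dt.days


# Gap for every row, regardless of the flag.
hg["days_all"] = days_since_previous(hg["date"], start)

# Gap between consecutive rows where "not inc" is False.
# Assignment aligns on the index, so the excluded row is left as NaN.
included = hg.loc[~hg["not inc"], "date"]
hg["days_included"] = days_since_previous(included, start)

print(hg)
```

The days_included column comes out as float because the excluded row holds NaN; that matches the delta column in the output above.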
+0

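Thanks for the answer, it works a treat, and I learned a lot from what you gave me. My only query is that 'delta.astype("timedelta64[D]")' raises the error 'TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[D]]'. Do you think there is another way of doing this? – user3374113 2015-02-23 15:12:05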

+0

You could try some of the ideas here: http://stackoverflow.com/questions/18215317/extracting-days-from-a-numpy-timedelta64-value – 2015-02-23 15:25:03

0

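Put the line prevdate = datetime(2012, 1, 2) before the loop, so that it is set once and not reset on every row:

from datetime import datetime

prevdate = datetime(2012, 1, 2)  # start date, assigned once before the loop
for index, row in hg.iterrows():
    dsince = (row['date'] - prevdate).days  # whole days as an int
    prevdate = row['date']  # carry this row's date into the next iteration
    print(dsince)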

If that doesn't work, convert prevdate and row['date'] to dates.
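One possible reading of that last suggestion, sketched under the assumption that the date column arrived as strings: convert it with pd.to_datetime before looping, then collect the results in a list so they can be stored as a column. The list name dsince_values and the column name dsince are made up for this example.

```python
from datetime import datetime

import pandas as pd

# Dates deliberately stored as strings to show the conversion step.
hg = pd.DataFrame({"date": ["2012-02-29", "2012-03-16", "2012-04-04"]})
hg["date"] = pd.to_datetime(hg["date"])  # now a datetime64 column

prevdate = datetime(2012, 1, 2)  # start date, set once before the loop
dsince_values = []
for index, row in hg.iterrows():
    dsince_values.append((row["date"] - prevdate).days)  # int number of days
    prevdate = row["date"]  # update for the next row

hg["dsince"] = dsince_values
print(hg)
```

For larger frames a vectorized approach with shift is usually faster than iterrows, but the loop keeps the logic close to the original attempt.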