2017-04-03
2

pandas: how to extract datetime ranges from a DatetimeIndex

I have a collection of DatetimeIndex objects, such as:
DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00', 
       '2008-02-01 00:00:00', '2008-03-01 00:00:00', 
       '2008-04-01 00:00:00', '2012-09-01 00:10:00', 
       '2012-09-01 00:20:00', '2012-09-01 00:30:00', 
       '2012-09-01 00:40:00', '2012-09-01 00:50:00', 
       ... 
       '2012-09-30 22:40:00', '2012-09-30 22:50:00', 
       '2012-09-30 23:00:00', '2012-09-30 23:10:00', 
       '2012-09-30 23:20:00', '2012-09-30 23:30:00', 
       '2012-09-30 23:40:00', '2012-09-30 23:50:00', 
       '2012-10-01 00:00:00', '2015-07-01 00:00:00'], 
       dtype='datetime64[ns]', length=4326, freq=None, tz=None) 

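Both its freq and its inferred_freq are None. I guess this is because, even though the data actually has a 10-minute frequency, that frequency cannot be detected because parts are missing. It is exactly these missing parts, or equivalently the available parts, that I would like to extract as efficiently as possible. That is, I would like to obtain a list of ranges such as: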

[('2007-11-01 00:00:00', '2007-11-01 00:00:00'), 
('2008-01-01 00:00:00', '2008-01-01 00:00:00'), 
('2008-02-01 00:00:00', '2008-02-01 00:00:00'), 
('2008-03-01 00:00:00', '2008-03-01 00:00:00'), 
('2008-04-01 00:00:00', '2008-04-01 00:00:00'), 
('2012-09-01 00:10:00', '2012-10-01 00:00:00'), 
('2015-07-01 00:00:00', '2015-07-01 00:00:00')] 

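How should I go about this? I have looked at PeriodIndex, but that seems to be aimed at a different kind of application, or perhaps it simply does not handle arbitrary time intervals.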
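The guess about the missing frequency can be checked on a small case. The sketch below uses made-up data (the names `regular` and `gapped` are hypothetical, not from my real index): a short, evenly spaced index still has an inferable frequency, while the same index with one stray timestamp appended does not.

```python
import pandas as pd

# Hypothetical data: four timestamps spaced exactly 10 minutes apart.
regular = pd.DatetimeIndex(['2012-09-01 00:10', '2012-09-01 00:20',
                            '2012-09-01 00:30', '2012-09-01 00:40'])

# The same index with one far-away timestamp appended, which breaks the regular spacing.
gapped = regular.append(pd.DatetimeIndex(['2015-07-01 00:00']))

print(regular.freq)           # freq is not set when building from a list
print(regular.inferred_freq)  # a 10-minute frequency string (spelling depends on the pandas version)
print(gapped.freq)
print(gapped.inferred_freq)   # nothing can be inferred once the spacing is irregular
```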

Answers

1

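I think you can use groupby with a grouper Series and aggregate with min and max.

The grouper is created by taking the difference between consecutive timestamps, comparing it with 10 minutes, and applying cumsum to the result.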
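To see what each step contributes, here is a minimal sketch on a small hypothetical Series (the data and the names `gaps`, `starts` and `run_id` are made up for illustration). It assumes the timestamps are already sorted.

```python
import pandas as pd

# Hypothetical toy data: two 10-minute runs separated by a gap.
s = pd.Series(pd.to_datetime([
    '2012-09-01 00:10', '2012-09-01 00:20', '2012-09-01 00:30',
    '2012-09-01 02:00', '2012-09-01 02:10',
]))

step = pd.Timedelta('10min')

gaps = s.diff()           # NaT for the first row, then the time since the previous row
starts = gaps.ne(step)    # True wherever a new run begins (NaT != step is also True)
run_id = starts.cumsum()  # counting the run starts gives every row a run label

print(pd.DataFrame({'timestamp': s, 'gap': gaps, 'start': starts, 'run': run_id}))
```

Rows that share a `run` value belong to the same uninterrupted 10-minute stretch, so that column can serve as the grouping key.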

import pandas as pd

rng = pd.DatetimeIndex(['2007-11-01 00:00:00', '2008-01-01 00:00:00', 
       '2008-02-01 00:00:00', '2008-03-01 00:00:00', 
       '2008-04-01 00:00:00', '2012-09-01 00:10:00', 
       '2012-09-01 00:20:00', '2012-09-01 00:30:00', 
       '2012-09-01 00:40:00', '2012-09-01 00:50:00', 
       '2012-09-30 22:40:00', '2012-09-30 22:50:00', 
       '2012-09-30 23:00:00', '2012-09-30 23:10:00', 
       '2012-09-30 23:20:00', '2012-09-30 23:30:00', 
       '2012-09-30 23:40:00', '2012-09-30 23:50:00', 
       '2012-10-01 00:00:00', '2015-07-01 00:00:00']) 

s = pd.Series(rng) 
# A new group starts wherever the gap to the previous timestamp is not exactly 10 minutes.
grouper = s.diff().ne(pd.to_timedelta('10min')).cumsum() 
print(grouper) 
0  1 
1  2 
2  3 
3  4 
4  5 
5  6 
6  6 
7  6 
8  6 
9  6 
10 7 
11 7 
12 7 
13 7 
14 7 
15 7 
16 7 
17 7 
18 7 
19 8 
dtype: int32 
print(s.groupby(grouper).agg(['min', 'max']).astype(str).apply(tuple, axis=1).tolist()) 
[('2007-11-01 00:00:00', '2007-11-01 00:00:00'), 
('2008-01-01 00:00:00', '2008-01-01 00:00:00'), 
('2008-02-01 00:00:00', '2008-02-01 00:00:00'), 
('2008-03-01 00:00:00', '2008-03-01 00:00:00'), 
('2008-04-01 00:00:00', '2008-04-01 00:00:00'), 
('2012-09-01 00:10:00', '2012-09-01 00:50:00'), 
('2012-09-30 22:40:00', '2012-10-01 00:00:00'), 
('2015-07-01 00:00:00', '2015-07-01 00:00:00')] 
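If this is needed for several indexes, the same idea could be wrapped in a small helper. This is only a sketch: `extract_ranges` and its `step` parameter are hypothetical names, it assumes the index is sorted, and it returns Timestamp pairs rather than strings.

```python
import pandas as pd

def extract_ranges(index, step='10min'):
    """Return (start, end) Timestamp pairs, one per run of `index` spaced exactly `step` apart.

    Assumes `index` is sorted in increasing order.
    """
    s = pd.Series(index)
    # Label each run: the label increases wherever the gap differs from `step`.
    run_id = s.diff().ne(pd.Timedelta(step)).cumsum()
    bounds = s.groupby(run_id).agg(['min', 'max'])
    return list(zip(bounds['min'], bounds['max']))

# Hypothetical example data: one isolated point, one 10-minute run, one isolated point.
idx = pd.DatetimeIndex(['2008-04-01 00:00', '2012-09-01 00:10',
                        '2012-09-01 00:20', '2012-09-01 00:30',
                        '2015-07-01 00:00'])
ranges = extract_ranges(idx)
for start, end in ranges:
    print(start, end)
```

Keeping the results as Timestamp objects leaves formatting to the caller, for example with `strftime`.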
+0

I added a new answer, please check it. – jezrael

+0

This works very well. I left out the 'astype(str)' because it converted the values to my local time zone; getting 'Timestamp' objects back is fine. – equaeghe

+0

Super, thank you. – jezrael