2016-09-21 171 views

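Pandas: how to split a DataFrame by time intervals of a column

I have a huge DataFrame with a datetime column named dt, and the DataFrame is already sorted by dt. I want to split it into several DataFrames based on dt, each containing the rows that fall within a 1-hour range.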

For example, split

dt     text 
0 20160811 11:05  a 
1 20160811 11:35  b 
2 20160811 12:03  c 
3 20160811 12:36  d 
4 20160811 12:52  e 
5 20160811 14:32  f 

into:

dt     text 
0 20160811 11:05  a 
1 20160811 11:35  b 
2 20160811 12:03  c 

    dt     text 
0 20160811 12:36  d 
1 20160811 12:52  e 

    dt     text 
0 20160811 14:32  f 

Please phrase this as a question, not as "I want". – charlesreid1

Answers


You can groupby the difference between dt and its first value, converted to whole hours:

import pandas as pd
S = pd.to_datetime(df.dt)
# whole hours elapsed since the first row: 0, 0, 0, 1, 1, 3
for i, g in df.groupby((S - S.iloc[0]) // pd.Timedelta(hours=1)):
    print(g.reset_index(drop=True))

       dt text 
0 20160811 11:05 a 
1 20160811 11:35 b 
2 20160811 12:03 c 
       dt text 
0 20160811 12:36 d 
1 20160811 12:52 e 
       dt text 
0 20160811 14:32 f 
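A self-contained sketch of the same approach, assuming the dt strings are in a format pd.to_datetime can parse (the sample below writes them in ISO form). It floor-divides the elapsed time by pd.Timedelta(hours=1); the older spelling astype('timedelta64[h]') is rejected by pandas 2.x, which only supports second-based timedelta resolutions. The helper name split_by_elapsed_hours is hypothetical.

```python
import pandas as pd


def split_by_elapsed_hours(df, col="dt", hours=1):
    """Hypothetical helper: split a frame sorted by `col` into sub-frames whose
    rows fall in consecutive `hours`-wide windows starting at the first timestamp."""
    ts = pd.to_datetime(df[col])
    # Number of whole windows elapsed since the first row (int64)
    bucket = (ts - ts.iloc[0]) // pd.Timedelta(hours=hours)
    # groupby sorts the bucket keys, so the parts come out in time order
    return [g.reset_index(drop=True) for _, g in df.groupby(bucket)]


df = pd.DataFrame({
    "dt": ["2016-08-11 11:05", "2016-08-11 11:35", "2016-08-11 12:03",
           "2016-08-11 12:36", "2016-08-11 12:52", "2016-08-11 14:32"],
    "text": list("abcdef"),
})

parts = split_by_elapsed_hours(df)
for part in parts:
    print(part)
```

Passing hours=2 widens each window to two hours, which puts rows a through e in one frame and f in another.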

List comprehension solution:

S = pd.to_datetime(df.dt)

print((S - S.iloc[0]) // pd.Timedelta(hours=1))
0    0
1    0
2    0
3    1
4    1
5    3
Name: dt, dtype: int64

L = [g.reset_index(drop=True) for i, g in df.groupby((S - S.iloc[0]) // pd.Timedelta(hours=1))]

print (L[0]) 
       dt text 
0 20160811 11:05 a 
1 20160811 11:35 b 
2 20160811 12:03 c 

print (L[1]) 
       dt text 
0 20160811 12:36 d 
1 20160811 12:52 e 

print (L[2]) 
       dt text 
0 20160811 14:32 f 
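List positions do not match the hour offsets here: L[2] holds the rows 3 hours after the start, because no rows fall in hour 2. If you want to look frames up by offset, one option is a dict comprehension keyed by the bucket number. A sketch, with the sample data and the names hours and D chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime(["2016-08-11 11:05", "2016-08-11 11:35", "2016-08-11 12:03",
                          "2016-08-11 12:36", "2016-08-11 12:52", "2016-08-11 14:32"]),
    "text": list("abcdef"),
})

# Whole hours elapsed since the first row
hours = (df["dt"] - df["dt"].iloc[0]) // pd.Timedelta(hours=1)

# Key each sub-frame by its hour offset (as a plain int) instead of its position
D = {int(k): g.reset_index(drop=True) for k, g in df.groupby(hours)}

print(sorted(D))  # [0, 1, 3]
print(D[3])
```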

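Old solution, which splits by the hour of the day (note that 12:03 then falls in the same group as 12:36 and 12:52):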

You can use groupby with dt.hour, but you first need to convert dt with to_datetime:

for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]): 
    print (g.reset_index(drop=True)) 

       dt text 
0 20160811 11:05 a 
1 20160811 11:35 b 
       dt text 
0 20160811 12:03 c 
1 20160811 12:36 d 
2 20160811 12:52 e 
       dt text 
0 20160811 14:32 f 

List comprehension solution:

L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])] 

print (L[0]) 
       dt text 
0 20160811 11:05 a 
1 20160811 11:35 b 

print (L[1]) 
       dt text 
0 20160811 12:03 c 
1 20160811 12:36 d 
2 20160811 12:52 e 

print (L[2]) 
       dt text 
0 20160811 14:32 f 

Or use a list comprehension after converting the column dt to datetime:

df['dt'] = pd.to_datetime(df['dt'])
L = [g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]

print (L[1]) 
        dt text 
0 2016-08-11 12:03:00 c 
1 2016-08-11 12:36:00 d 
2 2016-08-11 12:52:00 e 

print (L[2]) 
        dt text 
0 2016-08-11 14:32:00 f 

If you need to split by both date and hour:

#changed dataframe for testing 
print (df) 
       dt text 
0 20160811 11:05 a 
1 20160812 11:35 b 
2 20160813 12:03 c 
3 20160811 12:36 d 
4 20160811 12:52 e 
5 20160811 14:32 f 

serie = pd.to_datetime(df.dt) 
for i, g in df.groupby([serie.dt.date, serie.dt.hour]): 
    print (g.reset_index(drop=True)) 
       dt text 
0 20160811 11:05 a 
       dt text 
0 20160811 12:36 d 
1 20160811 12:52 e 
       dt text 
0 20160811 14:32 f 
       dt text 
0 20160812 11:35 b 
       dt text 
0 20160813 12:03 c  
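A single-key alternative for splitting by calendar date and hour is to truncate each timestamp to the start of its hour with Series.dt.floor, so rows from different days never share a group. A sketch on the same test data; pd.offsets.Hour() is passed as the frequency so the code does not depend on the 'H'/'h' alias spelling, which has changed between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({
    "dt": ["2016-08-11 11:05", "2016-08-12 11:35", "2016-08-13 12:03",
           "2016-08-11 12:36", "2016-08-11 12:52", "2016-08-11 14:32"],
    "text": list("abcdef"),
})
serie = pd.to_datetime(df["dt"])

# Truncate to the top of the hour, e.g. 2016-08-11 12:36 -> 2016-08-11 12:00
hour_start = serie.dt.floor(pd.offsets.Hour())

# One group per (date, hour), ordered by the truncated timestamp
L = [g.reset_index(drop=True) for _, g in df.groupby(hour_start)]
for g in L:
    print(g)
```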

Thanks! What if I want to group by 2 hours? – 9blue


I think you only need to add 2: .astype('timedelta64[2h]') – jezrael
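In pandas 2.x an astype to an hour-based unit such as 'timedelta64[2h]' raises, since only second-based timedelta resolutions are supported. A version-independent sketch is to floor-divide the elapsed time by a 2-hour pd.Timedelta (the sample data is the question's, written in ISO form):

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime(["2016-08-11 11:05", "2016-08-11 11:35", "2016-08-11 12:03",
                          "2016-08-11 12:36", "2016-08-11 12:52", "2016-08-11 14:32"]),
    "text": list("abcdef"),
})

window = pd.Timedelta(hours=2)  # any width works, e.g. pd.Timedelta(minutes=30)
# Number of whole 2-hour windows since the first row: 0, 0, 0, 0, 0, 1
bucket = (df["dt"] - df["dt"].iloc[0]) // window

L = [g.reset_index(drop=True) for _, g in df.groupby(bucket)]
for g in L:
    print(g)
```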


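Take the difference between each date and the first date, and group by its total_seconds divided by 3600, that is, by whole hours elapsed (this assumes dt is already of datetime dtype):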

df.groupby((df.dt - df.dt[0]).dt.total_seconds() // 3600, 
      as_index=False).apply(pd.DataFrame.reset_index, drop=True) 
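The key total_seconds() // 3600 is a float version of the whole-hours offset used in the first answer. A small sketch on assumed sample data confirming that both keys put the rows in the same buckets; note that the apply call above returns one concatenated DataFrame rather than a list of separate frames:

```python
import pandas as pd

dt = pd.to_datetime(pd.Series(["2016-08-11 11:05", "2016-08-11 11:35", "2016-08-11 12:03",
                               "2016-08-11 12:36", "2016-08-11 12:52", "2016-08-11 14:32"]))
elapsed = dt - dt.iloc[0]

by_seconds = elapsed.dt.total_seconds() // 3600   # float64: 0.0, 0.0, 0.0, 1.0, 1.0, 3.0
by_timedelta = elapsed // pd.Timedelta(hours=1)   # int64:   0, 0, 0, 1, 1, 3

print(by_seconds.tolist())
print(by_timedelta.tolist())
print((by_seconds == by_timedelta).all())
```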
