2017-08-14 72 views

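Splitting a DataFrame by its time column - pandas

I want to select the right part of a dataset, as illustrated by the following example.

Input df: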

id_B, ts_B,value 
id1,2017-04-27 01:35:30,0 
id1,2017-04-27 01:35:40,0 
id1,2017-04-27 01:35:50,1 
id1,2017-04-27 01:36:00,4 
id1,2017-04-27 01:36:10,5 
id1,2017-04-27 01:36:20,100 
id1,2017-04-27 01:36:30,155 
id1,2017-04-27 01:36:40,235 
id1,2017-04-27 01:36:50,0 
id1,2017-04-27 01:36:60,0 
id1,2017-04-27 01:37:00,2353 
id1,2017-04-27 01:37:10,221 
id1,2017-04-27 01:37:20,2432 
id1,2017-04-27 01:37:30,2654 
id1,2017-04-27 01:37:40,12 
id1,2017-04-27 01:37:50,5 
id1,2017-04-27 01:38:00,5 
id1,2017-04-27 01:38:10,23 
id1,2017-04-27 01:38:20,5 
id1,2017-04-27 01:38:30,2 
id1,2017-04-27 01:38:40,2 
id1,2017-04-27 01:38:50,1 
id1,2017-04-27 01:39:00,0 
id1,2017-04-27 01:39:10,0 
id1,2017-04-27 01:39:20,0 
id1,2017-04-27 01:39:30,0 
id1,2017-04-27 01:39:40,0 
id1,2017-04-27 01:39:50,0 
id1,2017-04-27 01:40:00,0 
id1,2017-04-27 01:40:10,1 
id1,2017-04-27 01:40:20,5 
id1,2017-04-27 01:40:30,221 
id1,2017-04-27 01:40:40,2432 
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12 
id1,2017-04-27 01:41:00,5 
id1,2017-04-27 01:41:10,5 
id1,2017-04-27 01:41:20,23 
id1,2017-04-27 01:41:30,5 
id1,2017-04-27 01:41:40,2 
id1,2017-04-27 01:41:50,1 

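Consider the following: segment_number = 1
duration = 3 minutes

I want to select the first segment of the dataframe, starting at the first row where df.value is non-zero and running up to the last row that falls within the 3-minute duration.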

Output:

id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
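
The selection rule for the first segment can be sketched on a few rows of the input. This is only an illustration under stated assumptions: the variable names (`sample`, `segment`) are hypothetical, the duration is shortened to 30 seconds so the excerpt stays small, and the window is assumed to include its end point, as the expected output does.

```python
import io

import pandas as pd

# First seven rows of the input, copied from the question
csv_text = """id_B,ts_B,value
id1,2017-04-27 01:35:30,0
id1,2017-04-27 01:35:40,0
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
"""
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts_B"])

# Assumption: 30 seconds instead of 3 minutes, to fit the short excerpt
duration = pd.Timedelta(seconds=30)

# The first row with a non-zero value marks the start of the segment
# (this assumes at least one such row exists)
start_time = sample.loc[sample["value"] != 0, "ts_B"].iloc[0]
end_time = start_time + duration

# Keep rows whose timestamp lies in [start_time, end_time], both ends included
segment = sample[(sample["ts_B"] >= start_time) & (sample["ts_B"] <= end_time)]
print(segment)
```

Here the segment starts at 01:35:50 and ends at 01:36:20, so the two leading zero rows and the 01:36:30 row are left out.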

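Consider the following: segment_number = 2
duration = 1.40 minutes

I want to select the part of the dataframe that starts at the first non-zero df.value of the second segment and runs up to the last row that falls within the 1.40-minute duration.

Output: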

id1,2017-04-27 01:40:10,1 
id1,2017-04-27 01:40:20,5 
id1,2017-04-27 01:40:30,221 
id1,2017-04-27 01:40:40,2432 
id1,2017-04-27 01:40:50,2654 
id1,2017-04-27 01:40:60,12 
id1,2017-04-27 01:41:00,5 
id1,2017-04-27 01:41:10,5 
id1,2017-04-27 01:41:20,23 
id1,2017-04-27 01:41:30,5 
id1,2017-04-27 01:41:40,2 
id1,2017-04-27 01:41:50,1 

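So far, I have indexed the df by ts_B using `pd.to_datetime` and `set_index`, and I use a variable `last_end_point` to keep track of the index where the previous segment ended.
But I am not getting the correct output.

Any help would be appreciated.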
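
One way the approach described above could be written is sketched below. It is not the asker's actual code, which is not shown: the helper `next_segment` is hypothetical, the rows are an excerpt of the input from 01:38:30 to 01:40:30, and the duration is shortened to 20 seconds to fit that excerpt.

```python
import io

import pandas as pd

# Excerpt of the input rows, copied from the question
csv_text = """id_B,ts_B,value
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
id1,2017-04-27 01:39:00,0
id1,2017-04-27 01:39:10,0
id1,2017-04-27 01:39:20,0
id1,2017-04-27 01:39:30,0
id1,2017-04-27 01:39:40,0
id1,2017-04-27 01:39:50,0
id1,2017-04-27 01:40:00,0
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
"""
df = pd.read_csv(io.StringIO(csv_text))
df["ts_B"] = pd.to_datetime(df["ts_B"])
df = df.set_index("ts_B").sort_index()


def next_segment(frame, duration, last_end_point=None):
    """Return (segment, end_point) for the segment after last_end_point."""
    # Only consider rows strictly after the end of the previous segment
    rest = frame if last_end_point is None else frame[frame.index > last_end_point]
    non_zero = rest[rest["value"] != 0]
    if non_zero.empty:
        return None, last_end_point
    start = non_zero.index[0]
    # Label slicing on a sorted DatetimeIndex includes both end points
    segment = rest.loc[start:start + duration]
    return segment, segment.index[-1]


# Assumption: 20 seconds instead of the durations in the question
duration = pd.Timedelta(seconds=20)
first, last_end_point = next_segment(df, duration)
second, last_end_point = next_segment(df, duration, last_end_point)
print(first)
print(second)
```

The second call skips the run of zeros after 01:38:50 and starts at 01:40:10, the first non-zero row after the previous end point.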

So, you want to split your `df` by decreasing time intervals?

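Yes, sort of. More specifically, I want to split it by duration and starting point: the first time from the beginning, and the second time from the index of the last row of the previous segment.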

Sorry, I meant the last row of the previous segment + 1. But it should avoid starting a segment with df.value = 0 and always pick the first row that is non-zero.

Answer

Here is the answer I worked out:

import datetime

import pandas as pd

# skipinitialspace drops the blank after the comma in the "id_B, ts_B,value" header
df = pd.read_csv("filename.csv", skipinitialspace=True)
df['ts_B'] = pd.to_datetime(df['ts_B'])

def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    # key_mapped is accepted but not used in the body
    # index labels of every row whose value exceeds the threshold
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index

    first_index = non_zero_indexs[0] if len(non_zero_indexs) > 0 else None

    # test for None explicitly: a first label of 0 is falsy but still valid
    if first_index is None:
        return {"sub_df": None,
                "start_index": None,
                "end_index": None,
                "duration": duration}

    start_time = energenie_df.loc[first_index, "ts_B"]
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))

    # last row at or before end_time; this also works when no row lies after end_time
    window = energenie_df.loc[first_index:]
    last_index = window[window["ts_B"] <= end_time].index[-1]

    return {"sub_df": energenie_df.loc[first_index:last_index],
            "start_index": first_index,
            "end_index": last_index,
            "duration": duration}


out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)
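
As a side note, the "HH:MM:SS" duration string can also be parsed with `pd.to_timedelta` instead of splitting it by hand. The sketch below assumes that "1.40 minutes" in the question means 1 minute 40 seconds; that reading is a guess, although it does land on 01:41:50, the last row of the second expected output.

```python
import pandas as pd

# "HH:MM:SS" strings parse directly into a Timedelta
three_minutes = pd.to_timedelta("00:03:00")

# Assumption: "1.40 minutes" is read as 1 minute 40 seconds
second_duration = pd.to_timedelta("00:01:40")

# Start times taken from the two expected outputs in the question
first_end = pd.Timestamp("2017-04-27 01:35:50") + three_minutes
second_end = pd.Timestamp("2017-04-27 01:40:10") + second_duration

print(first_end)   # 2017-04-27 01:38:50
print(second_end)  # 2017-04-27 01:41:50
```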