2016-08-14

Pandas: loop through a dataframe without a counter

I have a pandas dataframe df that contains dates:

df['Survey_Date'].head(4) 
Out[65]: 
0 1990-09-28 
1 1991-07-26 
2 1991-11-23 
3 1992-10-15 

I am computing a metric between consecutive dates, using a separate dataframe, flow_df.

flow_df looks like this:

 date flow 
0 1989-01-01 7480 
1 1989-01-02 5070 
2 1989-01-03 6410 
3 1989-01-04 10900 
4 1989-01-05 11700 

For example, I want to query the other dataframe based on current_date and early_date. The first time period of interest is:

current_date = 1991-07-26 
early_date = 1990-09-28 

I wrote a clunky for loop that gets the job done, but I'm sure there is a more elegant way.

My approach uses a counter and a loop:

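import numpy as np
import pandas as pd

def find_peak(early_date, current_date, flow_df):
    # Peak flow in [early_date, current_date), times 0.3048**3 (cubic feet -> cubic metres)
    mask = (flow_df['date'] >= early_date) & (flow_df['date'] < current_date)
    query = flow_df.loc[mask]
    peak_flow = np.max(query['flow']) * 0.3048**3
    return peak_flow

df['Avg_Stage'] = np.nan  # create the column before filling it row by row
n = 0
for thing in df['Survey_Date'][1:]:
    early_date = df['Survey_Date'][n]
    current_date = thing
    peak_flow = find_peak(early_date, current_date, flow_df)
    n += 1
    # .loc avoids chained assignment (df['Avg_Stage'][n] = ...), which may not write back to df
    df.loc[n, 'Avg_Stage'] = peak_flow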
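Written out, the loop fills each row i ≥ 1 with the peak flow over the half-open interval between the previous and the current survey date, scaled by 0.3048³. Since one foot is exactly 0.3048 m, the factor converts cubic feet to cubic metres; that the flow column is recorded in cubic feet per second is an assumption. The symbols d_i (row i's Survey_Date), q(t) (flow on date t) and n (number of survey rows) are notation introduced only for this sketch:

```latex
\mathrm{Avg\_Stage}_i \;=\; 0.3048^{3}\,\max_{t \,:\, d_{i-1} \,\le\, t \,<\, d_i} q(t),
\qquad i = 1, \dots, n-1,
\qquad \mathrm{Avg\_Stage}_0 \text{ undefined (no earlier survey).}
```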

How can I do this without a counter and a loop?

The desired output looks like this:

Survey_Date Avg_Stage 
0 1990-09-28 
1 1991-07-26 574.831986 
2 1991-11-23 526.693347 
3 1992-10-15 458.732915 
4 1993-04-01 855.168767 
5 1993-11-17 470.059653 
6 1994-04-07 419.089330 
7 1994-10-21 450.237861 
8 1995-04-24 498.376500 
9 1995-06-23 506.871554 

Do you mean you want to select the time period between 'early-date' and 'current-date'? –

Yes, but my question is how to loop through the dataframe using the dates of interest. – dubbbdan

In your dataframe, there is no relationship between 'early-date' and 'current-date'. Can you post the desired output? –

Answers

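You can define a new variable that identifies the survey period and use pandas.DataFrame.groupby to avoid the loop. This should be much faster when flow_df is large.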

import pandas as pd

#convert both to datetime, if they are not already
df['Survey_Date'] = pd.to_datetime(df['Survey_Date'])
flow_df['date'] = pd.to_datetime(flow_df['date'])

#Merge Survey_Date into flow_df. Most rows of flow_df['Survey_Date'] should be NaT
flow_df = flow_df.merge(df[['Survey_Date']], left_on='date', right_on='Survey_Date', how='outer')

# In case not every Survey_Date appears in flow_df['date'], or the data is not sorted by date.
flow_df['date'] = flow_df['date'].fillna(flow_df['Survey_Date'])
flow_df = flow_df.sort_values('date')

#Identify the survey period. Each survey date starts a new period, so in your example
#[1990-09-28, 1991-07-26) is 1, [1991-07-26, 1991-11-23) is 2, etc.; flow before the first survey is 0.
flow_df['survey_period'] = flow_df['Survey_Date'].notnull().cumsum()

#Peak flow per survey_period. Period i ends at df row i's Survey_Date, so no .shift() is needed
#(assumes df is sorted by date with a default 0..n-1 index). Dropping period 0 leaves row 0 as NaN.
peaks = flow_df.groupby('survey_period')['flow'].max() * 0.3048**3
df['Avg_Stage'] = peaks.drop(0, errors='ignore')

This is exactly what I was looking for! flow_df is very large and the for statement was clunky. Thanks! – dubbbdan
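To check the period-labelling idea from the groupby answer end to end, here is a minimal runnable sketch on small hypothetical toy data (the dates and flow values are made up for illustration; only the column names come from the question). It writes the merged frame to a new name, merged, so the original flow_df is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's df and flow_df.
df = pd.DataFrame({'Survey_Date': pd.to_datetime(
    ['1989-01-03', '1989-01-06', '1989-01-09'])})
flow_df = pd.DataFrame({
    'date': pd.date_range('1989-01-01', periods=10, freq='D'),
    'flow': [7480, 5070, 6410, 10900, 11700, 9000, 8000, 12000, 3000, 4000],
})

# Put the survey dates alongside the daily flow records.
merged = flow_df.merge(df[['Survey_Date']], left_on='date',
                       right_on='Survey_Date', how='outer')
merged['date'] = merged['date'].fillna(merged['Survey_Date'])
merged = merged.sort_values('date')

# Each survey date opens a new period; rows before the first survey are period 0.
merged['survey_period'] = merged['Survey_Date'].notnull().cumsum()

# Period i = [Survey_Date[i-1], Survey_Date[i]), which lines up with df row i.
peaks = merged.groupby('survey_period')['flow'].max() * 0.3048**3
df['Avg_Stage'] = peaks.drop(0, errors='ignore')
print(df)
```

The first survey row stays NaN because there is no earlier survey date to start its interval, matching the blank first row in the desired output.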

You can use zip():

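# zip pairs each date with the next one: (d0, d1), (d1, d2), ...
for early_date, current_date in zip(df['Survey_Date'], df['Survey_Date'].iloc[1:]):
    # do whatever you want here, e.g. find_peak(early_date, current_date, flow_df)
    pass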

Of course, you can turn it into a list comprehension:

[some_metric(early_date, current_date) for early_date, current_date in zip(df['Survey_Date'], df['Survey_Date'][1:])] 

OK, this is getting closer to what I'm after. – dubbbdan

If you want more, you need to be more precise in your question. –

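I thought simply saying I didn't want to use a for loop would be enough detail, but I went back and edited my original post to include more details. – dubbbdan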
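For comparison, the zip() answer can be combined with the question's find_peak to fill Avg_Stage without a manual counter. This is a sketch on the same hypothetical toy data as above (made-up values); it still calls find_peak once per pair of dates, so it removes the counter but not the per-interval scan of flow_df.

```python
import numpy as np
import pandas as pd

def find_peak(early_date, current_date, flow_df):
    """Peak flow in [early_date, current_date), scaled by 0.3048**3."""
    mask = (flow_df['date'] >= early_date) & (flow_df['date'] < current_date)
    return flow_df.loc[mask, 'flow'].max() * 0.3048**3

# Hypothetical toy data standing in for the question's df and flow_df.
df = pd.DataFrame({'Survey_Date': pd.to_datetime(
    ['1989-01-03', '1989-01-06', '1989-01-09'])})
flow_df = pd.DataFrame({
    'date': pd.date_range('1989-01-01', periods=10, freq='D'),
    'flow': [7480, 5070, 6410, 10900, 11700, 9000, 8000, 12000, 3000, 4000],
})

dates = df['Survey_Date']
# zip pairs consecutive dates, so no counter variable is needed.
peaks = [find_peak(early, current, flow_df)
         for early, current in zip(dates, dates.iloc[1:])]

# The first survey has no earlier date, so it gets NaN.
df['Avg_Stage'] = [np.nan] + peaks
print(df)
```

On this toy data it produces the same values as the groupby sketch; the groupby version scans flow_df once, which is why it should scale better when flow_df is large.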