Pandas: Looping through a DataFrame without a counter

I have a DataFrame df with dates in it:

df['Survey_Date'].head(4) 
Out[65]: 
0 1990-09-28 
1 1991-07-26 
2 1991-11-23 
3 1992-10-15

I am interested in computing a metric between two of the dates, using a separate DataFrame, flow_df.

flow_df looks like:

 date flow 
0 1989-01-01 7480 
1 1989-01-02 5070 
2 1989-01-03 6410 
3 1989-01-04 10900 
4 1989-01-05 11700

For example, I would like to query the other DataFrame based on current_date and early_date. The first time period of interest would be:

current_date = 1991-07-26 
early_date = 1990-09-28

I have written a clunky for loop that gets the job done, but I am sure there is a more elegant way.

My approach with a counter and a for loop:

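import numpy as np

def find_peak(early_date, current_date, flow_df):
    # Select flows in the half-open window [early_date, current_date)
    mask = (flow_df['date'] >= early_date) & (flow_df['date'] < current_date)
    query = flow_df.loc[mask]
    # 0.3048**3 converts cubic feet to cubic metres
    peak_flow = np.max(query['flow']) * 0.3048**3
    return peak_flow

n = 0
for thing in df['Survey_Date'][1:]:
    early_date = df['Survey_Date'][n]
    current_date = thing
    peak_flow = find_peak(early_date, current_date, flow_df)
    n += 1
    # .loc avoids chained assignment, which may write to a temporary copy
    df.loc[n, 'Avg_Stage'] = peak_flow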

How can I do this without a counter and a for loop?

The desired output looks like:

Survey_Date Avg_Stage 
0 1990-09-28 
1 1991-07-26 574.831986 
2 1991-11-23 526.693347 
3 1992-10-15 458.732915 
4 1993-04-01 855.168767 
5 1993-11-17 470.059653 
6 1994-04-07 419.089330 
7 1994-10-21 450.237861 
8 1995-04-24 498.376500 
9 1995-06-23 506.871554

Source

2016-08-14 dubbbdan

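Do you mean you want to select the time period between 'early-date' and 'current-date'? –

Yes, but my question is how to loop through the DataFrame of dates of interest. – dubbbdan

In your DataFrame there is no relation between 'early-date' and 'current-date'. Can you post a desired output? –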

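You can define a new variable that identifies the survey period and use pandas.DataFrame.groupby to avoid the for loop. This should be much faster when flow_df is large.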

import pandas as pd

# Convert both columns to datetime, if they are not already
df['Survey_Date'] = pd.to_datetime(df['Survey_Date'])
flow_df['date'] = pd.to_datetime(flow_df['date'])

# Merge Survey_Date into flow_df. Most rows of flow_df['Survey_Date'] should be NaT
flow_df = flow_df.merge(df, left_on='date', right_on='Survey_Date', how='outer')

# In case not all Survey_Date values are in flow_df['date'], or the data are not sorted by date
flow_df['date'] = flow_df['date'].fillna(flow_df['Survey_Date'])
flow_df.sort_values('date', inplace=True)

# Identify the survey period. cumsum() first increments at the first survey date, so rows before
# 1990-09-28 are 0; [1990-09-28, 1991-07-26) = 1; [1991-07-26, 1991-11-23) = 2; etc.
flow_df['survey_period'] = flow_df['Survey_Date'].notnull().cumsum()

# Peak flow in each survey_period. Period k ends at df['Survey_Date'][k], so the labels already
# line up with df's index (assuming df has a default 0..n-1 index sorted by Survey_Date).
# Period 0 (before the first survey) is dropped so that row 0 stays NaN.
peak = flow_df.groupby('survey_period')['flow'].max() * 0.3048**3
df['Avg_Stage'] = peak.drop(0, errors='ignore')
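
To check the period labelling on something small, here is a minimal sketch on made-up data. It is not the code above: it assigns the period labels with np.searchsorted instead of the merge and cumsum, and it skips the unit conversion. The toy dates, the flow values, and the Peak_Flow column name are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data (made up): three survey dates and ten days of flow
df = pd.DataFrame(
    {"Survey_Date": pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-09"])}
)
flow_df = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=10, freq="D"),
        "flow": [50, 10, 20, 70, 30, 40, 5, 60, 15, 90],
    }
)

# Label each flow row with the number of survey dates on or before it.
# Rows before the first survey get 0, rows in [survey k-1, survey k) get k.
# Survey_Date must be sorted ascending for searchsorted to be valid.
period = np.searchsorted(
    df["Survey_Date"].to_numpy(), flow_df["date"].to_numpy(), side="right"
)

# Peak flow per period, keyed by the period label
peak = flow_df.groupby(period)["flow"].max()

# Keep only periods 1..n-1, which end at a survey date; row 0 stays NaN
df["Peak_Flow"] = peak.reindex(np.arange(1, len(df)))
print(df)
```

Row 1 gets the peak over [2020-01-03, 2020-01-06) and row 2 the peak over [2020-01-06, 2020-01-09), which is the same alignment the question asks for.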

Source

2016-08-15 02:08:35 Happy001

This is exactly what I was looking for! 'flow_df' is very large and the for statement was clunky. Thanks! – dubbbdan

You can use zip():

for early_date, current_date in zip(df['Survey_Date'], df['Survey_Date'][1:]):
    pass  # do whatever you want with each pair of consecutive dates

Of course, you can turn it into a list comprehension:

[some_metric(early_date, current_date) for early_date, current_date in zip(df['Survey_Date'], df['Survey_Date'][1:])]
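
As a rough sketch of how the comprehension could fill the column from the question, here it is with find_peak standing in for some_metric. The toy dates and flow values are made up, and the unit conversion is left out to keep the numbers easy to follow.

```python
import numpy as np
import pandas as pd

# Toy data (made up): three survey dates and ten days of flow
df = pd.DataFrame(
    {"Survey_Date": pd.to_datetime(["2020-01-03", "2020-01-06", "2020-01-09"])}
)
flow_df = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=10, freq="D"),
        "flow": [50, 10, 20, 70, 30, 40, 5, 60, 15, 90],
    }
)

def find_peak(early_date, current_date, flow_df):
    # Peak flow in the half-open window [early_date, current_date)
    mask = (flow_df["date"] >= early_date) & (flow_df["date"] < current_date)
    return flow_df.loc[mask, "flow"].max()

dates = df["Survey_Date"]
# zip pairs each date with the next one, so no counter is needed
peaks = [find_peak(early, current, flow_df) for early, current in zip(dates, dates[1:])]

# The first survey has no earlier date, so its value is left as NaN
df["Avg_Stage"] = [np.nan] + peaks
print(df)
```

This still scans flow_df once per survey date, so it removes the counter but not the per-period cost.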

Source

2016-08-14 23:16:35

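OK, this is getting closer to what I am after. – dubbbdan

If you want more, you need to be more precise in your question. –

I thought that simply saying I did not want to use a for loop would be enough detail, but I went back and edited my original post to include more. – dubbbdan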
