2016-09-15

Set a current row's value based on future rows of a DataFrame

Consider:

import pandas

d = {
    'datetime': ['2010-01-08 09:45:00', '2010-01-08 10:00:00', 
       '2010-01-08 10:15:00', '2010-01-08 10:30:00', 
       '2010-01-08 10:45:00', '2010-01-08 11:00:00', 
       '2010-01-08 11:15:00', '2010-01-08 11:30:00', 
       '2010-01-08 11:45:00', '2010-01-08 12:00:00', 
       '2010-01-08 12:15:00', '2010-01-08 12:30:00', 
       '2010-01-08 12:45:00', '2010-01-08 13:00:00', 
       '2010-01-08 13:15:00', '2010-01-08 13:30:00', 
       '2010-01-08 13:45:00', '2010-01-08 14:00:00', 
       '2010-01-08 14:15:00', '2010-01-08 14:30:00', 
       '2010-01-08 14:45:00', '2010-01-08 15:00:00', 
       '2010-01-08 15:15:00', '2010-01-08 15:30:00', 
       '2010-01-08 15:45:00', '2010-01-08 16:00:00', 
       '2010-01-08 16:15:00'], 
    'Total-tops': [0,-1,-1,2,3,0,0,4,0,0,0,0,5,6,7,8,-1,0,0,0,0,0,0,0,-1,-1,2] 
} 

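df = pandas.DataFrame(d)
# Parse the strings so the index is a DatetimeIndex (needed for .time below)
df['datetime'] = pandas.to_datetime(df['datetime'])
df = df.set_index('datetime')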

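I want to add another column: a boolean saying whether the row will break or not. A break means the number of tops is greater than 1 and a -1 then appears somewhere in the future. For example, the first two tops will break at the next -1 encountered. Here is the desired DataFrame: desired_dataframe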

This is the function I am currently using, but it runs very slowly because I iterate over every row.

def does_break(data):
    cur_breaks = []

    for index, row in data.iterrows():
        if row['Total-tops'] > 1:
            # Get all rows later in the day where Total-tops is -1
            breaks = data[(data['Total-tops'] == -1) & (data.index.time > index.time())]
            if len(breaks) > 0:
                cur_breaks.append(True)
            else:
                cur_breaks.append(False)
        else:
            cur_breaks.append(False)
    return cur_breaks

Answers


How about:

latest_break = df.index[(df['Total-tops'] == -1)].max() 
df['break'] = 1 
df['break'] = df['break'].where((df['Total-tops'] > 0) & (df.index < latest_break), 0) 

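This sets break to 1 for every positive value that occurs before the latest break (the last -1), and to 0 everywhere else.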
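As a rough check of this idea, here is a minimal sketch on a small made-up sample (the seven values and the integer index are assumptions for brevity, not the question's data). It folds the two assignments into one boolean expression. Note that it tests `> 0`, while the question says "greater than 1"; the question's sample data contains no 1, so both conditions agree there.

```python
import pandas as pd

# Made-up sample, not the question's data; integer index for brevity.
df = pd.DataFrame({'Total-tops': [0, -1, 2, 3, 0, -1, 2]})

# Index label of the last row that holds a -1.
latest_break = df.index[df['Total-tops'] == -1].max()

# 1 where the value is positive and the row comes before the last -1, else 0.
df['break'] = ((df['Total-tops'] > 0) & (df.index < latest_break)).astype(int)

print(df['break'].tolist())  # [0, 0, 1, 1, 0, 0, 0]
```

The final 2 stays at 0 because no -1 follows it. If the column contains no -1 at all, `latest_break` has no meaningful value, so that case would need separate handling.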


You can use this clumsy expression:

In [56]: import numpy as np 

In [57]: ((np.cumsum((df['Total-tops'] == -1)[::-1])[::-1] > 0) & (df['Total-tops'] > 0)).astype(int)
Out[57]: 
datetime 
2010-01-08 09:45:00 0 
2010-01-08 10:00:00 0 
2010-01-08 10:15:00 0 
2010-01-08 10:30:00 1 
2010-01-08 10:45:00 1 
2010-01-08 11:00:00 0 
2010-01-08 11:15:00 0 
2010-01-08 11:30:00 1 
2010-01-08 11:45:00 0 
2010-01-08 12:00:00 0 
2010-01-08 12:15:00 0 
2010-01-08 12:30:00 0 
2010-01-08 12:45:00 1 
2010-01-08 13:00:00 1 
2010-01-08 13:15:00 1 
2010-01-08 13:30:00 1 
2010-01-08 13:45:00 0 
2010-01-08 14:00:00 0 
2010-01-08 14:15:00 0 
2010-01-08 14:30:00 0 
2010-01-08 14:45:00 0 
2010-01-08 15:00:00 0 
2010-01-08 15:15:00 0 
2010-01-08 15:30:00 0 
2010-01-08 15:45:00 0 
2010-01-08 16:00:00 0 
2010-01-08 16:15:00 0 
Name: Total-tops, dtype: int64 

(Of course, for your new column you can use df['breaks'] = ...).

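Here is what this does:

  1. We find the positions where the value is -1, and reverse the result. Any operation that normally looks at the past (cumsum in particular) now looks at the future.
  2. We take the cumulative sum, then reverse again. At this point, each value is the number of times we will still see a -1 from that row onward.
  3. We find where the result is greater than 0, because we do not care how many times we will see a -1, only whether we will see one.
  4. Finally, we also require that the current entry is positive. That is simply the definition in your question.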
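The four steps can be spelled out one at a time. This is only a sketch on a small made-up Series (the values and the intermediate variable names are assumptions for illustration), using the pandas `cumsum` method instead of `np.cumsum`:

```python
import pandas as pd

# Made-up sample, not the question's data.
tops = pd.Series([0, -1, 2, 3, 0, -1, 2], name='Total-tops')

# Step 1: mark the -1 rows (as 0/1 integers).
is_minus_one = (tops == -1).astype(int)

# Steps 1-2: reverse, take the cumulative sum, reverse back.
# Each value counts the -1 entries at that row or later.
future_count = is_minus_one[::-1].cumsum()[::-1]

# Step 3: we only care whether a -1 is still to come, not how many.
will_see_minus_one = future_count > 0

# Step 4: the current entry must also be positive.
breaks = (will_see_minus_one & (tops > 0)).astype(int)

print(future_count.tolist())  # [2, 2, 1, 1, 1, 1, 0]
print(breaks.tolist())        # [0, 0, 1, 1, 0, 0, 0]
```

The count in step 2 includes the current row, but that does no harm: a row that is itself -1 is never positive, so step 4 excludes it anyway.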