Find the daily observation closest to a specific time in irregularly spaced data

I have a Python DataFrame like this:

Out[110]: 
Time 
2014-09-19 21:59:14 55.975 
2014-09-19 21:56:08 55.925 
2014-09-19 21:53:05 55.950 
2014-09-19 21:50:29 55.950 
2014-09-19 21:50:03 55.925 
2014-09-19 21:47:00 56.150 
2014-09-19 21:53:57 56.225 
2014-09-19 21:40:51 56.225 
2014-09-19 21:37:50 56.300 
2014-09-19 21:34:46 56.300 
2014-09-19 21:31:41 56.350 
2014-09-19 21:30:08 56.500 
2014-09-19 21:28:39 56.375 
2014-09-19 21:25:34 56.350 
2014-09-19 21:22:32 56.400 
2014-09-19 21:19:27 56.325 
2014-09-19 21:16:25 56.325 
2014-09-19 21:13:21 56.350 
2014-09-19 21:10:18 56.425 
2014-09-19 21:07:13 56.475 
Name: Spread, dtype: float64

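It extends over a long period (several months to years), so there are many observations for each day. For each day, I want to retrieve the observation in the time series that is closest to a specific time, say 16:00.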

My approach so far has been:

eodsearch = pd.DataFrame(df['Date'] + datetime.timedelta(hours=16)) 

eod = df.iloc[df.index.get_loc(eodsearch['Date'] ,method='nearest')]

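which currently gives me the error

"Cannot convert input [Time Date, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp"

Also, I saw that get_loc accepts a tolerance as input, so it would be great if I could set a tolerance of, say, 30 minutes.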

Any suggestions as to why my code fails or how to fix it?

Source

2017-02-13 thevaluebay

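Please don't post data as images. I typed in the data manually, replaced the image, and formatted the code as code. See [Markdown help](http://stackoverflow.com/editing-help) for how to format code in your questions and answers. –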

Prepare the data:

import pandas as pd 
from pandas.tseries.offsets import Hour 

df.sort_index(inplace=True) # Sort the index of the original DF if it is not already sorted 
# Create a lookup DataFrame whose 'Time' column holds 16:00 on each unique date 
d = pd.DataFrame(dict(Time=pd.unique(df.index.date) + Hour(16)))

(i): Use reindex, which lets observations be looked up in both directions: (bidirectional)

# Find values in original within +/- 30 minute interval of lookup 
df.reindex(d['Time'], method='nearest', tolerance=pd.Timedelta('30Min'))
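To make the behaviour of the tolerance concrete, here is a minimal self-contained sketch on toy data. The timestamps, values, and the names `s`, `targets` and `eod` are made up for illustration and are not from the question. A day whose nearest observation lies outside the tolerance comes back as NaN rather than being silently matched.

```python
import pandas as pd

# Hypothetical toy data: irregular observations over two days.
idx = pd.to_datetime([
    "2017-01-01 15:40:00",
    "2017-01-01 16:10:00",
    "2017-01-01 18:00:00",
    "2017-01-02 09:00:00",
    "2017-01-02 14:00:00",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx, name="observation")

# One 16:00 target per calendar day present in the index.
targets = s.index.normalize().unique() + pd.Timedelta(hours=16)

# reindex with method= needs a sorted, unique index, hence sort_index().
eod = s.sort_index().reindex(
    targets, method="nearest", tolerance=pd.Timedelta("30min")
)
print(eod)
```

On the first day the 16:10 observation is chosen because it is 10 minutes from the target while 15:40 is 20 minutes away. On the second day the nearest observation is two hours away, so the result is NaN.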

(ii): Use merge_asof after identifying the unique dates in the original DF: (backward lookup only)

# Find values in original within 30 minute interval of lookup (backwards) 
pd.merge_asof(d, df.reset_index(), on='Time', tolerance=pd.Timedelta('30Min'))
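By default merge_asof searches backward only, i.e. it matches the last observation at or before the lookup time. It also accepts a direction argument, and direction='nearest' searches on both sides. The sketch below compares the two on made-up data; the frames `obs` and `lookup` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical observations, already sorted by time (merge_asof requires this).
obs = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-01-01 15:40:00",
        "2017-01-01 16:10:00",
        "2017-01-02 15:55:00",
        "2017-01-02 16:20:00",
    ]),
    "observation": [1.0, 2.0, 3.0, 4.0],
})

# One 16:00 lookup time per day.
lookup = pd.DataFrame({
    "Time": pd.to_datetime(["2017-01-01 16:00:00", "2017-01-02 16:00:00"]),
})

tol = pd.Timedelta("30min")

# Default direction='backward': last observation at or before 16:00.
backward = pd.merge_asof(lookup, obs, on="Time", tolerance=tol)

# direction='nearest': closest observation on either side of 16:00.
nearest = pd.merge_asof(lookup, obs, on="Time", tolerance=tol, direction="nearest")

print(backward)
print(nearest)
```

For the first day the backward search picks the 15:40 row, while the nearest search picks the 16:10 row, which is closer to 16:00. For the second day both pick the 15:55 row.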

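(iii): To get every row within a +/- 30 minute band around the lookup time by querying the index:

Index.get_loc operates on a single label, so an entire Series object cannot be passed to it directly.

Instead, DatetimeIndex.indexer_between_time, which returns the positions of all rows whose index falls between the specified start_time and end_time on each day, is better suited to this purpose. (Both endpoints are inclusive.)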

# Tolerance of +/- 30 minutes from 16:00:00 
df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")]
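indexer_between_time returns every row in the band, so a day can contribute several rows. If only the single closest row per day is wanted, one possible follow-up (a sketch on made-up data, not part of the original answer) is to compute each row's distance from 16:00 and keep the minimum per day. The column name `distance_s` and the toy values are assumptions for illustration, and the final selection assumes the timestamps in the index are unique.

```python
import pandas as pd

# Hypothetical toy data with a unique DatetimeIndex.
idx = pd.to_datetime([
    "2017-01-01 15:35:00",
    "2017-01-01 16:05:00",
    "2017-01-01 17:00:00",
    "2017-01-02 15:50:00",
    "2017-01-02 16:25:00",
    "2017-01-03 12:00:00",
])
df = pd.DataFrame({"observation": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# All rows between 15:30 and 16:30 (inclusive) on any day.
window = df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")].copy()

# Distance in seconds of each row from 16:00 on its own day.
target = window.index.normalize() + pd.Timedelta(hours=16)
window["distance_s"] = abs(window.index - target).total_seconds()

# Per day, keep the row with the smallest distance.
days = window.index.normalize()
closest = window.loc[window.groupby(days)["distance_s"].idxmin()]
print(closest)
```

Days with no observation inside the band, such as the third day here, simply do not appear in the result.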

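Data used to arrive at the results:

import numpy as np 
import pandas as pd 

# '20min' is the current spelling of the older '20T' frequency alias 
idx = pd.date_range('1/1/2017', periods=200, freq='20min', name='Time') 
np.random.seed(42) 
df = pd.DataFrame(dict(observation=np.random.uniform(50,60,200)), idx) 
# Shuffle indices 
df = df.sample(frac=1., random_state=42)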

Info:

df.info() 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 200 entries, 2017-01-02 07:40:00 to 2017-01-02 10:00:00 
Data columns (total 1 columns): 
observation 200 non-null float64 
dtypes: float64(1) 
memory usage: 3.1 KB

Source

2017-02-13 17:46:29

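Thank you very much for your help! – thevaluebay

Having checked the output, it seems merge_asof only looks at values before the specified time point, so it is not +/- but only -? – thevaluebay

From (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge_asof.html#pandas.merge_asof) I found: "For each row in the left DataFrame, we select the last row in the right DataFrame whose 'on' key is less than or equal to the left's key." – thevaluebay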
