How do I remove the extra days added by pandas resample?

I have researched this extensively, so please read before downvoting.

I have a pandas DataFrame of tick data with a datetime64[ns] index. I want to resample this data to 5-minute intervals like this:

price_5min = price.price.resample('5T').ohlc().between_time('09:00:00', '16:20:00')
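To make the problem reproducible, here is a minimal sketch with made-up tick data (a Friday and the following Monday; the values and the names `ticks`, `bars`, and `extra_days` are assumptions for illustration). It uses the `'5min'` alias, which newer pandas versions prefer over `'5T'`. Because `resample` builds one continuous run of bins from the first tick to the last, the weekend shows up as all-NaN rows:

```python
import pandas as pd

# Hypothetical tick data: Friday 2016-06-17 and Monday 2016-06-20 only.
ticks = pd.DataFrame(
    {"price": [29.85, 29.70, 29.05, 29.10]},
    index=pd.to_datetime([
        "2016-06-17 09:01:00",
        "2016-06-17 16:19:00",
        "2016-06-20 09:01:00",
        "2016-06-20 16:19:00",
    ]),
)

# Same pipeline as in the question, with '5min' instead of '5T'.
bars = ticks["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")

# Days present after resampling but absent from the original ticks.
extra_days = sorted(set(bars.index.date) - set(ticks.index.date))
print(extra_days)
```
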

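It works, but it adds weekends and holidays to the new time series, and I need to remove them. I am not following the US (or any other standard) holiday calendar. I just want to remove the days that are not in the original price df.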

The index is not unique; many rows share the same timestamp. pandas version 0.20.1.

I have tried:

1) dropna(): I have NaN rows that I need to ffill, so this won't do.

2) price.index.difference(price_5min.index): gives me all the tick timestamps, not the calendar days.

3) price.index.date.difference(price_5min.index.date): does not work, because index.date is a numpy.ndarray.

4) price == price_5min: Error: Can only compare identically-labeled DataFrame objects

5) price.index == price_5min.index: Error: Lengths must match to compare
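For what it is worth, attempt 3 fails only because `.date` returns a plain NumPy array, which has no `difference` method. A sketch of the same idea that should run, assuming it is acceptable to wrap the arrays in `pd.Index` (the sample data below is made up, and `orig_days`, `bar_days`, `extra_days`, and `cleaned` are illustrative names):

```python
import pandas as pd

# Hypothetical ticks with a duplicated timestamp, as in the real data.
idx = pd.to_datetime([
    "2016-06-17 16:19:20",
    "2016-06-17 16:19:20",
    "2016-06-20 09:01:42",
    "2016-06-20 09:01:45",
])
price = pd.DataFrame({"price": [29.85, 29.85, 29.05, 29.10]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")

# Wrapping the ndarray of datetime.date objects gives access to Index set operations.
orig_days = pd.Index(price.index.date)
bar_days = pd.Index(price_5min.index.date)

# Days that exist only in the resampled frame.
extra_days = bar_days.difference(orig_days)

# Keep only the rows whose day was present in the original ticks.
cleaned = price_5min[~bar_days.isin(extra_days)]
```
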

Possible approaches to solving my problem:

a) Somehow compare the two dataframes by calendar date and drop rows based on that, but how?

b) Drop the days that show no variation, but how?

c) Something obvious that I haven't thought of (most likely..)

The price df looks like this:

                     price  quantity
time
2016-06-15 16:19:20  29.85     429.6
2016-06-15 16:19:20  29.85      65.6
2016-06-15 16:19:20  29.85    1351.4
2016-06-15 16:19:30  29.70     729.4
2016-06-15 16:19:30  29.70     287.0
2016-06-15 16:19:30  29.70     219.4
2016-06-15 16:19:49  29.70      47.4
2016-06-15 16:19:52  29.70      11.8
2016-06-16 09:01:42  29.05     350.0
2016-06-16 09:01:42  29.10     189.4
2016-06-16 09:01:45  29.05      33.6
2016-06-16 09:01:54  29.05      33.6
...

Any help would be greatly appreciated.

Source

2017-07-04 cJc

I think you can use np.setdiff1d and numpy.in1d, then filter with boolean indexing:

import numpy as np
diffs = np.setdiff1d(price_5min.index.date, price.index.date)
df = price_5min[~np.in1d(price_5min.index.date, diffs)]
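A self-contained sketch of this approach on made-up data follows. One substitution to note: it calls `np.isin` rather than `np.in1d`, since recent NumPy releases deprecate `in1d`; on older NumPy the original call should behave the same way. The sample ticks and the names `extra_days` and `keep` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ticks: Friday 2016-06-17 and Monday 2016-06-20.
idx = pd.to_datetime([
    "2016-06-17 16:19:20",
    "2016-06-17 16:19:30",
    "2016-06-20 09:01:42",
    "2016-06-20 09:01:54",
])
price = pd.DataFrame({"price": [29.85, 29.70, 29.05, 29.05]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")

# Calendar days that exist only in the resampled frame.
extra_days = np.setdiff1d(price_5min.index.date, price.index.date)

# Boolean mask: True for rows whose day is NOT one of the extra days.
keep = ~np.isin(price_5min.index.date, extra_days)
df = price_5min[keep]
```

NaN rows inside the original trading days are left in place, so they remain available for a later ffill.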

Another solution uses DatetimeIndex.floor or to_period:

dates = price.index.floor('D') 
dates_5min = price_5min.index.floor('D') 
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]

dates = price.index.to_period('D') 
dates_5min = price_5min.index.to_period('D') 
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]
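As a side note, the double negative (`~isin(difference(...))`) appears to reduce to a single `isin` against the original days. The sketch below checks that on made-up data; `mask_answer`, `mask_direct`, and `kept` are illustrative names, not part of the answer above.

```python
import pandas as pd

# Hypothetical ticks: Friday 2016-06-17 and Monday 2016-06-20.
idx = pd.to_datetime([
    "2016-06-17 16:19:20",
    "2016-06-17 16:19:30",
    "2016-06-20 09:01:42",
    "2016-06-20 09:01:54",
])
price = pd.DataFrame({"price": [29.85, 29.70, 29.05, 29.05]}, index=idx)
price_5min = price["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")

# Truncate every timestamp to midnight of its day.
dates = price.index.floor("D")
dates_5min = price_5min.index.floor("D")

# Mask as written in the answer: drop rows whose day is in the set difference.
mask_answer = ~dates_5min.isin(dates_5min.difference(dates))

# Simpler formulation: keep rows whose day occurs in the original ticks.
mask_direct = dates_5min.isin(dates)

# Both masks select the same rows on this sample.
assert (mask_answer == mask_direct).all()
kept = price_5min[mask_direct]
```
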

Source

2017-07-04 08:05:39 jezrael

You're the man! Cheers. – cJc

Thank you too. By the way, a very nice question with great research behind it. Have a nice day! – jezrael
