Pandas resample not working properly

I'm getting strange behavior from pandas: I want to resample my minute data to hourly data (using the mean). My data looks like this:

Data.head() 
         AAA BBB 
Time        
2009-02-10 09:31:00 86.34 101.00 
2009-02-10 09:36:00 86.57 100.50 
2009-02-10 09:38:00 86.58 99.78 
2009-02-10 09:40:00 86.63 99.75 
2009-02-10 09:41:00 86.52 99.66 

Data.info() 

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 961276 entries, 2009-02-10 09:31:00 to 2016-02-29 19:59:00 
Data columns (total 2 columns): 
AAA 961276 non-null float64 
BBB 961276 non-null float64 
dtypes: float64(2) 
memory usage: 22.0 MB 

Data.index 

Out[25]: 
DatetimeIndex(['2009-02-10 09:31:00', '2009-02-10 09:36:00', 
       '2009-02-10 09:38:00', '2009-02-10 09:40:00', 
       '2009-02-10 09:41:00', '2009-02-10 09:44:00', 
       '2009-02-10 09:45:00', '2009-02-10 09:46:00', 
       '2009-02-10 09:47:00', '2009-02-10 09:48:00', 
       ... 
       '2016-02-29 19:41:00', '2016-02-29 19:42:00', 
       '2016-02-29 19:43:00', '2016-02-29 19:50:00', 
       '2016-02-29 19:52:00', '2016-02-29 19:53:00', 
       '2016-02-29 19:56:00', '2016-02-29 19:57:00', 
       '2016-02-29 19:58:00', '2016-02-29 19:59:00'], 
       dtype='datetime64[ns]', name='Time', length=961276, freq=None)

To resample the data I do the following:

tframe = '60T' 
hr_mean = Data.resample(tframe).mean()

As output I get a pandas Series with only two numbers in it:

In[26]: hr_mean 
Out[26]: 
AAA 156.535198 
BBB  30.197029 
dtype: float64

I get the same behavior if I choose a different time frame or a different resampling function.

Source

2016-06-09 Vitali Halapjan

Can you give the pandas version you are using? – joris

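The behavior you show is the expected behavior for older pandas versions (pandas < 0.18). Newer pandas versions have a changed resample API, and you are hitting one of its tricky cases here.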

Before v0.18, resample used the how keyword to specify how to resample, and returned the resampled frame/series directly:

In [5]: data = pd.DataFrame(np.random.randn(180, 2), columns=['AAA', 'BBB'], index=pd.date_range("2016-06-01", periods=180, freq='1T')) 

# how='mean' is the default, so this is the same as data.resample('60T') 
In [6]: data.resample('60T', how='mean') 
Out[6]: 
          AAA  BBB 
2016-06-01 00:00:00 0.100026 0.210722 
2016-06-01 01:00:00 0.093662 -0.078066 
2016-06-01 02:00:00 -0.114801 0.002615 

# calling .mean() now calculates the mean of each column, resulting in the following series: 
In [7]: data.resample('60T', how='mean').mean() 
Out[7]: 
AAA 0.026296 
BBB 0.045090 
dtype: float64 

In [8]: pd.__version__ 
Out[8]: u'0.17.1'

Starting from 0.18.0, resample itself is a deferred operation, which means you first have to call a method (in this case mean()) to perform the actual resampling:

In [4]: data.resample('60T') 
Out[4]: DatetimeIndexResampler [freq=<60 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0] 

In [5]: data.resample('60T').mean() 
Out[5]: 
          AAA  BBB 
2016-06-01 00:00:00 -0.059038 0.102275 
2016-06-01 01:00:00 -0.141429 -0.021342 
2016-06-01 02:00:00 -0.073341 -0.150091 

In [6]: data.resample('60T').mean().mean() 
Out[6]: 
AAA -0.091270 
BBB -0.023052 
dtype: float64 

In [7]: pd.__version__ 
Out[7]: '0.18.1'

See http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#resample-api for an explanation of the changes to the API.
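As a minimal sketch of the deferred behavior on a current pandas install, the snippet below builds hypothetical sample data (the seed and random values are assumptions, not the asker's data) and uses the '60min' alias instead of '60T', since recent pandas releases deprecate the 'T' spelling. It also shows why a second .mean() leaves only two numbers, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 180 one-minute rows, two columns (not the asker's data).
rng = np.random.default_rng(0)
index = pd.date_range("2016-06-01", periods=180, freq="min")
data = pd.DataFrame(rng.standard_normal((180, 2)),
                    columns=["AAA", "BBB"], index=index)

# In pandas >= 0.18 this only creates a deferred Resampler object;
# no aggregation has been computed yet.
resampler = data.resample("60min")

# Calling an aggregation method performs the resampling: one row per hour.
hourly = resampler.mean()

# A further .mean() on the resampled frame collapses each column to a
# single number, giving a two-entry Series like the one in the question.
column_means = hourly.mean()

print(type(resampler).__name__)
print(hourly)
print(column_means)
```

On pandas < 0.18, data.resample('60T') already returns the hourly frame, so the extra .mean() in the question is what reduced it to one value per column.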

Source

2016-06-09 11:51:29 joris

Nice explanation – jezrael

Thanks! Updating to the latest pandas version solved the problem. –
