2016-06-09 81 views
0

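Pandas resample not working properly

I am getting strange behavior from pandas: I want to resample my minute data to hourly data (using the mean). My data looks like this: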

Data.head()
                       AAA     BBB
Time
2009-02-10 09:31:00  86.34  101.00
2009-02-10 09:36:00  86.57  100.50
2009-02-10 09:38:00  86.58   99.78
2009-02-10 09:40:00  86.63   99.75
2009-02-10 09:41:00  86.52   99.66

Data.info() 

<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 961276 entries, 2009-02-10 09:31:00 to 2016-02-29 19:59:00 
Data columns (total 2 columns): 
AAA 961276 non-null float64 
BBB 961276 non-null float64 
dtypes: float64(2) 
memory usage: 22.0 MB 

Data.index 

Out[25]: 
DatetimeIndex(['2009-02-10 09:31:00', '2009-02-10 09:36:00', 
       '2009-02-10 09:38:00', '2009-02-10 09:40:00', 
       '2009-02-10 09:41:00', '2009-02-10 09:44:00', 
       '2009-02-10 09:45:00', '2009-02-10 09:46:00', 
       '2009-02-10 09:47:00', '2009-02-10 09:48:00', 
       ... 
       '2016-02-29 19:41:00', '2016-02-29 19:42:00', 
       '2016-02-29 19:43:00', '2016-02-29 19:50:00', 
       '2016-02-29 19:52:00', '2016-02-29 19:53:00', 
       '2016-02-29 19:56:00', '2016-02-29 19:57:00', 
       '2016-02-29 19:58:00', '2016-02-29 19:59:00'], 
       dtype='datetime64[ns]', name='Time', length=961276, freq=None) 

To resample the data I do the following:

tframe = '60T' 
hr_mean = Data.resample(tframe).mean() 

As output I get a pandas Series with only two numbers in it:

In[26]: hr_mean 
Out[26]: 
AAA 156.535198 
BBB  30.197029 
dtype: float64 

I get the same behavior if I choose a different time frame or a different resampling function.

+1

Can you give the pandas version you are using? – joris

Answer

2

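The behavior you show is the expected behavior for older pandas versions (pandas < 0.18). Newer pandas versions have a changed resample API, and you are running into a tricky case of that change here.

Before v0.18, resample used the how keyword to specify how to resample, and it returned the resampled frame/series directly: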

In [5]: data = pd.DataFrame(np.random.randn(180, 2), columns=['AAA', 'BBB'], index=pd.date_range("2016-06-01", periods=180, freq='1T')) 

# how='mean' is the default, so this is the same as data.resample('60T') 
In [6]: data.resample('60T', how='mean') 
Out[6]: 
                          AAA       BBB
2016-06-01 00:00:00  0.100026  0.210722
2016-06-01 01:00:00  0.093662 -0.078066
2016-06-01 02:00:00 -0.114801  0.002615

# calling .mean() now calculates the mean of each column, resulting in the following series: 
In [7]: data.resample('60T', how='mean').mean() 
Out[7]: 
AAA 0.026296 
BBB 0.045090 
dtype: float64 

In [8]: pd.__version__ 
Out[8]: u'0.17.1' 

Starting from 0.18.0, resample itself is a deferred operation, which means you first have to call a method (in this case mean()) to perform the actual resampling:

In [4]: data.resample('60T') 
Out[4]: DatetimeIndexResampler [freq=<60 * Minutes>, axis=0, closed=left, label=left, convention=start, base=0] 

In [5]: data.resample('60T').mean() 
Out[5]: 
                          AAA       BBB
2016-06-01 00:00:00 -0.059038  0.102275
2016-06-01 01:00:00 -0.141429 -0.021342
2016-06-01 02:00:00 -0.073341 -0.150091

In [6]: data.resample('60T').mean().mean() 
Out[6]: 
AAA -0.091270 
BBB -0.023052 
dtype: float64 

In [7]: pd.__version__ 
Out[7]: '0.18.1' 
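
To make the difference easier to reproduce, here is a minimal sketch with deterministic data instead of random numbers. The frame, its values and the variable names are made up for illustration, and the '60min' / '1min' aliases are used in place of '60T' / '1T' on the assumption that you may be running a recent pandas, where the 'T' alias is deprecated.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 180 one-minute rows, i.e. exactly three full hours.
index = pd.date_range("2016-06-01", periods=180, freq="1min")
data = pd.DataFrame(
    {"AAA": np.arange(180, dtype=float), "BBB": np.arange(180, dtype=float) * 2},
    index=index,
)

# On pandas >= 0.18 this is a lazy Resampler object, not a DataFrame.
resampler = data.resample("60min")

# Calling an aggregation performs the actual resampling: one row per hour.
hourly = resampler.mean()
print(hourly)

# A second .mean() averages the hourly rows per column, which collapses
# the result to a two-element Series like the one shown in the question.
collapsed = hourly.mean()
print(collapsed)
```

With this data, hourly has three rows (AAA is 29.5, 89.5 and 149.5) and collapsed holds one number per column. On pandas < 0.18 the single call data.resample('60min').mean() already gives the collapsed result, which is what happened in the question.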

For an explanation of the API change, see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#resample-api.
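
If you are not sure which of the two behaviors your installation has, a quick check along these lines may help. This is only a sketch: the version parsing assumes a version string that starts with "major.minor", and the small test frame and the name has_lazy_resample are hypothetical.

```python
import pandas as pd

# Parse "major.minor" from the version string, e.g. "0.18.1" -> (0, 18).
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
has_lazy_resample = (major, minor) >= (0, 18)
print(pd.__version__, has_lazy_resample)

# Behavioral check that does not rely on the version string:
# a hypothetical frame with two hours of minute data.
frame = pd.DataFrame(
    {"AAA": range(120)},
    index=pd.date_range("2016-06-01", periods=120, freq="1min"),
)
result = frame.resample("60min").mean()

# With the new API this is a DataFrame with one row per hour; with the
# old API the same line yields a Series with one value per column.
print(type(result).__name__, len(result))
```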

+0

Nice explanation – jezrael

+0

Thanks! Updating to the latest pandas version solved the problem. –