2017-05-09

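Pandas MultiIndex rolling mean

Preface: I am a newbie, but I have searched here and in the pandas documentation for hours without success. I have also read Wes's book.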

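I am modeling stock market data for a hedge fund and have a simple MultiIndexed DataFrame with tickers, dates (daily), and fields. The sample here is from Bloomberg: 3 months (December 2016 through February 2017) and 3 tickers (AAPL, IBM, MSFT).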

import numpy as np
import pandas as pd
import os

# get data from Excel
curr_directory = os.getcwd()
filename = 'Sample Data File.xlsx'
filepath = os.path.join(curr_directory, filename)
# note: pandas 0.21+ spells these keywords sheet_name and usecols;
# the original post used the older sheetname and parse_cols
df = pd.read_excel(filepath, sheet_name='Sheet1', index_col=[0, 1], usecols='A:D')

# sort 
df.sort_index(inplace=True) 

# sample of the data 
df.head(15) 
Out[4]: 
                           PX_LAST  PX_VOLUME
Security Name  date
AAPL US Equity 2016-12-01   109.49   37086862
               2016-12-02   109.90   26527997
               2016-12-05   109.11   34324540
               2016-12-06   109.95   26195462
               2016-12-07   111.03   29998719
               2016-12-08   112.12   27068316
               2016-12-09   113.95   34402627
               2016-12-12   113.30   26374377
               2016-12-13   115.19   43733811
               2016-12-14   115.19   34031834
               2016-12-15   115.82   46524544
               2016-12-16   115.97   44351134
               2016-12-19   116.64   27779423
               2016-12-20   116.95   21424965
               2016-12-21   117.06   23783165

df.tail(15) 
Out[5]: 
                           PX_LAST  PX_VOLUME
Security Name  date
MSFT US Equity 2017-02-07    63.43   20277226
               2017-02-08    63.34   18096358
               2017-02-09    64.06   22644443
               2017-02-10    64.00   18170729
               2017-02-13    64.72   22920101
               2017-02-14    64.57   23108426
               2017-02-15    64.53   17005157
               2017-02-16    64.52   20546345
               2017-02-17    64.62   21248818
               2017-02-21    64.49   20655869
               2017-02-22    64.36   19292651
               2017-02-23    64.62   20273128
               2017-02-24    64.62   21796800
               2017-02-27    64.23   15871507
               2017-02-28    63.98   23239825

When I calculate the daily change in price, it seems to work; only the first day is NaN, as it should be:

df.head(5) 
Out[7]: 
                           PX_LAST  PX_VOLUME  px_change_%
Security Name  date
AAPL US Equity 2016-12-01   109.49   37086862          NaN
               2016-12-02   109.90   26527997     0.003745
               2016-12-05   109.11   34324540    -0.007188
               2016-12-06   109.95   26195462     0.007699
               2016-12-07   111.03   29998719     0.009823
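
The statement that produced `px_change_%` is not shown in the post. As a rough sketch of one way such a column could be computed per ticker, the snippet below uses `groupby(level=0)` with `pct_change()`. The frame name `sample` and the IBM prices are made up for illustration; only the AAPL prices are copied from the output above.

```python
import pandas as pd

# Hypothetical two-ticker sample: AAPL prices come from the output above,
# the IBM prices are invented purely for illustration.
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "IBM US Equity"],
     pd.to_datetime(["2016-12-01", "2016-12-02", "2016-12-05"])],
    names=["Security Name", "date"],
)
sample = pd.DataFrame(
    {"PX_LAST": [109.49, 109.90, 109.11, 159.82, 160.02, 159.84]},
    index=index,
)

# pct_change within each ticker: the first row of every ticker is NaN
# instead of being compared with the previous ticker's last price.
sample["px_change_%"] = sample.groupby(level=0)["PX_LAST"].pct_change()
print(sample)
```

Because `pct_change()` on a groupby returns a Series with the same index as the input, the assignment lines up row by row without any extra work, unlike the rolling case further down.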

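But the daily change relative to the 30-day volume does not work. It should be NaN only for the first 29 days, but it is NaN for all of them: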

# daily change from 30 day volume - doesn't work 
df['30_day_volume'] = df.groupby(level=0,group_keys=True)['PX_VOLUME'].rolling(window=30).mean() 
df['volume_change_%'] = (df['PX_VOLUME'] - df['30_day_volume'])/df['30_day_volume'] 

df.iloc[:,3:].tail(40) 
Out[12]: 
                           30_day_volume  volume_change_%
Security Name  date
MSFT US Equity 2016-12-30            NaN              NaN
               2017-01-03            NaN              NaN
               2017-01-04            NaN              NaN
               2017-01-05            NaN              NaN
               2017-01-06            NaN              NaN
               2017-01-09            NaN              NaN
               2017-01-10            NaN              NaN
               2017-01-11            NaN              NaN
               2017-01-12            NaN              NaN
               2017-01-13            NaN              NaN
               2017-01-17            NaN              NaN
               2017-01-18            NaN              NaN
               2017-01-19            NaN              NaN
               2017-01-20            NaN              NaN
               2017-01-23            NaN              NaN
               2017-01-24            NaN              NaN
               2017-01-25            NaN              NaN
               2017-01-26            NaN              NaN
               2017-01-27            NaN              NaN
               2017-01-30            NaN              NaN
               2017-01-31            NaN              NaN
               2017-02-01            NaN              NaN
               2017-02-02            NaN              NaN
               2017-02-03            NaN              NaN
               2017-02-06            NaN              NaN
               2017-02-07            NaN              NaN
               2017-02-08            NaN              NaN
               2017-02-09            NaN              NaN
               2017-02-10            NaN              NaN
               2017-02-13            NaN              NaN
               2017-02-14            NaN              NaN
               2017-02-15            NaN              NaN
               2017-02-16            NaN              NaN
               2017-02-17            NaN              NaN
               2017-02-21            NaN              NaN
               2017-02-22            NaN              NaN
               2017-02-23            NaN              NaN
               2017-02-24            NaN              NaN
               2017-02-27            NaN              NaN
               2017-02-28            NaN              NaN

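As pandas seems to have been designed specifically for finance, I'm surprised this isn't straightforward.

Edit: I have tried a few other things as well.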

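  • Tried converting it to a Panel (3D), but didn't find any built-in window functions other than converting to a DataFrame and back, so no advantage there.
  • Tried to create a pivot table, but couldn't find a way to reference just the first level of the MultiIndex. df.index.levels[0] or ...levels[1] did not work.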
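
For the pivot-table idea in the second bullet, here is a minimal sketch under some assumptions: the frame and its volumes are invented, and a window of 2 stands in for 30 so the tiny sample produces numbers. `index.levels[0]` holds only the unique labels of a level, whereas `get_level_values(0)` returns one label per row, and `unstack(level=0)` moves the ticker level into the columns.

```python
import pandas as pd

# Hypothetical miniature of the question's frame with made-up volumes.
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "IBM US Equity"],
     pd.to_datetime(["2016-12-01", "2016-12-02"])],
    names=["Security Name", "date"],
)
df = pd.DataFrame({"PX_VOLUME": [10.0, 20.0, 30.0, 40.0]}, index=index)

# One label per row, taken from the first index level.
tickers = df.index.get_level_values(0)

# Move the ticker level into the columns: rows are dates, columns are tickers.
wide = df["PX_VOLUME"].unstack(level=0)

# A plain rolling mean now runs down each ticker column independently.
wide_rolling = wide.rolling(window=2).mean()
print(wide_rolling)
```

Whether the wide layout is preferable depends on what happens next; `wide_rolling.stack()` would bring the result back to a (date, ticker) long form if needed.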

Thanks!

Answers


Could you try the following and see if it works?

df['30_day_volume'] = df.groupby(level=0)['PX_VOLUME'].rolling(window=30).mean().values 

df['volume_change_%'] = (df['PX_VOLUME'] - df['30_day_volume'])/df['30_day_volume'] 

That works, thank you! I'm curious about the explanation behind it: why does adding `.values` help?

From [here](http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.values.html#pandas.DataFrame.values) it seems that `values` is an attribute that returns the NumPy representation of the DataFrame, and Wes himself [said](https://stackoverflow.com/questions/10373660/converting-a-pandas-groupby-object-to-dataframe/10374456#10374456) that a GroupBy object _is_ itself a DataFrame.

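The groupby and rolling functions create a MultiIndexed Series with duplicated index keys, which causes problems when it is assigned to a DataFrame column. The .values attribute takes only the values out of the Series, and those can be assigned to a DataFrame column without any problem. – Allen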
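
To make that explanation concrete, here is a small sketch with invented data and a window of 2 instead of 30. It assumes a reasonably recent pandas, where `groupby(...).rolling(...)` prepends the group key to the original index, so the result carries one more index level than the frame it came from. Dropping that extra level with `droplevel(0)` is an alternative to `.values` that keeps label-based alignment.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: 2 tickers x 4 days, made-up volumes.
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "MSFT US Equity"],
     pd.date_range("2016-12-01", periods=4, freq="D")],
    names=["Security Name", "date"],
)
df = pd.DataFrame(
    {"PX_VOLUME": [10.0, 20.0, 30.0, 40.0, 1.0, 2.0, 3.0, 4.0]},
    index=index,
)

# A window of 2 stands in for the question's 30 so the tiny sample has non-NaN results.
rolled = df.groupby(level=0)["PX_VOLUME"].rolling(window=2).mean()

# Compare the number of index levels: the group key is prepended to the result.
print(df.index.nlevels, rolled.index.nlevels)

# Dropping the prepended level restores the original (ticker, date) index,
# so the assignment aligns by label rather than by position.
df["rolling_volume"] = rolled.droplevel(0)
print(df)
```

With matching indexes, only the first row of each ticker should be NaN here, which mirrors the "first 29 days" expectation in the question.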


Using pandas_datareader, and changing the index level of the groupby operation to match DataReader's multi-indexing, I can confirm that Allen's answer works.

import pandas_datareader.data as web 
import datetime 

start = datetime.datetime(2016, 12, 1) 
end = datetime.datetime(2017, 2, 28) 
data = web.DataReader(['AAPL', 'IBM', 'MSFT'], 'yahoo', start, end).to_frame() 

data['30_day_volume'] = data.groupby(level=1).rolling(window=30)['Volume'].mean().values 

data['volume_change_%'] = (data['Volume'] - data['30_day_volume'])/data['30_day_volume'] 

# double-check that it computed starting at 30 trading days. 
data.loc['2017-1-17':'2017-1-30'] 

The OP can try editing this line:

df['30_day_volume'] = df.groupby(level=0,group_keys=True)['PX_VOLUME'].rolling(window=30).mean() 

to the following, using mean().values:

df['30_day_volume'] = df.groupby(level=0,group_keys=True)['PX_VOLUME'].rolling(window=30).mean().values 

Without this, the data does not get aligned correctly, which results in the NaNs.
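
One caveat worth keeping in mind: `.values` assigns by position, so it relies on the rows already being grouped by ticker in the same order the groupby produces. As a sketch of an index-preserving alternative (not the method used in the answers above), `transform` returns a result indexed like its input. The frame below is invented, deliberately unsorted, and uses a window of 2 in place of 30.

```python
import pandas as pd

# Hypothetical frame, deliberately NOT sorted by ticker, with made-up volumes.
index = pd.MultiIndex.from_tuples(
    [("MSFT US Equity", "2016-12-01"), ("AAPL US Equity", "2016-12-01"),
     ("MSFT US Equity", "2016-12-02"), ("AAPL US Equity", "2016-12-02")],
    names=["Security Name", "date"],
)
df = pd.DataFrame({"PX_VOLUME": [1.0, 10.0, 3.0, 30.0]}, index=index)

# transform keeps the input's index, so no .values is needed and the
# row order of df does not matter. A window of 2 stands in for 30.
df["rolling_volume"] = df.groupby(level=0)["PX_VOLUME"].transform(
    lambda s: s.rolling(window=2).mean()
)
df["volume_change_%"] = (df["PX_VOLUME"] - df["rolling_volume"]) / df["rolling_volume"]
print(df)
```

On a frame that has been through `df.sort_index()` first, as in the question, the `.values` approach and this one should give the same numbers.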

Very nice, I appreciate the clarification.