2014-12-05 24 views

Pandas: how can rolling_std take two columns into account in its observation window?

{'Open': {0: 159.18000000000001, 1: 157.99000000000001, 2: 157.66, 3: 157.53999999999999, 4: 155.03999999999999, 5: 155.47999999999999, 6: 155.44999999999999, 7: 155.93000000000001, 8: 155.0, 9: 157.72999999999999}, 
'Close': {0: 157.97999999999999, 1: 157.66, 2: 157.53999999999999, 3: 155.03999999999999, 4: 155.47999999999999, 5: 155.44999999999999, 6: 155.87, 7: 155.0, 8: 157.72999999999999, 9: 157.31}} 

Code:

import pandas as pd 

d = #... data above. 
df = pd.DataFrame.from_dict(d) 
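# pd.rolling_std is the API of the pandas release used here; later releases
# removed it in favour of df['Close'].rolling(window=5).std().
df['Close_Stdev'] = pd.rolling_std(df[['Close']], window=5)

print(df)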

#     Close    Open  Close_Stdev
# 0  157.98  159.18          NaN
# 1  157.66  157.99          NaN
# 2  157.54  157.66          NaN
# 3  155.04  157.54          NaN
# 4  155.48  155.04     1.369452
# 5  155.45  155.48     1.259754
# 6  155.87  155.45     0.975464
# 7  155.00  155.93     0.358567
# 8  157.73  155.00     1.065190
# 9  157.31  157.73     1.189378

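Question:

The code above works fine. However, is it possible for rolling_std to build its observation window from the previous four values of Close and, as the fifth value, the current value of Open? Basically, I want rolling_std to compute its first value from the following: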

157.98 # From Close 
157.66 # From Close 
157.54 # From Close 
155.04 # From Close 
155.04 # Bzzt, from Open. 

Technically, this means that the last value in the observation list is always the last Close value.

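Logic/reason:

Obviously, this is stock data. I'm trying to check whether it is better to factor in the stock's Open price for the current trading day when computing the standard deviation, rather than only looking at the preceding Close values.

Desired result: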

#     Close    Open  Close_Stdev  Desired_Stdev
# 0  157.98  159.18          NaN            NaN
# 1  157.66  157.99          NaN            NaN
# 2  157.54  157.66          NaN            NaN
# 3  155.04  157.54          NaN            NaN
# 4  155.48  155.04     1.369452       1.480311
# 5  155.45  155.48     1.259754       1.255149
# 6  155.87  155.45     0.975464       0.994017
# 7  155.00  155.93     0.358567       0.361151
# 8  157.73  155.00     1.065190       0.368035
# 9  157.31  157.73     1.189378       1.291464

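Additional details:

This is easy to do in Excel by using the formula STDEV.S and selecting the numbers as shown in the screenshot below. However, I want to do it in Python and pandas (for personal reasons). (I highlighted F6; it just isn't visible because of Snagit's effects.)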

enter image description here

Answers


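You can use Welford's method to compute the standard deviation. Its advantage here is that the whole computation can be expressed as vectorized arithmetic on entire columns in only 5 iterations (one per value in the window). That should be faster than doing the calculation row by row and having to assemble the window for every row.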
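To make the update rule concrete before the column-wise version, here is a minimal scalar sketch of Welford's method in plain Python. The function name `welford_std` is made up for illustration; the sample window is the first five Close values from the question, and the result is compared against the standard library's `statistics.stdev`.

```python
import statistics


def welford_std(values, ddof=1):
    """Sample standard deviation via Welford's incremental update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean          # distance from the old mean
        mean += delta / n         # updated running mean
        m2 += delta * (x - mean)  # running sum of squared deviations
    return (m2 / (n - ddof)) ** 0.5


# First five Close values from the question (rows 0 to 4).
window = [157.98, 157.66, 157.54, 155.04, 155.48]
print(round(welford_std(window), 6))
print(round(statistics.stdev(window), 6))
```

Both lines should print the same number, and it should agree with the Close_Stdev value in row 4 of the question. The update never looks back at earlier values, which is why each `x` can just as well be a whole pandas column.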

First, here is a sanity check showing that Welford's method reproduces the same result as df['Close_Stdev'] = pd.rolling_std(df[['Close']], window=5):

import numpy as np 
import pandas as pd 

class OnlineVariance(object):
    """
    Welford's algorithm computes the sample variance incrementally.
    """
    def __init__(self, iterable=None, ddof=1):
        self.ddof, self.n, self.mean, self.M2 = ddof, 0, 0.0, 0.0
        if iterable is not None:
            for datum in iterable:
                self.include(datum)

    def include(self, datum):
        self.n += 1
        self.delta = datum - self.mean
        self.mean += self.delta/self.n
        self.M2 += self.delta * (datum - self.mean)
        self.variance = self.M2/(self.n-self.ddof)

    @property
    def std(self):
        return np.sqrt(self.variance)


d = {'Open': {0: 159.18000000000001, 1: 157.99000000000001, 2: 157.66, 3: 
157.53999999999999, 4: 155.03999999999999, 5: 155.47999999999999, 6: 
155.44999999999999, 7: 155.93000000000001, 8: 155.0, 9: 157.72999999999999}, 
'Close': {0: 157.97999999999999, 1: 157.66, 2: 157.53999999999999, 3: 
155.03999999999999, 4: 155.47999999999999, 5: 155.44999999999999, 6: 155.87, 7: 
155.0, 8: 157.72999999999999, 9: 157.31}} 

df = pd.DataFrame.from_dict(d) 

df['Close_Stdev'] = pd.rolling_std(df[['Close']],window=5) 

ov = OnlineVariance() 
for n in range(5): 
    ov.include(df['Close'].shift(n)) 

df['std'] = ov.std 
print(df) 
assert np.isclose(df['Close_Stdev'], df['std'], equal_nan=True).all() 

yields

    Close    Open  Close_Stdev       std
0  157.98  159.18          NaN       NaN
1  157.66  157.99          NaN       NaN
2  157.54  157.66          NaN       NaN
3  155.04  157.54          NaN       NaN
4  155.48  155.04     1.369452  1.369452
5  155.45  155.48     1.259754  1.259754
6  155.87  155.45     0.975464  0.975464
7  155.00  155.93     0.358567  0.358567
8  157.73  155.00     1.065190  1.065190
9  157.31  157.73     1.189378  1.189378

Then, to include the Open values in the calculation,

ov = OnlineVariance() 
ov.include(df['Open']) 
for n in range(1, 5): 
    ov.include(df['Close'].shift(n)) 
df['std'] = ov.std 
print(df) 

yields

    Close    Open       std
0  157.98  159.18       NaN
1  157.66  157.99       NaN
2  157.54  157.66       NaN
3  155.04  157.54       NaN
4  155.48  155.04  1.480311
5  155.45  155.48  1.255149
6  155.87  155.45  0.994017
7  155.00  155.93  0.361151
8  157.73  155.00  0.368035
9  157.31  157.73  1.291464
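As an alternative sketch (a different technique from Welford's method, and written against a newer pandas than the one used in this thread, so treat the exact calls as an assumption about your version): because the window holds only 5 values, the Open column and the four shifted Close columns can be placed side by side and the row-wise sample standard deviation taken directly. The names `window` and `stacked` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Open': [159.18, 157.99, 157.66, 157.54, 155.04,
             155.48, 155.45, 155.93, 155.00, 157.73],
    'Close': [157.98, 157.66, 157.54, 155.04, 155.48,
              155.45, 155.87, 155.00, 157.73, 157.31],
})

window = 5
# Column 0 is today's Open; columns 1..4 are the previous four Closes.
cols = [df['Open']] + [df['Close'].shift(n) for n in range(1, window)]
stacked = pd.concat(cols, axis=1, keys=list(range(window)))

# skipna=False keeps NaN for rows that do not yet have a full window.
df['Desired_Stdev'] = stacked.std(axis=1, ddof=1, skipna=False)
print(df)
```

This trades the constant memory of the incremental update for a temporary frame with `window` columns, which is usually acceptable for small windows.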

Wow. Give me an hour and a half to test this one. *Definitely* more elegant. – Manhattan 2014-12-05 04:52:20


Awesome. It works really well. I learned something today. A well-deserved +1 and accept! – Manhattan 2014-12-05 05:52:29


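I played around with numpy until I got what I wanted. It is very fast, but it is not pandas-like and is probably unsafe on many levels. I'm open to prettier answers than this one. In the meantime, this is good enough for my purposes.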

import numpy as np
...

new_std = []
for i in range(df2.shape[0]+1):
    print(df2['Close'].iloc[i-5:i])
    try:
        close_ = np.array(df2['Close'].iloc[i-5:i])
        open_ = np.array(df2['Open'].iloc[i-5:i])
        # Change the close from last date in list to the open
        # of that same date to simulate before-end-of-day trading.
        close_[-1] = open_[-1]
        new_std.append(np.std(close_, ddof=1))
    except IndexError:
        # For i < 5 the slice comes back empty here, so there is no
        # full window yet and indexing [-1] raises IndexError.
        new_std.append(np.nan)

df2['Desired_Stdev'] = new_std[1:] # Truncate to fit index.
print(df2)

#     Close    Open  Close_Stdev  Desired_Stdev
# 0  157.98  159.18          NaN            NaN
# 1  157.66  157.99          NaN            NaN
# 2  157.54  157.66          NaN            NaN
# 3  155.04  157.54          NaN            NaN
# 4  155.48  155.04     1.369452       1.480311
# 5  155.45  155.48     1.259754       1.255149
# 6  155.87  155.45     0.975464       0.994017
# 7  155.00  155.93     0.358567       0.361151
# 8  157.73  155.00     1.065190       0.368035
# 9  157.31  157.73     1.189378       1.291464
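For what it's worth, the same window-by-window idea can be sketched without negative slices or a try/except by checking the window length explicitly. This is only an illustration in plain Python: the function name `stdev_with_open` and its `window` parameter are made up here, and `statistics.stdev` stands in for `np.std(..., ddof=1)`.

```python
import math
import statistics


def stdev_with_open(closes, opens, window=5):
    """Sample stdev of the previous window-1 closes plus the current open."""
    result = []
    for i in range(len(closes)):
        if i < window - 1:
            # Not enough history for a full window yet.
            result.append(math.nan)
            continue
        values = list(closes[i - window + 1:i]) + [opens[i]]
        result.append(statistics.stdev(values))
    return result


closes = [157.98, 157.66, 157.54, 155.04, 155.48,
          155.45, 155.87, 155.00, 157.73, 157.31]
opens = [159.18, 157.99, 157.66, 157.54, 155.04,
         155.48, 155.45, 155.93, 155.00, 157.73]

desired = stdev_with_open(closes, opens)
print([round(x, 6) for x in desired])
```

The result lines up with the row index directly, so no truncation step is needed before assigning it to a column.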