Rolling OLS regression in Python fails with IndexError: index out of bounds

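For my evaluation, I want to run a rolling (for example, 3-window) OLS regression estimation on the dataset found at this link (https://drive.google.com/drive/folders/0B2Iv8dfU4fTUMVFyYTEtWXlzYkk), which has the format shown below. The third column (Y) of my dataset is the true value, which is what I want to predict (estimate).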

time  X Y 
0.000543 0 10 
0.000575 0 10 
0.041324 1 10 
0.041331 2 10 
0.041336 3 10 
0.04134 4 10 
    ... 
9.987735 55 239 
9.987739 56 239 
9.987744 57 239 
9.987749 58 239 
9.987938 59 239

For a simple OLS regression estimation, I tried the script below.

# /usr/bin/python -tt 

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd 

df = pd.read_csv('estimated_pred.csv') 

model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['X']], 
           window_type='rolling', window=3, intercept=True) 
df['Y_hat'] = model.y_predict 

print(df['Y_hat']) 
print (model.summary) 
df.plot.scatter(x='X', y='Y', s=0.1)

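However, statsmodels or scikit-learn seems to be a good choice for anything beyond simple regression. I tried the following script with statsmodels, but it returns IndexError: index out of bounds when run on a larger subset of the attached dataset (for example, more than 1000 rows).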

# /usr/bin/python -tt 
import pandas as pd 
import numpy as np 
import statsmodels.api as sm 


df=pd.read_csv('estimated_pred.csv')  
df=df.dropna() # to drop nans in case there are any 
window = 3 
#print(df.index) # to print index 
df['a']=None #constant 
df['b1']=None #beta1 
df['b2']=None #beta2 
for i in range(window,len(df)): 
    temp=df.iloc[i-window:i,:] 
    RollOLS=sm.OLS(temp.loc[:,'Y'],sm.add_constant(temp.loc[:,['time','X']])).fit() 
    df.iloc[i,df.columns.get_loc('a')]=RollOLS.params[0] 
    df.iloc[i,df.columns.get_loc('b1')]=RollOLS.params[1] 
    df.iloc[i,df.columns.get_loc('b2')]=RollOLS.params[2] 

#The following line gives us predicted values in a row, given the PRIOR row's estimated parameters 
df['predicted']=df['a'].shift(1)+df['b1'].shift(1)*df['time']+df['b2'].shift(1)*df['X'] 

print(df['predicted']) 
#print(df['b2']) 

#print(RollOLS.predict(sm.add_constant(predict_x))) 

print(temp)

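Ultimately, I want to predict Y, i.e. predict the current value of Y from the previous 3 rolling values of X. How can this be done with either statsmodels or scikit-learn, given that pd.stats.ols.MovingOLS was removed in Pandas version 0.20.0 and I cannot find any reference for a replacement?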

Source

2017-07-03 Desta Haileselassie Hagos

Can you post the full traceback of the error? – FLab

Sure. Here is the full traceback of the error. Each 'File' starts a new line: 'Traceback (most recent call last): File "../Desktop/rolling_regression/rolling_regression2.py", line 26, in <module> df.iloc[i,df.columns.get_loc('b2')]=RollOLS.params[2] File "../anaconda/lib/python3.5/site-packages/pandas/indexes/base.py", line 1986, in get_value return tslib.get_value_box(s, key) File "pandas/tslib.pyx", line 777, in pandas.tslib.get_value_box (pandas/tslib.c:17017) File "pandas/tslib.pyx", line 793, in pandas.tslib.get_value_box (pandas/tslib.c:16774) IndexError: index out of bounds' –

It looks like the sm.OLS call succeeded. Please check/show RollOLS.params to make sure it actually has 3 entries. – FLab

I think I found your problem. According to the documentation of sm.add_constant, there is an argument named has_constant that you need to set to 'add' (the default is 'skip').

has_constant : str {'raise', 'add', 'skip'} Behavior if ``data'' already has a constant. The default will return data without adding another constant. If 'raise', will raise an error if a constant is present. Using 'add' will duplicate the constant, if one is present. Has no effect for structured or recarrays. There is no checking for a constant in this case.

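Essentially, in that iteration of the loop your variable time is constant within the subset, so the function does not add a constant, and RollOLS.params therefore has only 2 entries.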

temp
Out[12]:
        time   X     Y      a            b1            b2
541  0.16182  13  20.0  19.49       3.15289  -1.26116e-05
542  0.16182  14  20.0     20             0   7.10543e-15
543  0.16182  15  20.0     20  -7.45058e-09             0

sm.add_constant(temp.loc[:,['time','X']])
Out[13]:
        time   X
541  0.16182  13
542  0.16182  14
543  0.16182  15

sm.add_constant(temp.loc[:,['time','X']], has_constant = 'add')
Out[14]:
     const     time   X
541      1  0.16182  13
542      1  0.16182  14
543      1  0.16182  15

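So the error goes away if you pass has_constant = 'add' to sm.add_constant, but then you have two linearly dependent columns among the explanatory variables. That makes the matrix non-invertible, so the regression would not make sense.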
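The rank deficiency can be checked directly. This is a minimal sketch, assuming only NumPy and the three rows 541-543 from the output above; the variable names are illustrative and not taken from the original script.

```python
import numpy as np

# Rows 541-543: time is identical in all three observations.
time = np.array([0.16182, 0.16182, 0.16182])
x = np.array([13.0, 14.0, 15.0])

# Design matrix with an explicit intercept column, which is what
# has_constant='add' produces for this window.
design = np.column_stack([np.ones(3), time, x])

# The time column is 0.16182 times the intercept column, so only two
# of the three columns are linearly independent.
rank = np.linalg.matrix_rank(design)
print(rank)  # 2
```

A rank below the number of columns means the three coefficients are not uniquely determined for this window, whatever values a solver happens to return.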

Source

2017-07-03 14:49:48 FLab

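Thanks, FLab. I still don't understand why it works without errors on a dataset of only 100 rows. And what do you mean by "but then you have two linearly dependent columns among the explanatory variables, which makes the matrix non-invertible, so the regression would not make sense"? I thought df['a'] was the constant in my script. –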

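I think indices 541-543 are the first place where time is constant across 3 consecutive observations. On the second point, look at the last output in the code snippet. Basically you have time = 0.16182 * const, so the rank of your matrix is 2 (not 3). This problem is called multicollinearity (perfect multicollinearity, in this case): https://en.wikipedia.org/wiki/Multicollinearity – FLab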

Aha, perfect, thank you. When we do print(temp), it only prints the last 3 predictions. How can we print all of the predictions? –
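On that last point, temp only ever holds the final 3-row window, while the predictions belong in a full-length column. The following is a rough sketch rather than the thread's own solution. It assumes NumPy and pandas, swaps in np.linalg.lstsq for sm.OLS, predicts each row from the window rows immediately before it, and leaves NaN wherever the window is rank-deficient. The helper rolling_ols_predict and the demo data are hypothetical, made up for illustration, and not drawn from the linked dataset.

```python
import numpy as np
import pandas as pd


def rolling_ols_predict(df, window=3, y_col="Y", x_cols=("time", "X")):
    """Hypothetical helper: fit OLS on the previous `window` rows, predict row i."""
    x_cols = list(x_cols)
    predicted = np.full(len(df), np.nan)
    for i in range(window, len(df)):
        past = df.iloc[i - window:i]
        # Explicit intercept column, the equivalent of has_constant='add'.
        design = np.column_stack(
            [np.ones(window), past[x_cols].to_numpy(dtype=float)]
        )
        # Skip windows with linearly dependent columns (e.g. constant time).
        if np.linalg.matrix_rank(design) < design.shape[1]:
            continue
        target = past[y_col].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        current = np.concatenate(
            [[1.0], df[x_cols].iloc[i].to_numpy(dtype=float)]
        )
        predicted[i] = current @ coef
    return pd.Series(predicted, index=df.index, name="predicted")


# Synthetic data (an assumption): Y is an exact linear function of time and X,
# and time is constant over rows 2-4 to reproduce the degenerate window.
demo = pd.DataFrame({
    "time": [0.1, 0.2, 0.4, 0.4, 0.4, 0.7, 0.9, 1.0],
    "X": [0, 1, 2, 4, 5, 5, 7, 6],
})
demo["Y"] = 2.0 + 3.0 * demo["time"] + 0.5 * demo["X"]

pred = rolling_ols_predict(demo)
# to_string() prints every row instead of pandas' truncated display.
print(pred.to_string())
```

Keeping the predictions in their own Series means every row can be printed or plotted, and the NaN entries mark exactly the windows where the regression was not identifiable.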
