2015-06-23 92 views
2

Stationarity of time series data

I want to model time series data with ARIMA in Python. I ran statsmodels.tsa.stattools.arma_order_select_ic on the series below and got p = 2 and q = 2. The code is as follows:

import pandas as pd
import statsmodels.api as sm

dates = pd.date_range('2010-11-1', '2011-01-30')
dataseries = pd.Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
      602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
      44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24], index=dates)
df = pd.DataFrame({'Consumption': dataseries})
df

# Grid search over AR orders 0..4 and MA orders 0..2, ranked by AIC
sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')

The result is as follows:

{'aic':              0            1            2
0  1262.244974  1264.052640  1264.601342
1  1264.098325  1261.705513  1265.604662
2  1264.743786  1265.015529  1246.347400
3  1265.427440  1266.378709  1266.430373
4  1266.358895  1267.674168          NaN, 'aic_min_order': (2, 2)}

But when I run the Augmented Dickey-Fuller test, the result indicates that the series is non-stationary.

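d_order0 = sm.tsa.adfuller(dataseries)
print('adf: ', d_order0[0])
print('p-value: ', d_order0[1])
print('Critical values: ', d_order0[4])

# The unit root is not rejected when the statistic lies above the 5% critical value
if d_order0[0] > d_order0[4]['5%']:
    print('Time Series is nonstationary')
    print(d)  # d is not defined in the snippet as posted; the output below shows 1
else:
    print('Time Series is stationary')
    print(d)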

The output is as follows:

adf: -1.96448506629 
p-value: 0.302358888762 
Critical values: {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153} 
Time Series is nonstationary 
1 

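When I cross-checked the result in R, it showed that the series is stationary. So why does the Augmented Dickey-Fuller test report the series as non-stationary?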

+0

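adfuller does not reject the presence of a unit root. That can also mean that the test does not have enough power to reject the unit-root hypothesis even though the process is stationary. What did you use in R that "shows" the series is stationary? – user333700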

+0

I just used auto.arima(y) in R, which gave me the order (1,0,1), but the adfuller test in Python describes the series y as non-stationary. –

Answers

3

Your data clearly has some seasonality, so fitting an ARMA model and testing for stationarity both need to be done with care.
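The weekly pattern can be checked directly from the sample autocorrelations. The following is a minimal sketch in plain Python, assuming the 91 values from the question; the helper name `sample_acf` is made up for illustration and uses the usual biased estimator (lag-k autocovariance divided by the lag-0 autocovariance).

```python
# Series from the question (91 daily observations).
data = [22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,
        576,474,536,512,464,436,24,448,408,528,602,638,640,26,658,548,620,534,422,482,26,
        616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
        44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,
        610,456,22,630,408,390,24]

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Compare a short lag with the lags that match a weekly cycle.
for lag in (1, 7, 14):
    print(lag, round(sample_acf(data, lag), 3))
```

Large autocorrelations at lags 7 and 14 next to small values at the neighbouring lags point to a weekly cycle; the R `acf` output in this answer reports 0.645 at lag 7 and 0.502 at lag 14.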

The ADF test results differ between Python and R apparently because each package uses a different default number of lags.

> (nobs=length(dataseries)) 
[1] 91 
> 12*(nobs/100)^(1/4) #python default 
[1] 11.72038 
> trunc((nobs-1)^(1/3)) #R default 
[1] 4 
> acf(coredata(dataseries),plot = F) 

Autocorrelations of series ‘coredata(dataseries)’, by lag 

    0  1  2  3  4  5  6  7  8  9  10  11 
1.000 0.039 -0.116 -0.124 -0.094 -0.148 0.083 0.645 -0.072 -0.135 -0.138 -0.146 
    12  13  14  15  16  17  18  19 
-0.185 0.066 0.502 -0.097 -0.151 -0.165 -0.195 -0.160 
> adf.test(dataseries,k=12) 

    Augmented Dickey-Fuller Test 

data: dataseries 
Dickey-Fuller = -2.6172, Lag order = 12, p-value = 0.322 
alternative hypothesis: stationary 

> adf.test(dataseries,k=4) 

    Augmented Dickey-Fuller Test 

data: dataseries 
Dickey-Fuller = -6.276, Lag order = 4, p-value = 0.01 
alternative hypothesis: stationary 

Warning message: 
In adf.test(dataseries, k = 4) : p-value smaller than printed p-value 
> adf.test(dataseries,k=7) 

    Augmented Dickey-Fuller Test 

data: dataseries 
Dickey-Fuller = -2.2571, Lag order = 7, p-value = 0.4703 
alternative hypothesis: stationary 
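
For reference, the two default lag rules computed in the R session above can be reproduced in Python. This is a sketch, not either library's code: the function names are made up, and rounding the statsmodels rule up to an integer is an assumption about how `adfuller` derives its default `maxlag` (some versions may also cap it for short series).

```python
import math

def statsmodels_default_maxlag(nobs):
    # Assumed rule: ceil(12 * (nobs / 100) ** (1/4)).
    return int(math.ceil(12.0 * (nobs / 100.0) ** 0.25))

def tseries_default_lag(nobs):
    # Rule used in the R session above: trunc((nobs - 1) ** (1/3)).
    return int(math.trunc((nobs - 1) ** (1.0 / 3.0)))

nobs = 91
print(statsmodels_default_maxlag(nobs))  # 12
print(tseries_default_lag(nobs))         # 4
```

Note that statsmodels treats its value as an upper bound and, with the default `autolag='AIC'`, may settle on a smaller lag, whereas `adf.test` uses exactly `k` lags.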
+0

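But even when I specify lag order = 12, the series still comes out as non-stationary. The test result is adf = -1.96448506629, p = 0.302358888762, which differs noticeably from the lag-12 result in your answer. On what basis should I choose the lag order? –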

+0

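There is another data set that both R and Python report as non-stationary. If the lag for this series is set to 4 in Python, it becomes stationary. How exactly should the lag be chosen? –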

+0

This is the series mentioned above: [9560,4010,3790,3840,9150,10230,9570,8230,4640,3730,5820,10410,10220,10040,6720,4290,3820,8700,10040,10820,10080,4160,4320,4140,9360,10000,10410,7830,9640,3950,5130,9420,9590,9070,10950,10320,3640,4260,10270,10380,9230,10750,10410,5160,5540,11160,11000,11110,9850,867,4830,5100,10680,11290,10930,10410,10380,4300,4270,10550,9170,13158,12407,10111,5997,5083,10577,10464,10592,11908,11150,5867,5571,12262,11099,8584,8980,11391,6135,5638,11030,9080,12454,10899,10706,5259,5731,11392,9920,11640,11401] –