2016-03-04 47 views

Pandas and PanelOLS: Only 2-level MultiIndex are supported

I have a DataFrame like this:

 year fcode  y  x 
0 1987 410032  NaN  0 
1 1988 410032  NaN  0 
2 1989 410032  NaN  0 
3 1987 410440  NaN  0 
4 1988 410440  NaN  0 
5 1989 410440  NaN  0 
6 1987 410495  NaN  0 
7 1988 410495  NaN  0 
8 1989 410495  NaN  0 
9 1987 410500  NaN  0 
10 1988 410500  NaN  0 
11 1989 410500  NaN  0 
12 1987 410501  NaN  0 
13 1988 410501  NaN  0 
14 1989 410501  NaN  0 
15 1987 410509  NaN  0 
16 1988 410509  NaN  0 
17 1989 410509  NaN  0 
18 1987 410513  NaN  0 
19 1988 410513  NaN  0 
20 1989 410513  NaN  0 
21 1987 410517  NaN  0 
22 1988 410517  NaN  0 
23 1989 410517  NaN  0 
24 1987 410518  NaN  0 
25 1988 410518  NaN  0 
26 1989 410518  NaN  0 
27 1987 410521  NaN  0 
28 1988 410521  NaN  0 
29 1989 410521  NaN  0 
.. ...  ...  ...  ... 
441 1987 419450  NaN  0 
442 1988 419450  NaN  0 
443 1989 419450  NaN  0 
444 1987 419459 0.512824  0 
445 1988 419459 0.916291  0 
446 1989 419459 0.113329  0 

I have sorted it by year and fcode:

df = df.sort_values(by=['year', 'fcode'])  # returns a new frame, so assign it

I drop the rows with missing data:

df = df.dropna() # Drop missing 
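Neither sorting nor dropping rows changes the frame in place; both return a new DataFrame, so the result has to be assigned. A minimal sketch on a made-up three-row frame (the values are invented for illustration and are not the data above), using sort_values as the newer spelling of sort_index(by=...):

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "year": [1988, 1987, 1987],
    "fcode": [410032, 410440, 410032],
    "y": [np.nan, 0.5, 1.5],
    "x": [0, 0, 1],
})

# Both calls return new frames, so the result is assigned
cleaned = toy.sort_values(by=["year", "fcode"]).dropna()

print(cleaned)
print(len(toy), len(cleaned))  # the original frame keeps all of its rows
```

The original row labels survive both steps, which is why the cleaned output keeps labels such as 30, 48 and 75.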

I get this:

 year fcode   y  x 
30 1987 410523 -2.813411  0 
48 1987 410538 0.970779  0 
75 1987 410563 1.791759  0 
81 1987 410565 3.044523  0 
84 1987 410566 1.945910  0 
87 1987 410567 0.000000  0 
96 1987 410577 0.518794  0 
105 1987 410592 3.401197  0 
108 1987 410593 0.000000  0 
111 1987 410596 2.302585  0 
120 1987 410606 -0.415515  0 
129 1987 410626 -0.139262  0 
135 1987 410629 0.182322  0 
159 1987 410653 0.058269  0 
162 1987 410665 -2.995732  0 
171 1987 410685 -1.966113  0 
186 1987 418011 2.302585  0 
195 1987 418021 0.000000  0 
201 1987 418035 1.791759  0 
207 1987 418045 0.693147  0 
213 1987 418051 -0.798508  0 
219 1987 418054 0.223143  0 
222 1987 418065 0.262364  0 
228 1987 418076 0.058269  0 
231 1987 418083 1.098612  0 
237 1987 418091 2.101692  0 
240 1987 418097 0.512824  0 
246 1987 418107 -0.020203  0 
252 1987 418118 0.000000  0 
258 1987 418125 -0.798508  0 
...   ...  ... ... 
233 1989 418083 0.000000  0 
239 1989 418091 -0.579819  0 
242 1989 418097 0.350657  0 
248 1989 418107 -0.798508  0 
254 1989 418118 -2.302585  0 
260 1989 418125 -0.510826  0 
266 1989 418140 0.916291  0 
272 1989 418163 1.871802  0 
275 1989 418168 -1.609438  0 
278 1989 418177 2.890372  0 
299 1989 418237 -1.660731  0 
311 1989 419198 1.386294  0 
314 1989 419201 0.693147  0 
317 1989 419242 1.740466  0 
320 1989 419268 -0.105360  1 
323 1989 419272 2.833213  1 
332 1989 419289 -0.051293  1 
335 1989 419297 -1.309333  0 
350 1989 419307 -0.116534  1 
368 1989 419339 -0.798508  0 
371 1989 419343 1.098612  1 
383 1989 419357 -0.693147  1 
392 1989 419378 0.292670  1 
401 1989 419381 -0.967584  1 
407 1989 419388 1.791759  1 
422 1989 419409 0.693147  1 
431 1989 419432 1.648659  0 
446 1989 419459 0.113329  0 
464 1989 419482 1.029619  0 
467 1989 419483 3.401197  0 

I try to run this:

model = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True) 

I get this error:

raise NotImplementedError('Only 2-level MultiIndex are supported.')
NotImplementedError: Only 2-level MultiIndex are supported.

I don't know what I did wrong. As you can see, my code appears to be similar to the one in "Fixed effects in Pandas".
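The error message suggests that the estimator wants a frame indexed by exactly two levels, whereas a frame that has only been sorted and filtered still carries its single-level integer index. A quick way to check this, sketched on invented toy data:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "year": [1987, 1988, 1987, 1988],
    "fcode": [410523, 410523, 410538, 410538],
    "y": [-2.8, 0.9, 1.7, 3.0],
    "x": [0, 0, 0, 1],
})

# A plain integer index has one level and is not a MultiIndex
print(toy.index.nlevels)
print(isinstance(toy.index, pd.MultiIndex))
```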

When I add

df=df.set_index('year', append=True) 

I get

Degrees of Freedom: model 161, resid 0 

    -----------------------Summary of Estimated Coefficients------------------------ 
      Variable  Coef Std Err  t-stat p-value CI 2.5% CI 97.5% 
    -------------------------------------------------------------------------------- 
      x  0.0000  nan  nan  nan  nan  nan 
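One possible reading of this result, offered as an assumption rather than something the output confirms: with append=True the original row labels stay as the first index level and 'year' becomes the second, so every first-level value is unique. If the estimator builds its time effects from the first level, it would create roughly one effect per row, which would be consistent with "model 161, resid 0". The index layout itself can be inspected on toy data (values invented):

```python
import pandas as pd

toy = pd.DataFrame(
    {"year": [1987, 1987, 1988, 1988],
     "fcode": [410523, 410538, 410523, 410538],
     "y": [-2.8, 0.9, 1.7, 3.0],
     "x": [0, 0, 0, 1]},
    index=[30, 48, 31, 49],  # row labels left over from the original frame
)

appended = toy.set_index("year", append=True)

# Level 0 is the unnamed row label, level 1 is 'year'
level_names = list(appended.index.names)
print(level_names)

# Every row has its own level-0 value
first_level = appended.index.get_level_values(0)
print(first_level.nunique(), len(appended))
```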

Maybe you could add the column to the index - 'df = df.set_index('year', append=True)' - the result is a df with a 'MultiIndex'. – jezrael

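Thanks! The error is gone, but I believe there is still a problem, because I am getting a model in which all the statistics are empty. Please see the edited version above. – DanielTheRocketMan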

Answers

You can try:

print(df.head())
    year fcode   y x 
30 1987 410523 -2.813411 0 
48 1987 410538 0.970779 0 
75 1987 410563 1.791759 0 
81 1987 410565 3.044523 0 
84 1987 410566 1.945910 0 

# convert year to datetime
df['year'] = pd.to_datetime(df['year'], format='%Y')
# add the year column to the index
df = df.set_index('year', append=True)
# swap the index levels so that year comes first
df.index = df.index.swaplevel(0, 1)
print(df.head())
       fcode   y x 
year        
1987-01-01 30 410523 -2.813411 0 
      48 410538 0.970779 0 
      75 410563 1.791759 0 
      81 410565 3.044523 0 
      84 410566 1.945910 0 

model = pd.stats.plm.PanelOLS(y=df['y'],x=df[['x']],time_effects=True) 
print(model)
-------------------------Summary of Regression Analysis------------------------- 

Formula: Y ~ <x> 

Number of Observations:   60 
Number of Degrees of Freedom: 3 

R-squared:   0.0013 
Adj R-squared: -0.0338 

Rmse:    1.4727 

F-stat (1, 57):  0.0364, p-value:  0.8493 

Degrees of Freedom: model 2, resid 57 

-----------------------Summary of Estimated Coefficients------------------------ 
     Variable  Coef Std Err  t-stat p-value CI 2.5% CI 97.5% 
-------------------------------------------------------------------------------- 
      x  0.1539  0.5704  0.27  0.7882 -0.9640  1.2719 
---------------------------------End of Summary--------------------------------- 

Thanks! What does df.index = df.index.swaplevel(0,1) mean? – DanielTheRocketMan

It seems strange to me that the residual degrees of freedom come out as 60-3. In a typical fixed-effects panel data model they would be N(T-1)-K. – DanielTheRocketMan
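The N(T-1)-K count applies to a model with entity (firm) fixed effects. The call in the answer requests only time_effects=True, and its index uses the row label rather than fcode as the second level, which may be why the reported degrees of freedom differ. As a rough, library-free illustration of where N(T-1)-K comes from, here is a sketch of the within (entity-demeaned) estimator on a made-up balanced panel. The data, the variable names and the use of numpy.linalg.lstsq are all assumptions for illustration; this is not how PanelOLS is implemented.

```python
import numpy as np
import pandas as pd

# Made-up balanced panel: N = 4 firms, T = 3 years, K = 1 regressor
firms = [1, 2, 3, 4]
years = [1987, 1988, 1989]
panel = pd.DataFrame(
    [(f, t) for f in firms for t in years], columns=["fcode", "year"]
)
panel["x"] = np.linspace(0.0, 1.0, len(panel)) ** 2
# y = firm-specific intercept + 2 * x, without noise, so the true slope is 2
panel["y"] = panel["fcode"] * 10.0 + 2.0 * panel["x"]

# Within transformation: subtract each firm's own mean from y and x
firm_means = panel.groupby("fcode")[["y", "x"]].transform("mean")
demeaned = panel[["y", "x"]] - firm_means

# OLS on the demeaned data (the firm intercepts have been removed)
beta, *_ = np.linalg.lstsq(
    demeaned[["x"]].to_numpy(), demeaned["y"].to_numpy(), rcond=None
)

n_firms, n_years, k = len(firms), len(years), 1
# N*T observations minus N firm means minus K slopes = N(T-1) - K
resid_dof = n_firms * (n_years - 1) - k

print(beta[0], resid_dof)
```

Demeaning uses up one degree of freedom per firm, which is where the N in N*T - N - K comes from.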

'swaplevel' - it swaps the first level of the MultiIndex with the second one, because you need 'year' as the first level of the MultiIndex. [Docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#swapping-levels-with-swaplevel) – jezrael
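A small sketch of what swaplevel does to the index, on invented toy data. The last step also shows one alternative that the answer does not use: building the index directly from 'year' and 'fcode', which would make the firm code rather than the row label the second level. Whether the model should use that layout is an assumption here, not something the thread confirms.

```python
import pandas as pd

toy = pd.DataFrame(
    {"year": [1987, 1987, 1988],
     "fcode": [410523, 410538, 410523],
     "y": [-2.8, 0.9, 1.7],
     "x": [0, 0, 1]},
    index=[30, 48, 31],  # row labels left over from the original frame
)

appended = toy.set_index("year", append=True)   # levels: (row label, year)
swapped = appended.copy()
swapped.index = appended.index.swaplevel(0, 1)  # levels: (year, row label)

print(list(appended.index.names), list(swapped.index.names))
print(swapped.index[0])

# Alternative (not from the answer): index by year and firm code directly
by_firm = toy.set_index(["year", "fcode"])
print(list(by_firm.index.names))
```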