2017-08-05

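How to identify variable names in feature selection

I am trying to find out the names of the variables chosen by feature selection. I have the following dataset: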

import pandas as pd
df = pd.DataFrame({'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                   'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
                   'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a']})

I split it into features and target, and one-hot encode the features:

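# .ix was removed in pandas 1.0; .iloc does the same positional selection
X, y = df.iloc[:, 1:], df.iloc[:, 0]
# one-hot encode the categorical columns b, c and d
X_dummy = pd.get_dummies(X)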

Then I select the four best features:

from sklearn.feature_selection import SelectKBest 
from sklearn.feature_selection import chi2 
X_new = SelectKBest(chi2, k=4).fit_transform(X_dummy, y) 
X_new 

array([[0, 1, 0, 1], 
     [1, 0, 0, 1], 
     [0, 1, 0, 0], 
     [1, 0, 1, 0], 
     [0, 1, 0, 1], 
     [1, 0, 0, 1], 
     [0, 1, 1, 0], 
     [1, 0, 0, 0], 
     [0, 1, 0, 1], 
     [1, 0, 1, 0]], dtype=uint8) 

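I get an array back, but I want to know which variables (b, c, d, or their dummy columns) I have to include in the model. How can I find this out? Thanks!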

Answer


You can use the scores_ attribute of the fitted selector and take the names of the four highest-scoring columns:

>>> kbest = SelectKBest(chi2, k=4)
>>> X_new = kbest.fit_transform(X_dummy, y)
>>> X_dummy.columns[kbest.scores_.argsort()[::-1][:4]]
Index(['b_foo', 'b_bar', 'd_a', 'd_d'], dtype='object')
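
Note that the index above is sorted by score, so it does not follow the column order of X_new. If you want names that line up with the columns of the returned array, the selector's get_support() method gives a boolean mask over the input columns. A minimal sketch of that alternative, assuming a recent pandas and scikit-learn (the names mask, selected and scores are only for illustration, and dtype=int is passed to get_dummies just to keep the dummy columns numeric):

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Same data as in the question
df = pd.DataFrame({
    'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
    'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a'],
})
X, y = df.iloc[:, 1:], df['a']
X_dummy = pd.get_dummies(X, dtype=int)

kbest = SelectKBest(chi2, k=4).fit(X_dummy, y)

# Boolean mask over the input columns: True where a column was kept
mask = kbest.get_support()
selected = X_dummy.columns[mask]

# Wrap the transformed array so each column carries its original name
X_new = pd.DataFrame(kbest.transform(X_dummy), columns=selected, index=X_dummy.index)

# chi2 score per dummy column, highest first
scores = pd.Series(kbest.scores_, index=X_dummy.columns).sort_values(ascending=False)

print(list(selected))
print(scores)
```

Here selected keeps the column order of X_dummy, which is also the column order of the transformed array, so X_new can be passed to a model with its feature names intact.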