2017-02-28 147 views

Random Forest: finding relevant features

I am trying to train an RF model in sklearn for classification. For a particular feature vector I get low test accuracy, and I assume the features I chose are misleading the model. I tried RFE, RFECV, etc. to find a relevant subset of features, but that did not help improve accuracy. So I came up with a simple feature selection procedure of my own:
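For reference, a minimal RFECV run of the kind mentioned above might look like the following. The synthetic dataset and parameters here are placeholders, not the actual data:

```python
# Minimal RFECV sketch on a synthetic dataset; make_classification
# and all parameters are stand-ins for the real data and settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)
selector = RFECV(RandomForestClassifier(random_state=0), cv=3)
selector.fit(X, y)
print(selector.support_)     # boolean mask over the 10 input features
print(selector.n_features_)  # number of features RFECV kept
```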

ml_feats = [...]  # initial list of feature (column) names

while True:
    feats_to_del = []
    prev_score = 0
    for feat_len in range(2, len(ml_feats)):
        classifier = RandomForestClassifier(**init_params)
        classifier.fit(X[ml_feats[:feat_len]], Y)
        score = classifier.score(Xt[ml_feats[:feat_len]], Yt)
        if score < prev_score:
            # the most recently added feature caused the score to drop
            print(ml_feats[feat_len - 1])
            feats_to_del.append(ml_feats[feat_len - 1])
        prev_score = score
    if len(feats_to_del) == 0:
        break
    # delete irrelevant features
    ml_feats = list(set(ml_feats) - set(feats_to_del))

print(ml_feats)  # print all remaining (relevant) features

Does the above code help find the right set of features? Thanks.

Answer


What you are doing is a greedy feature selection. If you want to use a RandomForestClassifier to select features, you can do it like this:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# xtrain : training data
# ytrain : training labels

clf = RandomForestClassifier()
# the selection threshold is the mean of the feature importances
# computed by the random forest classifier
sfm = SelectFromModel(estimator=clf, threshold='mean')
sfm.fit(xtrain, ytrain)
selected_xtrain = sfm.transform(xtrain)
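To see which columns were kept, `SelectFromModel.get_support()` returns a boolean mask over the input features. A runnable sketch, with `make_classification` standing in for the actual `xtrain`/`ytrain`:

```python
# Sketch of inspecting what SelectFromModel kept; the synthetic
# dataset is an assumption, not the asker's real data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

xtrain, ytrain = make_classification(n_samples=200, n_features=8,
                                     n_informative=3, random_state=0)
sfm = SelectFromModel(RandomForestClassifier(random_state=0),
                      threshold='mean')
sfm.fit(xtrain, ytrain)
mask = sfm.get_support()           # boolean mask over the 8 columns
selected_xtrain = sfm.transform(xtrain)
print(mask, selected_xtrain.shape)
```

The reduced matrix `selected_xtrain` keeps only the columns whose importance is at or above the mean importance.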

Will it help remove the irrelevant features?


Yes. Why don't you try it?


I tried it... no significant improvement in accuracy.