2015-09-05 58 views
3

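Make the grid search function in sklearn ignore empty models

Using Python and scikit-learn, I want to do a grid search. But some of my models end up being empty. How can I make the grid search function ignore those models?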

I guess I could use a scoring function that returns 0 if the model is empty, but I'm not sure how.

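import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search before scikit-learn 0.18

predictor = LinearSVC(penalty='l1', dual=False,
                      class_weight='balanced')  # spelled 'auto' in older releases
param_dist = {'C': pow(2.0, np.arange(-10, 11))}
learner = GridSearchCV(estimator=predictor,
                       param_grid=param_dist,
                       n_jobs=self.n_jobs, cv=5,  # snippet is taken from inside a class method
                       verbose=0)
learner.fit(X, y)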

With my data, this learner object ends up choosing a C that corresponds to an empty model. Any idea how to make sure the model is not empty?

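EDIT: By "empty model" I mean a model that has selected 0 features to use. This happens easily with an l1-regularized model in particular. In this case, if C in the SVM is small enough, the optimization problem finds the zero vector as the optimal solution for the coefficients, so predictor.coef_ will be a vector of 0s.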
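
The effect described in the edit can be reproduced on synthetic data. The sketch below is not from the original post: the dataset from make_classification, the helper is_empty and the two C values are assumptions chosen only for illustration, and it assumes a recent scikit-learn release.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data, assumed here only for illustration
X_demo, y_demo = make_classification(n_samples=1000, n_features=10,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)


def is_empty(model):
    """Return True when every coefficient of a fitted linear model is zero."""
    return not np.any(model.coef_)


# Very small C: the l1 penalty dominates and the coefficients are driven to zero
tiny_c = LinearSVC(penalty='l1', dual=False, C=2.0 ** -20).fit(X_demo, y_demo)
# Moderate C: the loss term carries enough weight for some features to survive
moderate_c = LinearSVC(penalty='l1', dual=False, C=1.0).fit(X_demo, y_demo)

print("tiny C empty:", is_empty(tiny_c))
print("moderate C empty:", is_empty(moderate_c))
```

The C value at which the coefficients first become nonzero depends on the data, so the two values above should not be read as general thresholds.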

+2

What exactly is an empty model? – cel

+0

Good question. Explained in the edit. – adrin

+0

Why do you want to explicitly ignore these models? If the model with all-zero coefficients comes out best, then you know something is wrong. –

Answers

3

Try implementing a custom scorer, something like:

import numpy as np 

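def scorer_(estimator, X, y):
    # Your criterion here: an all-zero coefficient vector counts as an empty model
    if np.allclose(estimator.coef_, np.zeros_like(estimator.coef_)):
        return 0
    else:
        return estimator.score(X, y)

# GridSearchCV lives in sklearn.model_selection since scikit-learn 0.18;
# the remaining arguments are elided, as in the question
learner = sklearn.model_selection.GridSearchCV(...
                                               scoring=scorer_)

+2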

Nice use of the scorer interface! –
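
For reference, here is a minimal end-to-end sketch of the custom-scorer idea. It is an illustration, not the answerer's code: the synthetic data, the name nonempty_scorer and the use of sklearn.model_selection (scikit-learn 0.18 or later) are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC


def nonempty_scorer(estimator, X, y):
    """Score 0 for an all-zero model, plain accuracy otherwise."""
    if not np.any(estimator.coef_):
        return 0.0
    return estimator.score(X, y)


# Synthetic stand-in for the asker's data (an assumption of this sketch)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

search = GridSearchCV(estimator=LinearSVC(penalty='l1', dual=False),
                      param_grid={'C': 2.0 ** np.arange(-10, 11)},
                      scoring=nonempty_scorer, cv=5)
search.fit(X_demo, y_demo)

# Mean score per candidate; a candidate that was empty on every fold shows 0.000
for params, mean_score in zip(search.cv_results_['params'],
                              search.cv_results_['mean_test_score']):
    print("C=%.6f  mean score=%.3f" % (params['C'], mean_score))

print("selected C:", search.best_params_['C'])
print("nonzero coefficients:", np.count_nonzero(search.best_estimator_.coef_))
```

Since an empty model scores 0 on a fold, any candidate that keeps at least one feature and classifies some test samples correctly ranks above a candidate that was empty on all folds.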

1

I don't think there is a built-in function for this; it is easy, though, to write a custom grid searcher:

import itertools
import operator

import numpy as np
from sklearn import metrics, svm
from sklearn.datasets import make_classification
# KFold lived in sklearn.cross_validation before scikit-learn 0.18
from sklearn.model_selection import KFold


def model_eval(X, y, model, cv):
    """Mean CV accuracy of the model, or 0 as soon as one fold gives an all-zero model."""
    scores = []
    for train_idx, test_idx in cv.split(X):
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]
        model.fit(X_train, y_train)
        nonzero_coefs = np.count_nonzero(model.coef_)  # check for nonzero coefs
        if nonzero_coefs == 0:
            # All zero: don't evaluate any further; move to the next hyperparameter combo
            return 0
        predictions = model.predict(X_test)
        score = metrics.accuracy_score(y_test, predictions)
        scores.append(score)
    return np.array(scores).mean()


X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)


C_values = pow(2.0, np.arange(-20, 11))
penalties = ('l1', 'l2')

parameter_grid = itertools.product(C_values, penalties)

# Use the same folds to evaluate each hyperparameter combo. Shuffling is enabled
# because make_classification(shuffle=False) returns the samples grouped by cluster.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

hyperparameter_scores = {}
for C, penalty in parameter_grid:
    model = svm.LinearSVC(dual=False, C=C, penalty=penalty)
    result = model_eval(X, y, model, kf)
    hyperparameter_scores[(C, penalty)] = result

# Sort ascending by score so that the best combination comes last
sorted_scores = sorted(hyperparameter_scores.items(), key=operator.itemgetter(1))

best_parameters, best_score = sorted_scores[-1]
print(best_parameters)
print(best_score)