Grid search to find parameters for the best AUC

I am trying to find the parameters for my SVM that give me the best AUC, but I can't find any scoring function for AUC in sklearn. Does anyone have an idea? Here is my code:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = {"C": [0.1, 1, 10, 100, 1000], "gamma": [0.1, 0.01, 0.001, 0.0001, 0.00001]}
clf = SVC(kernel="rbf")
clf = GridSearchCV(clf, parameters, scoring=???)  # which scorer gives AUC?
clf.fit(features_train, labels_train)
print(clf.best_params_)

So what can I use for scoring to get the parameters with the highest AUC?

2016-06-07 spaethju

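I haven't tried it, but I believe you want to use sklearn.metrics.roc_auc_score.

The problem is that it is a metric, not a model scorer, so you need to build one. Something like: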

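from sklearn.metrics import roc_auc_score

def score_auc(estimator, X, y):
    # Take the positive-class column: roc_auc_score expects 1-D scores for binary targets.
    # You could also use the binary predict, but probabilities should give a more realistic score.
    # Note that SVC only offers predict_proba when built with probability=True.
    y_score = estimator.predict_proba(X)[:, 1]
    return roc_auc_score(y, y_score)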

Then pass this function as the scoring parameter of GridSearchCV.

2016-06-07 22:01:31 pekapa

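Thanks, I like your idea, but if I do 'svr = GridSearchCV(svr, parameters, scoring=score_auc(svr, features_train, labels_train))' it results in: AttributeError: predict_proba is not available when probability=False. If I set it to True, a different error shows up. – spaethju

Just do 'svr = GridSearchCV(svr, parameters, scoring=score_auc)'. You should not call the function yourself, only pass it to the search. If 'predict_proba' gives you trouble, use the regular 'predict' to compute the score. – pekapa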
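To illustrate the point made in the comments, here is a minimal sketch of passing the scorer function itself to the search. It is not the answerer's code: the synthetic data from make_classification, the reduced grid, and the use of decision_function instead of predict_proba are assumptions made here so that SVC works without probability=True.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def score_auc(estimator, X, y):
    # For a binary SVC, decision_function returns one signed value per sample,
    # which is enough for roc_auc_score to rank the samples.
    return roc_auc_score(y, estimator.decision_function(X))


# Synthetic binary data standing in for features_train / labels_train (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

parameters = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01, 0.001]}

# Pass the function object; GridSearchCV calls it on each fitted fold model.
search = GridSearchCV(SVC(kernel="rbf"), parameters, scoring=score_auc, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Writing scoring=score_auc(...) would instead evaluate the function once on an unfitted estimator, which is why the call in the comment fails before the search even starts.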

You can simply use:

clf = GridSearchCV(clf, parameters, scoring='roc_auc')
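As a rough end-to-end sketch of this one-liner, the snippet below runs the search with the built-in 'roc_auc' scorer and lists the mean cross-validated AUC for every parameter combination. The synthetic data and the small grid are assumptions for illustration; the built-in scorer can work from decision_function, so SVC does not need probability=True here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary data standing in for the asker's training set (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

parameters = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01]}
clf = GridSearchCV(SVC(kernel="rbf"), parameters, scoring="roc_auc", cv=5)
clf.fit(X, y)

# One mean AUC per parameter combination, averaged over the 5 folds.
results = clf.cv_results_
for params, mean_auc in zip(results["params"], results["mean_test_score"]):
    print(params, round(mean_auc, 3))

print("best:", clf.best_params_, round(clf.best_score_, 3))
```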

2016-06-08 07:53:10 ncfirth

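So if I print svr.best_score_, is that the AUC? I tried to compute it like this: '# ROC false_positive_rate, true_positive_rate, thresholds = roc_curve(labels_test, labels_predicted) roc_auc = auc(false_positive_rate, true_positive_rate) print roc_auc', but it gives me a lower AUC than the best score. – spaethju

The best score is the highest mean 'roc_auc' over the cross-validation folds during training. You should expect to see a lower score on the test set. – ncfirth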
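The comparison discussed in these comments can be sketched as follows. The data, split, and grid are assumptions for illustration. The sketch also computes the held-out AUC twice, once from continuous decision_function scores and once from hard predict labels, on the assumption (not confirmed in the thread) that labels_predicted held hard class labels; thresholded labels reduce the ROC curve to a single operating point, which can contribute to a gap like the one described.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic, slightly noisy binary data (assumption; stands in for the asker's data).
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

parameters = {"C": [0.1, 1, 10], "gamma": [0.1, 0.01]}
search = GridSearchCV(SVC(kernel="rbf"), parameters, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

# Mean AUC over the validation folds for the selected parameters.
cv_auc = search.best_score_

# Held-out AUC from continuous scores of the refitted best model.
auc_from_scores = roc_auc_score(y_test, search.decision_function(X_test))

# Held-out AUC from hard 0/1 predictions (a single point on the ROC curve).
auc_from_labels = roc_auc_score(y_test, search.predict(X_test))

print("cross-validated:", round(cv_auc, 3))
print("test, from scores:", round(auc_from_scores, 3))
print("test, from labels:", round(auc_from_labels, 3))
```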

You can build any scorer yourself:

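import xgboost as xgb
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import auc, make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.pipeline import Pipeline

# define scoring function
def custom_auc(ground_truth, predictions):
    # Only the positive-class column ("1") is needed; passing both columns
    # to roc_curve raises an error. Some scikit-learn versions already hand
    # the scorer a 1-D array, so slice only when the input is 2-D.
    if predictions.ndim == 2:
        predictions = predictions[:, 1]
    fpr, tpr, _ = roc_curve(ground_truth, predictions, pos_label=1)
    return auc(fpr, tpr)

# wrap it as a standard sklearn scorer
# (needs_proba is what the answer used; newer scikit-learn releases
# deprecate it in favor of response_method="predict_proba")
my_auc = make_scorer(custom_auc, greater_is_better=True, needs_proba=True)

pipeline = Pipeline(
    [("transformer", TruncatedSVD(n_components=70)),
     ("classifier", xgb.XGBClassifier(scale_pos_weight=1.0, learning_rate=0.1,
                                      max_depth=5, n_estimators=50, min_child_weight=5))])

parameters_grid = {"transformer__n_components": [60, 40, 20]}

# X, y: your feature matrix and binary labels
grid_cv = GridSearchCV(pipeline, parameters_grid, scoring=my_auc, n_jobs=-1,
                       cv=StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0))
grid_cv.fit(X, y)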

For more information, see: sklearn make_scorer

2016-12-23 16:21:01
