2016-08-14 82 views

What is _passthrough_scorer, and how do I change the scorer in GridSearchCV (sklearn)?

For reference: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html

x = [[2], [1], [3], [1] ... ] # about 1000 data 
grid = GridSearchCV(KernelDensity(), {'bandwidth': np.linspace(0.1, 1.0, 10)}, cv=10) 
grid.fit(x) 

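When I use GridSearchCV without specifying a scoring function, the value of grid.scorer_ is _passthrough_scorer. Could you explain what kind of function _passthrough_scorer is?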

In addition, I would like to change the scoring function to mean_squared_error or something else.

grid = GridSearchCV(KernelDensity(), {'bandwidth': np.linspace(0.1, 1.0, 10)}, cv=10, scoring='mean_squared_error') 

However, the line grid.fit(x) then always gives me this error message:

TypeError: __call__() missing 1 required positional argument: 'y_true' 

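I cannot figure out how to pass y_true to the function, because I do not know the true distribution. Could you tell me how to change the scoring function? Thank you for your help.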

Answers


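KernelDensity's default metric is Euclidean (the Minkowski metric with p = 2). If you do not specify a scoring method, GridSearchCV falls back to the estimator's own score method, which is all that _passthrough_scorer does; for KernelDensity, score returns the total log-likelihood of the data under the fitted model.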
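To make the default scoring concrete, here is a minimal sketch, assuming a recent scikit-learn and made-up integer data. It checks that KernelDensity.score on held-out points equals the sum of the per-sample log-densities from score_samples, which is the quantity the grid search ends up comparing when no scoring argument is given. The split sizes and the bandwidth are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Made-up data for illustration only: 100 integers in [0, 10).
rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(100, 1)).astype(float)
train, held_out = X[:80], X[80:]

# Fit on the training part, then score the held-out part.
kde = KernelDensity(bandwidth=0.5).fit(train)

# score() returns the total log-likelihood of the held-out points ...
total_log_likelihood = kde.score(held_out)
# ... which is the sum of the per-point log-densities.
per_sample = kde.score_samples(held_out)

print(total_log_likelihood, per_sample.sum())
```

Because the score is a log-likelihood, it is usually negative, and larger (closer to zero) is better.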

The formula for the mean squared error is: sum((y_true - y_estimated)^2) / n. You got the error because a y_true is required to compute it.
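The formula can be sketched in a few lines of plain Python. The function name mse and the sample values are hypothetical, chosen only to show that the computation cannot start without known target values.

```python
def mse(y_true, y_estimated):
    """Mean squared error: sum((y_true - y_estimated)^2) / n."""
    if len(y_true) != len(y_estimated):
        raise ValueError("both sequences must have the same length")
    n = len(y_true)
    return sum((t - e) ** 2 for t, e in zip(y_true, y_estimated)) / n

# Hypothetical values: known targets and the estimates to compare with them.
y_true = [1.0, 2.0, 3.0]
y_estimated = [1.0, 2.0, 5.0]
error = mse(y_true, y_estimated)
print(error)
```

In density estimation there is no known target per sample, so there is nothing to pass as y_true.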

Here is a made-up example of applying GridSearchCV to KernelDensity:

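# Written for scikit-learn < 0.20: sklearn.grid_search and grid_scores_
# were removed in 0.20 in favour of sklearn.model_selection and cv_results_.
from sklearn.neighbors import KernelDensity 
from sklearn.grid_search import GridSearchCV 
import numpy as np 

# 100 made-up integer samples as a single-feature column
X = np.concatenate((np.random.randint(0, 10, 50), 
        np.random.randint(5, 10, 50)))[:, np.newaxis] 

# 10 candidate bandwidths, log-spaced between 0.1 and 10
params = {'bandwidth': np.logspace(-1.0, 1.0, 10)} 
grid = GridSearchCV(KernelDensity(), params) 
grid.fit(X) 
print(grid.grid_scores_) 
print('Best parameter: ',grid.best_params_) 
print('Best score: ',grid.best_score_) 
print('Best estimator: ',grid.best_estimator_) 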

The output is:

[mean: -96.94890, std: 100.60046, params: {'bandwidth': 0.10000000000000001}, 
mean: -70.44643, std: 40.44537, params: {'bandwidth': 0.16681005372000587}, 
mean: -71.75293, std: 18.97729, params: {'bandwidth': 0.27825594022071243}, 
mean: -77.83446, std: 11.24102, params: {'bandwidth': 0.46415888336127786}, 
mean: -78.65182, std: 8.72507, params: {'bandwidth': 0.774263682681127}, 
mean: -79.78828, std: 6.98582, params: {'bandwidth': 1.2915496650148841}, 
mean: -81.65532, std: 4.77806, params: {'bandwidth': 2.1544346900318834}, 
mean: -86.27481, std: 2.71635, params: {'bandwidth': 3.5938136638046259}, 
mean: -95.86093, std: 1.84887, params: {'bandwidth': 5.9948425031894086}, 
mean: -109.52306, std: 1.71232, params: {'bandwidth': 10.0}] 
Best parameter: {'bandwidth': 0.16681005372000587} 
Best score: -70.4464315885 
Best estimator: KernelDensity(algorithm='auto', atol=0, bandwidth=0.16681005372000587, 
     breadth_first=True, kernel='gaussian', leaf_size=40, 
     metric='euclidean', metric_params=None, rtol=0) 
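The example above uses sklearn.grid_search and grid_scores_, which are not available in scikit-learn 0.20 and later. A minimal sketch of the same search with the current API might look like the following; the fixed random seed and cv=5 are assumptions added for reproducibility, so the numbers will not match the output shown above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Made-up data in the same shape as the example above (seed is an assumption).
rng = np.random.RandomState(0)
X = np.concatenate((rng.randint(0, 10, 50),
                    rng.randint(5, 10, 50)))[:, np.newaxis].astype(float)

params = {'bandwidth': np.logspace(-1.0, 1.0, 10)}
grid = GridSearchCV(KernelDensity(), params, cv=5)
grid.fit(X)  # no y: KernelDensity is unsupervised

# cv_results_ replaces the old grid_scores_ attribute.
results = grid.cv_results_
for mean, std, p in zip(results['mean_test_score'],
                        results['std_test_score'],
                        results['params']):
    print('mean: %.5f, std: %.5f, params: %r' % (mean, std, p))

print('Best parameter: ', grid.best_params_)
print('Best score: ', grid.best_score_)
```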

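The valid scoring methods for GridSearchCV usually need y_true. In your case, you may instead want to change the metric used by KernelDensity to another one (e.g. sklearn.metrics.pairwise.pairwise_kernels, sklearn.metrics.pairwise.pairwise_distances), since the grid search scores the model that is fitted with that metric.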
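Another way to change the scoring without a y_true is to pass a callable as scoring. The sketch below assumes a recent scikit-learn, where a scorer is any callable taking (estimator, X, y) and returning a number, with larger meaning better. The name mean_log_likelihood and the generated data are hypothetical stand-ins for the question's x.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def mean_log_likelihood(estimator, X, y=None):
    """Hypothetical scorer: average log-density of the held-out points.

    y is accepted but ignored, so no true distribution is needed.
    """
    return float(np.mean(estimator.score_samples(X)))

# Made-up stand-in for the question's data: about 1000 single-feature samples.
rng = np.random.RandomState(0)
x = rng.randint(1, 4, size=(1000, 1)).astype(float)

grid = GridSearchCV(KernelDensity(),
                    {'bandwidth': np.linspace(0.1, 1.0, 10)},
                    cv=10,
                    scoring=mean_log_likelihood)
grid.fit(x)
print(grid.best_params_, grid.best_score_)
```

Averaging instead of summing makes the score independent of the fold size; any other quantity computable from the fitted estimator and X alone could be returned in the same way.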


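Thank you for your answer. According to the KernelDensity documentation (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity), "the normalization of the density output is correct only for the Euclidean distance metric", but I am not sure how this affects the result. Could you explain it in plain English? – Nickel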
