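GridSearchCV: unexpected mean results

I am trying to understand why I am getting the following result. I am using the iris data and doing cross-validation with a k-nearest neighbors classifier to choose the best k.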
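from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn import grid_search
from sklearn.cross_validation import train_test_split

# Assumed setup (not shown in the original snippet): load the iris data
iris = load_iris()
X, Y = iris.data, iris.target

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)

# Search k = 1..20 with 10-fold cross-validation
parameters = {'n_neighbors': list(range(1, 21))}
knn = KNeighborsClassifier()
clf = grid_search.GridSearchCV(knn, parameters, cv=10)
clf.fit(X_train, Y_train)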
The clf object has the results.
print(clf.grid_scores_)
[mean: 0.94000, std: 0.08483, params: {'n_neighbors': 1},
 mean: 0.93000, std: 0.08251, params: {'n_neighbors': 2},
 mean: 0.94000, std: 0.08456, params: {'n_neighbors': 3},
 mean: 0.95000, std: 0.08101, params: {'n_neighbors': 4},
 mean: 0.95000, std: 0.08562, params: {'n_neighbors': 5},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 6},
 mean: 0.95000, std: 0.08512, params: {'n_neighbors': 7},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 8},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 9},
 mean: 0.94000, std: 0.08414, params: {'n_neighbors': 10},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 11},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 12},
 mean: ..., std: ..., params: {'n_neighbors': 13},
 mean: 0.93000, std: 0.08284, params: {'n_neighbors': 14},
 mean: ..., std: ..., params: {'n_neighbors': 15},
 mean: ..., std: ..., params: {'n_neighbors': 16},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 17},
 mean: 0.93000, std: 0.09458, params: {'n_neighbors': 18},
 mean: 0.94000, std: 0.08483, params: {'n_neighbors': 19},
 mean: 0.93000, std: 0.10887, params: {'n_neighbors': 20}]
However, when I get the 10 CV results for the first case, k=1,
print(clf.grid_scores_[0].cv_validation_scores)
we get
array([ 1. , 0.90909091, 1. , 0.72727273, 0.9 ,
1. , 1. , 1. , 1. , 0.88888889])
However, the mean of these 10 observations,
print(clf.grid_scores_[0].cv_validation_scores.mean())
is 0.942525252525, not the 0.94000 reported in the object.
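To rule out a display rounding effect, the arithmetic can be checked by hand. This is a minimal sketch that uses only the ten fold scores printed above, copied in as literals; the variable names are my own and are not part of scikit-learn.

```python
# The ten per-fold scores printed above for k=1, copied as literals
fold_scores = [1.0, 0.90909091, 1.0, 0.72727273, 0.9,
               1.0, 1.0, 1.0, 1.0, 0.88888889]

# Plain (unweighted) arithmetic mean of the fold scores
plain_mean = sum(fold_scores) / len(fold_scores)

# Format with five decimals, the same precision grid_scores_ displays
formatted = "%.5f" % plain_mean
print(formatted)  # prints 0.94253
```

At five decimals the plain mean is 0.94253, so the reported 0.94000 does not look like a rounded version of the same number.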
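So I am confused about what the reported mean is actually computing and why it is not the same. I read the documentation but did not find anything that helps. What am I missing?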
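For anyone trying to reproduce this on a newer scikit-learn release, where sklearn.grid_search, sklearn.cross_validation and grid_scores_ are no longer available, the same comparison can be sketched with sklearn.model_selection and cv_results_. This is a rough translation of my snippet under those assumptions, not the code I originally ran, and the numbers it prints may differ from the ones above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed setup: iris data loaded through load_iris
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)

# Same search as above: k = 1..20 with 10-fold cross-validation
parameters = {"n_neighbors": list(range(1, 21))}
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=10)
clf.fit(X_train, Y_train)

results = clf.cv_results_

# Index 0 is the first candidate in the grid, i.e. n_neighbors=1
fold_scores_k1 = np.array(
    [results["split%d_test_score" % i][0] for i in range(10)])
reported_mean_k1 = results["mean_test_score"][0]

print(fold_scores_k1)
print(fold_scores_k1.mean(), reported_mean_k1)
```

Printing the mean of the per-fold scores next to mean_test_score makes it easy to see whether the two agree in a given version.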