2016-08-11

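KNN in R - strange behavior

Does anyone know why the following KNN R code gives different predictions for different seeds? This is strange because K <- 5, so the majority should be well defined. Also, the floating-point values are not small enough to run into precision problems with this data. (Note: I know the test point is oddly different from the training data. This is just a synthetic example created to demonstrate the strange KNN behavior.)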

library(class) 

train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015), 
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861), 
    c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332), 
    c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033), 
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272), 
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095) 
) 
trainLabels <- c(1, 
       1, 
       0, 
       0, 
       1, 
       0) 
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241) 

K <- 5 

seed <- 494139 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 1, seed: 494139 

seed <- 5371 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 0, seed: 5371 

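What exactly is your question? There is a bug in your R code: the last test is supposed to use the same seed as the second one, but it doesn't, because the seed is never set. Is that the source of your confusion? – AlexR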

Answer


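The knn function calls an underlying C function (line 122) named VR_knn, which includes a step that introduces a "fuzz", i.e. a small value (epsilon, EPS). It looks like your example's values may happen to land right on that "fuzz" step. Because knn breaks tied votes at random, a near-tie there makes the prediction depend on the seed. Evidence for this is that rounding your values gives consistent results. So: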

rm(list=ls()) 

library(class) 
train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015), 
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861), 
    c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332), 
    c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033), 
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272), 
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095) 
) 
trainLabels <- c(1,1,0,0,1,0) 
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241) 
K <- 5 

train <- round(train, 4)  # round the training features to 4 decimal places

seed <- 494139 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 0, seed: 494139 

seed <- 5371 
set.seed(seed) 
pred <- knn(train=train, test=test, cl = trainLabels, k=K) 
message("predicted: ", pred, ", seed: ", seed) 
# predicted: 0, seed: 5371
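As a rough check of this explanation, the sketch below recomputes, in base R only, the squared Euclidean distances from the test point to each training row and compares the gap between the 5th and 6th nearest rows with a relative tolerance of 1e-4. The name `tie_eps` is hypothetical, and its value is an assumption standing in for the EPS constant in the class package's C code; the sketch does not call `knn` itself.

```r
# Sketch only: tie_eps is a hypothetical name, and 1e-4 is an ASSUMED value
# for the EPS fuzz used inside class::knn's C code.
train <- rbind(
    c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
    c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
    c(0.0538332, 0.057786,  0.057786,  0.0506127, 0.057786,  0.0538332),
    c(0.059033,  0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
    c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
    c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5
tie_eps <- 1e-4

# Squared Euclidean distance from the test point to every training row
# (sweep subtracts the test vector from each row of train).
d2 <- rowSums(sweep(train, 2, test)^2)
ord <- order(d2)  # training rows from nearest to farthest

# Relative gap between the K-th and (K+1)-th nearest squared distances
rel_gap <- (d2[ord[K + 1]] - d2[ord[K]]) / d2[ord[K]]

# Rows that would vote if anything within (1 + tie_eps) of the K-th
# distance were treated as tied with it
in_vote <- ord[d2[ord] <= d2[ord[K]] * (1 + tie_eps)]

# Vote counts (classes 0 and 1) using exactly K rows vs. the near-tied set
votes_k   <- table(factor(trainLabels[ord[1:K]], levels = c(0, 1)))
votes_all <- table(factor(trainLabels[in_vote], levels = c(0, 1)))

print(signif(d2[ord], 7))
cat("relative gap between neighbours", K, "and", K + 1, ":",
    signif(rel_gap, 3), "\n")
print(votes_k)
print(votes_all)
```

If that assumption holds, the five strictly nearest rows (6, 1, 2, 4, 5) vote 3 to 2 for class 1, but row 3 (class 0) lies only about 1.8e-5 beyond the fifth distance in relative terms, so it is counted as tied and the vote becomes 3 to 3. Since `knn` breaks vote ties at random, the prediction then depends on the seed. Taking square roots roughly halves the relative gap, so the conclusion does not hinge on whether the tolerance is applied to squared or plain distances.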