R - Random Forest - Error when applying a confusion matrix to test data

I am trying to implement a simple random forest algorithm in R, to understand how R and random forests work, and to test the model's accuracy on a test set.

My sample data (five rows out of 561 in total) is:

bulbasaur[1:5,]
   Appt_date count no_of_reps PerReCount
1 2016-01-01     2          1   2.000000
2 2016-01-04   174         58   3.000000
3 2016-01-05   206         59   3.491525
4 2016-01-06   203         61   3.327869
5 2016-01-07   236         64   3.687500

The code I wrote is:

install.packages("caret") 
library(caret) 

leaf <- bulbasaur 
ctrl = trainControl(method="repeatedcv", number=100, repeats=50, selectionFunction = "oneSE") 
in_train = createDataPartition(leaf$PerReCount, p=.75, list=FALSE) 

#random forest 
trf = train(PerReCount ~ ., data=leaf, method="rf", metric="RMSE",trControl=ctrl, subset = in_train) 


#boosting 
tgbm = train(PerReCount ~ ., data=leaf, method="gbm", metric="RMSE", 
      trControl=ctrl, subset = in_train, verbose=FALSE) 

resampls = resamples(list(RF = trf, GBM = tgbm)) 
difValues = diff(resampls) 
summary(difValues) 



######Using it on test matrix 
test = leaf[-in_train,] 
test$pred.leaf.rf = predict(trf, test, "raw") 
confusionMatrix(test$pred.leaf.rf, test$PerReCount)

However, I get the following error:

Error in confusionMatrix.default(test$pred.leaf.rf, test$PerReCount) : 
    the data cannot have more levels than the reference

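I tried a few changes, such as setting leaf$PerReCount <- as.factor(leaf$PerReCount) and adding type = "class", but the resulting accuracy was terrible, and I do not want to switch from regression to classification. How can I fix this without converting to factors, solve the problem some other way, or measure accuracy without using a confusion matrix? Thanks.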

Source

2017-09-10 Raj

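A confusion matrix applies to **classifiers**; it makes no sense when your target variable is numeric. Here, the `PerReCount` variable is clearly a continuous numeric variable. Your problem is not in the code but in understanding your data.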
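To illustrate the comment above, here is a minimal base-R sketch with made-up numbers (the vectors `obs` and `pred` are hypothetical stand-ins for `test$PerReCount` and the model's predictions, not the asker's data). It shows what happens when continuous values are treated as class labels: nearly every prediction becomes its own level, and exact-match "accuracy" is meaningless. This is presumably the condition behind the "more levels than the reference" error, though the sketch does not call caret itself.

```r
# Hypothetical toy data: observed target values (note the repeated 3.0)
obs  <- c(2.0, 3.0, 3.0, 3.33, 3.69)
# Hypothetical continuous predictions from a regression model
pred <- c(2.1, 2.9, 3.2, 3.4, 3.6)

# Treating numeric values as classes: each distinct value becomes a level
n_obs_levels  <- nlevels(factor(obs))
n_pred_levels <- nlevels(factor(pred))

# Share of predictions that match the observed value exactly
exact_match <- mean(pred == obs)

cat("levels in reference:  ", n_obs_levels, "\n")
cat("levels in predictions:", n_pred_levels, "\n")
cat("exact-match accuracy: ", exact_match, "\n")
```

The predictions are close to the observed values, yet none matches exactly, which is why an error measure such as RMSE is the more natural summary for a regression target.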

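@Damiano's point is correct: a regression model does not produce a confusion matrix, because its output is not a yes-or-no class. I solved the problem by using RMSE instead: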

piko.chu = predict(trf, test) 
RMSE.forest <- sqrt(mean((piko.chu-test$PerReCount)^2))
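
As a small extension of the RMSE calculation above, the following sketch reports RMSE together with MAE and R-squared in one call. The helper name `regression_metrics` and the toy vectors are assumptions made for illustration; in the thread's setting, `pred` would be `predict(trf, test)` and `obs` would be `test$PerReCount`. caret also provides `postResample(pred, obs)`, which reports similar measures.

```r
# Hypothetical helper (not part of caret): summarise regression error
regression_metrics <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  err <- pred - obs
  c(
    RMSE     = sqrt(mean(err^2)),   # root mean squared error
    MAE      = mean(abs(err)),      # mean absolute error
    Rsquared = cor(pred, obs)^2     # squared Pearson correlation
  )
}

# Toy values standing in for predict(trf, test) and test$PerReCount
obs  <- c(2.0, 3.0, 3.0, 3.33, 3.69)
pred <- c(2.1, 2.9, 3.2, 3.4, 3.6)

metrics <- regression_metrics(pred, obs)
print(round(metrics, 3))
```

RMSE and MAE are in the same units as `PerReCount`, so they can be read directly as a typical prediction error; lower values are better.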

Source

2017-09-10 19:51:45 Raj
