caret package method = "treebag"

Here is my output from running the train function with caret package method = "treebag":

Bagged CART

1251 samples
  30 predictors
   2 classes: 'N', 'Y'

No pre-processing
Resampling: Bootstrapped (25 reps)

Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ...

Resampling results

  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263

Here is my confusion matrix:

Bootstrapped (25 reps) Confusion Matrix

(entries are percentages of table totals)

          Reference
Prediction    N    Y
         N 24.8  7.9
         Y 11.5 55.8
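As a sanity check on how the table relates to the resampling summary, here is a minimal sketch (plain Python, not caret code) that recomputes accuracy and Cohen's kappa from the pooled percentages. Accuracy is the diagonal sum; kappa compares that observed agreement with the agreement expected by chance from the row and column totals. The reported Kappa of 0.572 is an average over the 25 resamples, so the pooled table should give a close but not necessarily identical value. The function name is hypothetical.

```python
def accuracy_and_kappa(table):
    """Compute accuracy and Cohen's kappa from a square confusion table.

    table[i][j] is the count (or percentage) with predicted class i and
    reference class j: rows are predictions, columns are the reference,
    matching the layout of the caret output above.
    """
    total = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: correct predictions (the diagonal) over the total.
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row share) * (column share).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Percentages from the bootstrapped confusion matrix above
# (rows: Prediction N, Y; columns: Reference N, Y).
pooled = [[24.8, 7.9],
          [11.5, 55.8]]
acc, kappa = accuracy_and_kappa(pooled)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
# prints: accuracy = 0.806, kappa = 0.571
```

The recomputed accuracy matches the 0.806 in the resampling summary exactly, and the pooled kappa lands within about 0.001 of the averaged value.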

After splitting the dataset (80% training, 20% test), I train the model and then run "predict" on the test partition; the accuracy comes out at ~65%.
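One way to make an 80/20 split keep the same N/Y ratio in both partitions is to sample within each outcome class. caret's createDataPartition() does this for a factor outcome; the sketch below is a hypothetical plain-Python helper illustrating the idea, not caret's implementation. The labels are made up to match the roughly 67% 'Y' / 33% 'N' mix described in the question.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx) with class proportions roughly preserved.

    Hypothetical helper: rows are grouped by outcome class and each group
    is split separately, so both partitions keep about the overall ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels matching the question's ~67% 'Y' / ~33% 'N' mix.
labels = ["Y"] * 838 + ["N"] * 413
train_idx, test_idx = stratified_split(labels)
for name, idx in (("train", train_idx), ("test", test_idx)):
    share_y = sum(labels[i] == "Y" for i in idx) / len(idx)
    print(f"{name}: n = {len(idx)}, share of 'Y' = {share_y:.3f}")
```

Splitting each class separately guarantees that the test partition cannot end up with a noticeably different class mix by chance, which would otherwise be one possible source of a gap between resampled and test-set accuracy.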

Questions:

(1) Does this mean my model is not very good? 
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y' ? Would a Logistic Regression method be better? 
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'. Could this be "skewing" my training/results? Do I need a ratio closer to 50/50?
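With about 67% 'Y', a useful baseline is the accuracy of always predicting 'Y'. caret's confusionMatrix() reports this as the "No Information Rate" together with a one-sided binomial p-value labelled "P-Value [Acc > NIR]". The sketch below reproduces that idea in plain Python; the 250-row test set and the 163 correct predictions (~65%) are hypothetical numbers chosen to match the question, not the asker's real results.

```python
from collections import Counter
from math import comb


def no_information_rate(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def binomial_upper_tail(successes, trials, p):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


# Hypothetical numbers: labels matching the ~67%/33% mix from the question,
# and a 20% test set of 250 rows scored at ~65% accuracy (163 correct).
labels = ["Y"] * 838 + ["N"] * 413
nir = no_information_rate(labels)
p_value = binomial_upper_tail(163, 250, nir)
print(f"NIR = {nir:.3f}, test accuracy = {163 / 250:.3f}, "
      f"one-sided p-value (Acc > NIR) = {p_value:.2f}")
```

If the test partition has the same ~67% 'Y' share, a ~65% test accuracy does not beat this baseline, whereas the resampled 80.6% clearly does. This is consistent with the answer below finding the accuracy gap puzzling.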

Any help would be greatly appreciated!


2014-11-06 dorkboy

Code and a reproducible example would help here.

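Assuming that the confusion matrix is from confusionMatrix.train, I would say that your model looks pretty good. The difference in accuracy is a little puzzling. I have seen test set results look worse than the resampling results, but the bootstrap can be very pessimistic in measuring performance, and here it looks a lot better than the test set. Try a different training/test split and see whether you get something similar (or use repeated 10-fold CV).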
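To make the two resampling schemes concrete, here is a plain-Python sketch of the index sets each one produces. In caret they correspond roughly to trainControl(method = "boot", number = 25), which is train()'s default, and trainControl(method = "repeatedcv", number = 10, repeats = 5); the functions below are hypothetical illustrations, not caret internals. One commonly cited reason for the bootstrap's pessimism: each bootstrap fit sees only about 63.2% distinct rows (the rest, about 36.8%, are out-of-bag and used for scoring), while each 10-fold CV fit trains on about 90% of the rows.

```python
import random


def bootstrap_splits(n, reps=25, seed=1):
    """Yield (in_bag, out_of_bag) index lists for `reps` bootstrap resamples.

    The model would be fit on `in_bag` (n rows drawn with replacement) and
    scored on the rows never drawn (`out_of_bag`), about 36.8% of the data.
    """
    rng = random.Random(seed)
    for _ in range(reps):
        in_bag = [rng.randrange(n) for _ in range(n)]
        drawn = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in drawn]
        yield in_bag, out_of_bag


def repeated_kfold_splits(n, k=10, repeats=5, seed=1):
    """Yield (train, holdout) index lists for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(k):
            holdout = order[fold::k]  # every k-th row of this shuffle
            held = set(holdout)
            train = [i for i in order if i not in held]
            yield train, holdout


n = 1251
oob_sizes = [len(oob) for _, oob in bootstrap_splits(n)]
print(f"bootstrap: mean out-of-bag size = "
      f"{sum(oob_sizes) / len(oob_sizes):.0f} of {n}")
cv = list(repeated_kfold_splits(n))
holdout_sizes = [len(h) for _, h in cv]
print(f"repeated 10-fold CV: {len(cv)} fits, holdout sizes "
      f"{min(holdout_sizes)} to {max(holdout_sizes)}")
```

Each CV repeat partitions the rows exactly once, so averaging over several repeats reduces the variance of the estimate without the smaller effective training sets of the bootstrap.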

(a) Again, it's hard to say much based on what you've posted.

(b) The model is fine. There is no general rule about which model is better or worse (google the "no free lunch" theorem).

(c) The imbalance is not too bad, so I don't think it is an issue (unless the class percentages differ between the training and test sets).
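The caveat in (c) can be checked directly by comparing the class shares in the two partitions. This is a minimal sketch with hypothetical helper names and made-up partitions (a training set at about 67/33 and a test set that happens to be closer to 55/45); in practice you would pass in the real outcome labels of each split.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of each class label in `labels`."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def max_share_gap(train_labels, test_labels):
    """Largest absolute difference in class share between two partitions."""
    train, test = class_shares(train_labels), class_shares(test_labels)
    classes = set(train) | set(test)
    return max(abs(train.get(c, 0.0) - test.get(c, 0.0)) for c in classes)


# Hypothetical partitions: training keeps ~67/33, test drifted to ~55/45.
train_labels = ["Y"] * 670 + ["N"] * 330
test_labels = ["Y"] * 138 + ["N"] * 113
print(class_shares(train_labels), class_shares(test_labels))
print(f"largest gap in class share: "
      f"{max_share_gap(train_labels, test_labels):.3f}")
```

A gap of several percentage points would mean the test accuracy is measured on a different class mix than the resampling estimate, which is the situation the answer flags.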

Max


2014-11-14 20:43:21 topepo
