Here is my output from running the train function in the caret package with method = "treebag":
Bagged CART
1251 samples
30 predictors
2 classes: 'N', 'Y'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ...
Resampling results
  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263
Here is my confusion matrix:
Bootstrapped (25 reps) Confusion Matrix
(entries are percentages of table totals)
          Reference
Prediction    N    Y
         N 24.8  7.9
         Y 11.5 55.8
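
As a sanity check on the table above, here is a minimal sketch that recomputes accuracy, Cohen's kappa, and per-class recall from the four percentages. It is not part of the caret output. Python is used only for the arithmetic, and the variable names are my own. The pooled table does not have to reproduce the reported figures exactly, since I assume caret averages these metrics over the 25 resamples, but the values should be close.

```python
# Confusion matrix from the post, as percentages of the table total.
# Keys are (prediction, reference) pairs.
table = {
    ("N", "N"): 24.8,
    ("N", "Y"): 7.9,
    ("Y", "N"): 11.5,
    ("Y", "Y"): 55.8,
}
classes = ["N", "Y"]
total = sum(table.values())

# Observed agreement: share of cases on the diagonal.
accuracy = sum(table[(c, c)] for c in classes) / total

# Marginal shares of each class among predictions and among references.
pred_share = {c: sum(table[(c, r)] for r in classes) / total for c in classes}
ref_share = {c: sum(table[(p, c)] for p in classes) / total for c in classes}

# Agreement expected by chance, then Cohen's kappa.
expected = sum(pred_share[c] * ref_share[c] for c in classes)
kappa = (accuracy - expected) / (1 - expected)

# Recall per class: correctly predicted share of each reference class.
recall = {c: (table[(c, c)] / total) / ref_share[c] for c in classes}

print(f"accuracy = {accuracy:.3f}")
print(f"kappa    = {kappa:.3f}")
for c in classes:
    print(f"reference share of {c} = {ref_share[c]:.3f}, recall = {recall[c]:.3f}")
```

The reference share of 'Y' is also the accuracy that a model always predicting 'Y' would reach on these resamples, which gives a baseline to compare the 80.6% and the ~65% test accuracy against.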
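After splitting the data set into 80% train and 20% test, I trained the model and then ran predict on the test partition, and got an accuracy of ~65%.
Questions: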
(1) Does this mean my model is not very good?
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y' ? Would a Logistic Regression method be better?
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'. Could this be "skewing" my training/results? Do I need a ratio closer to 50 - 50?
Any help would be greatly appreciated!