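C5.0 decision tree - c50 code called exit with value 1
I am getting the following error:
c50 code called exit with value 1
I am running this on the Titanic data available from Kaggle.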
# Importing datasets
train <- read.csv("train.csv", sep=",")
# this is the structure
str(train)
Output:
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
Then I tried to fit a C5.0 decision tree:
# Trying with C5.0 decision tree
library(C50)
# C5.0 models require a factor outcome; otherwise they raise an error
train$Survived <- factor(train$Survived)
new_model <- C5.0(train[-2],train$Survived)
Running the last line above gives me this error:
c50 code called exit with value 1
I can't figure out what is going wrong. I use similar code on a different dataset and it works fine. Any ideas on how to debug my code?
Thanks
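Thanks, Marco. It worked! The missing values in the Cabin and Embarked columns were causing the problem. One other thing I noticed is that train[-2] and train[, -2] give the same output... is there any other difference between the two? – zephyr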
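For readers following along, here is a minimal sketch of one way the empty-string levels in Cabin and Embarked could be given an explicit label. The exact code from the answer is not shown in this thread, so the helper name `relabel_empty`, the label "missing", and the toy data frame are all assumptions made for illustration.

```r
# Toy stand-in for the two Titanic columns; the values are made up.
toy <- data.frame(
  Cabin    = factor(c("", "C85", "", "C123")),
  Embarked = factor(c("S", "C", "", "S"))
)

# Hypothetical helper: rename the empty-string factor level to an
# explicit label. Factors without an empty level are returned unchanged.
relabel_empty <- function(x, label = "missing") {
  levels(x)[levels(x) == ""] <- label
  x
}

toy$Cabin    <- relabel_empty(toy$Cabin)
toy$Embarked <- relabel_empty(toy$Embarked)

# No factor in the toy data has an empty level name any more.
print(levels(toy$Cabin))
print(levels(toy$Embarked))
```

The same helper could be applied to train$Cabin and train$Embarked before calling C5.0(), assuming those columns were read in as factors as in the str() output above.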
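You're right, the two seem to behave the same for data.frames. I always use train[, -2] because, for a matrix, train[-2] turns the result into a vector and removes only a single element. This is because matrices are conceptually like vectors: you can access each of their elements without specifying a row/column. – Marco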
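The difference described in this comment can be seen on a small example. This is only a sketch on a made-up 3x3 data frame (`df` and `m` are illustrative names, not objects from the question).

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
m  <- as.matrix(df)

# Data frame: both forms drop the second column and keep a data frame.
print(names(df[-2]))
print(names(df[, -2]))

# Matrix: a single negative index treats the matrix as one long vector,
# so only the second element is removed and the dimensions are lost.
v <- m[-2]
print(is.null(dim(v)))
print(length(v))

# Matrix with an explicit column index: the second column is dropped
# and the result is still a matrix.
print(dim(m[, -2]))
```

One further difference on data frames: if only one column would remain, df[, -2] simplifies the result to a plain vector unless drop = FALSE is passed, whereas df[-2] always returns a data frame.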
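Oops. Now the next step gives a similar "code called exit" error. I read test.csv into a test data frame and then ran new_model_predict <- predict(new_model, test) on the test data. I had also assigned the missing label in the Cabin and Embarked columns of the test data. – zephyr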