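Sorry if this feels like a duplicate question, but honestly I have spent more than 12 hours on it and still haven't found an approach that is easy to understand and easy to apply. How do I correctly apply a (fitted) model to new data in R?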

The situation is simple: I created two models and need to apply them to the test data.

#Model 1 - 

reg5 <- glm(train$survived ~ train$pclass_str + train$sex + 
      train$age_2 + train$sibsp + train$pclass_str*train$sex, 
      family = "binomial") 

#Model 2 - 
reg6 <- randomForest(train$survived_str ~ train$pclass_str + train$sex + 
         train$age_2 + train$sibsp, ntree=5000)

Applying them:

test$pred_reg5 <- predict(reg5, newdata = test, type="response") 
test$pred_reg6 <- predict(reg6, newdata = test, type="response")

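What I can guarantee is that both the training and the test data contain the variables used in the models, under the same names. Both also contain other variables that are not used.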

The error I get:

Error in `[<-.factor`(`*tmp*`, keep, value = c("0", "1", "1", "1", "0", : 
    NAs are not allowed in subscripted assignments 
In addition: Warning message: 
'newdata' had 418 rows but variables found have 891 rows

Thanks for your help!

Source

2013-12-22 dsauce

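Change your models so that the formulas use bare column names and the data frame is passed through the data argument. With train$... inside the formula, predict() evaluates those same training vectors instead of the columns of newdata, which is why the warning reports 891 rows for a 418-row test set: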

reg5 <- glm(survived ~ pclass_str + sex + age_2 + sibsp + pclass_str*sex, 
      data=train, family = "binomial") 
reg6 <- randomForest(survived_str ~ pclass_str + sex + age_2 + sibsp, 
        data=train, ntree=5000)
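
To see the difference in isolation, here is a minimal sketch that uses the built-in mtcars data as a stand-in for the Titanic data (the split and the am ~ wt model are assumptions chosen only for illustration). It fits the same logistic regression twice, once with train$... in the formula and once with data = train, and compares how many predictions each returns for the test rows.

```r
# Illustrative stand-in data: mtcars replaces the Titanic train/test sets.
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]

# Formula hard-codes train$...; predict() cannot map these names to newdata.
bad  <- glm(train$am ~ train$wt, family = "binomial")

# Formula uses bare column names that are looked up in data (and later newdata).
good <- glm(am ~ wt, data = train, family = "binomial")

# The first call warns that 'newdata' had 10 rows but the variables found have 22.
p_bad  <- suppressWarnings(predict(bad,  newdata = test, type = "response"))
p_good <- predict(good, newdata = test, type = "response")

length(p_bad)   # equals nrow(train): newdata was effectively ignored
length(p_good)  # equals nrow(test)
```

Assigning p_bad to a column of test would then fail or misalign because its length does not match the number of test rows.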

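There may be another problem with your model specification: reg5 uses survived ~ ... while reg6 uses survived_str ~ ..., but I can't tell from your question whether that is actually a problem.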

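Finally, as @Roland points out, you can simplify the formula. If you are going to do a lot of this, read the documentation on formulas in R (?formula). In an R formula, an interaction is specified as a:b. The notation a*b is equivalent to a + b + a:b (that is, the first-order terms plus their interaction). So specifying pclass_str*sex is equivalent to specifying pclass_str + sex + pclass_str:sex.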
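
The expansion can be checked without fitting anything by asking terms() for the term labels of a formula. The sketch below compares the original reg5 formula with the shorter form; a, b and y in the last lines are placeholder names used only for illustration.

```r
# The formula from the question and the simplified version.
full   <- survived ~ pclass_str + sex + age_2 + sibsp + pclass_str*sex
simple <- survived ~ age_2 + sibsp + pclass_str*sex

# terms() expands a*b and drops duplicated terms.
labels_full   <- attr(terms(full),   "term.labels")
labels_simple <- attr(terms(simple), "term.labels")

# Same set of terms; only their order differs.
setequal(labels_full, labels_simple)   # TRUE

# Generic case with placeholder names.
star_labels <- attr(terms(y ~ a*b), "term.labels")
star_labels                            # "a" "b" "a:b"
```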

Source

2013-12-22 13:45:23 jlhoward

Please put line breaks in your code. – Roland

@Roland - I was just editing the OP's code. But OK... – jlhoward

Thanks. It would be better if you explained why the 'data' argument should be used. The first formula can also be simplified to 'age_2 + sibsp + pclass_str * sex'. – Roland
