2016-11-23

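caretEnsemble with SVM (issue), different training datasets

Below is a reproducible example. What I am basically trying to do is create 5 imputed datasets, apply an SVM to each imputed dataset using caret's train function, and then ensemble the resulting trained models with caretEnsemble. Finally, I use the ensemble model to predict each test set.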

However, I get this error:

Error in check_bestpreds_obs(modelLibrary) :
Observed values for each component model are not the same. Please re-train the models with the same Y variable

Is there any way around this that would let me ensemble models trained on different datasets?

Any help is really appreciated.

library(mice)
library(missForest)  # provides prodNA(); this import was missing from the post
library(e1071)
library(caret)
library(caretEnsemble)

data <- iris
# Generate 10% missing values at random
iris.mis <- prodNA(iris, noNA = 0.1)
# Remove the categorical variable
iris.mis <- subset(iris.mis, select = -c(Species))

# 5 imputations using mice with pmm

imp <- mice(iris.mis, m = 5, maxit = 10, method = 'pmm', seed = 500)

# Save the 5 imputed datasets
x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)

## Apply the following method for each imputed set 

form <- iris$Sepal.Width # target column
n <- nrow(x1) # all imputed datasets have the same number of rows
fold <- 10 # assumption: `fold` is never defined in the post; number of outer folds
i <- 1     # assumption: `i` is never defined in the post; index of the held-out outer fold
prop <- n %/% fold
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1) %/% prop + 1)

CVfolds <- 10
CVrepeats <- 3
indexPreds <- createMultiFolds(x1[k != i, ]$Sepal.Width, CVfolds, CVrepeats)
ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats, number = CVfolds, returnResamp = "all", savePredictions = "all", index = indexPreds)

fit1 <- train(Sepal.Width ~ ., data = x1[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit2 <- train(Sepal.Width ~ ., data = x2[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit3 <- train(Sepal.Width ~ ., data = x3[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit4 <- train(Sepal.Width ~ ., data = x4[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit5 <- train(Sepal.Width ~ ., data = x5[k != i, ], method = 'svmLinear2', trControl = ctrl)

# Combine the trained models into a list
svm.fit <- list(svmLinear1 = fit1, svmLinear2 = fit2, svmLinear3 = fit3, svmLinear4 = fit4, svmLinear5 = fit5)

# Convert the list to a caretList
class(svm.fit) <- "caretList"

# Create the ensemble; this is where the error occurs
svm.all <- caretEnsemble(svm.fit, method = 'svmLinear2')

I think you forgot to specify iris in 'form <- Sepal.Width' here. –

Thanks for spotting this, but I still get the same error. – user3895291

Answer

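You have to simplify your example. It should not take this many moving parts and loops to reproduce the error. One of caretEnsemble's internal checks throws this error, but the message is not very well defined.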

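That said, caretList needs one specified trainControl object that you use for every train model. Otherwise the resampling will differ between models and you will get the error: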

"Component models do not have the same re-sampling strategies"

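A minimal sketch of that setup, assuming caret, caretEnsemble and e1071 are installed. The complete iris data stands in for a single training set, and pairing lm with svmLinear2 is only for illustration, not something from the question.

```r
library(caret)
library(caretEnsemble)

dat <- iris[, 1:4]  # one shared training set (illustrative stand-in)

set.seed(7)
# Fix the resampling folds once so every component model sees the same splits
folds <- createMultiFolds(dat$Sepal.Width, k = 10, times = 3)
ctrl <- trainControl(
  method = "repeatedcv", number = 10, repeats = 3,
  index = folds, savePredictions = "final"
)

# caretList trains every listed method on the same data with the same trainControl
models <- caretList(
  Sepal.Width ~ .,
  data = dat,
  trControl = ctrl,
  methodList = c("lm", "svmLinear2")
)

# Both models share resamples and observed Y, so the ensemble checks should pass
ens <- caretEnsemble(models)
```

Because the folds are created once and passed through index, every model is evaluated on identical resamples, which is what the check behind this message compares.
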
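The next problem is that you are using a different dataset for each train object. caretEnsemble is designed to be used with the same training dataset. Even though they come from the same base data, your x1 to x5 are different. This results in the error: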

"Observed values for each component model are not the same. Please re-train the models with the same Y variable"

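To see why the datasets count as different, note that the target Sepal.Width is itself imputed, so its values can vary from one imputation to the next. The sketch below checks that, then shows one possible workaround that is not part of this answer: fit one model per imputed dataset and average the predictions by hand instead of going through caretEnsemble. The way missing values are injected, the 5-fold control and the first 10 rows used as a test set are all illustrative assumptions.

```r
library(mice)
library(caret)

set.seed(1)
# Assumed setup: knock out 15 values per column by hand instead of using prodNA()
iris.mis <- iris[, 1:4]
for (col in names(iris.mis)) {
  iris.mis[sample(nrow(iris.mis), 15), col] <- NA
}

imp  <- mice(iris.mis, m = 5, maxit = 10, method = "pmm", seed = 500, printFlag = FALSE)
sets <- lapply(1:5, function(j) complete(imp, action = j))

# Compare the response column of every imputed dataset with the first one;
# caretEnsemble requires these to be identical across component models
same_y <- all(vapply(
  sets[-1],
  function(d) identical(d$Sepal.Width, sets[[1]]$Sepal.Width),
  logical(1)
))
print(same_y)

# Workaround sketch: one SVM per imputed dataset, combined by a plain average
ctrl <- trainControl(method = "cv", number = 5)
fits <- lapply(sets, function(d) {
  train(Sepal.Width ~ ., data = d, method = "svmLinear2", trControl = ctrl)
})

newdata  <- sets[[1]][1:10, ]  # stand-in for a real test set
preds    <- sapply(fits, function(f) predict(f, newdata = newdata))  # one column per model
avg_pred <- rowMeans(preds)
```

A plain mean weights every imputation equally. That is a simplification compared with the weights caretEnsemble would fit, but it does not require the models to share the same observed Y.
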
Finally, if you want to build a model list from separately trained models, just use c(model1, model2). See the documentation for c.train.
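A short sketch of combining separately trained models with c(), assuming both are trained on the same data with the same trainControl. The object names fit_lm and fit_svm and the choice of methods are made up for illustration.

```r
library(caret)
library(caretEnsemble)

dat <- iris[, 1:4]

set.seed(7)
# Shared resampling definition, reused by every train() call
folds <- createMultiFolds(dat$Sepal.Width, k = 10, times = 3)
ctrl <- trainControl(
  method = "repeatedcv", number = 10, repeats = 3,
  index = folds, savePredictions = "final"
)

# Train the component models separately
fit_lm  <- train(Sepal.Width ~ ., data = dat, method = "lm", trControl = ctrl)
fit_svm <- train(Sepal.Width ~ ., data = dat, method = "svmLinear2", trControl = ctrl)

# c() on train objects dispatches to c.train and returns a caretList
models <- c(fit_lm, fit_svm)

ens <- caretEnsemble(models)
```

This avoids setting class(x) <- "caretList" by hand, which skips whatever bookkeeping c.train does when it builds the list.
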

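Thank you very much for your reply, I really appreciate it, and I have simplified the code as suggested. Could you give me an example of building a model list from separately trained models? Does this mean I cannot build a caret ensemble from models associated with different training datasets? – user3895291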