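Fitting and predicting models by factor level in R
2017-06-02

I want to fit one model per factor level and then use those fitted models, by name, to predict new data that has the matching factor level. My prediction step fails with this logic. Can someone guide me through it for the case below?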

Aa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), cat1=sample(letters[21:24], 15,rep=TRUE),cat2=sample(letters[11:18], 5,rep=TRUE), 
        card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"), delay=sample(c(1,1,0,0,0),5,rep=TRUE)) 

ModelFit<-sapply(as.character(unique(Aa[["card"]])), function(x)glm(delay~amount+cat1+cat2, family = "binomial", data = subset(Aa, card==x)), simplify = FALSE, USE.NAMES = TRUE) 

Bb<-Aa[-(which(names(Aa) %in% "delay"))] 

sapply(unique(Aa[["card"]]), function(x,y) predict(seq_along(x=ModelFit), newdata=DataOPEN[DataOPEN$SubsidiaryName],type="response")) 

Why don't you fit 'delay ~ (amount + cat1 + cat2) * card' instead of looping? – Roland
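Roland's suggestion can be sketched as a single glm in which card interacts with every predictor, so each card gets its own intercept and its own slopes; in principle this captures the same per-card relationships as fitting one model per card. This is an illustrative sketch, not code from the thread: it hard-codes the 15 rows printed in the answer's output instead of calling sample(), and fit_all and pred_all are made-up names.

```r
# Toy data: the 15 rows printed in the answer's output (hard-coded, no sample())
Aa <- data.frame(
  amount = c(1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
  cat1   = c("u", "x", "x", "w", "v", "x", "u", "x", "w", "w", "w", "w", "u", "u", "w"),
  cat2   = rep(c("p", "o", "o", "m", "n"), 3),
  card   = c("a", "b", "c", "a", "c", "b", "a", "c", "b", "a", "b", "c", "a", "c", "a"),
  delay  = rep(c(0, 1, 0, 0, 0), 3)
)

# fit_all (hypothetical name): one glm where the intercept and every slope may
# differ by card. On data this small expect convergence warnings from glm().
fit_all <- glm(delay ~ (amount + cat1 + cat2) * card, family = binomial, data = Aa)

# Each row's card value selects the card-specific terms, so no loop over levels
# is needed; pred_all (hypothetical) holds the predicted probabilities.
Aa$pred_all <- predict(fit_all, newdata = Aa, type = "response")
```

With only a handful of rows per card, the model matrix has more columns (24) than rows (15), so many interaction coefficients cannot be estimated and are reported as NA in coef(fit_all).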

Answer


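For simplicity I've written it as a loop. predict() gives warnings, but it seems to work. Your DataOPEN data set wasn't provided, so I simply computed the predictions on the original Aa (new column pred). A rounded version of the predictions is in column pred.round.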

Aa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), cat1=sample(letters[21:24], 15,rep=TRUE),cat2=sample(letters[11:18], 5,rep=TRUE), 
        card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"), delay=sample(c(1,1,0,0,0),5,rep=TRUE)) 

ModelFit <- sapply(as.character(unique(Aa[["card"]])), function(x)glm(delay~amount+cat1+cat2, family = "binomial", data = subset(Aa, card==x)), simplify = FALSE, USE.NAMES = TRUE) 

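Aa$pred <- NA_real_ # create a new column for the predictions 

# Loop over names(ModelFit), i.e. the card values that have a model. levels(Aa$card)
# returns NULL when card is a character column (the data.frame() default since R 4.0),
# which would silently skip the loop.
for(i in names(ModelFit)){ 
    newdat <- subset(Aa, subset=card==i) # rows belonging to this card
    newdat$pred <- predict(ModelFit[[i]], newdata=newdat,type="response") 
    Aa$pred[match(rownames(newdat), rownames(Aa))] <- newdat$pred # write back by row name
} 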

Aa$pred.round <- round(Aa$pred) # a rounded prediction 
Aa 

Output:

> Aa 
   amount cat1 cat2 card delay         pred pred.round
1       1    u    p    a     0 1.170226e-09          0
2       2    x    o    b     1 1.000000e+00          1
3       1    x    o    c     0 2.143345e-11          0
4       2    w    m    a     0 1.170226e-09          0
5       1    v    n    c     0 2.143345e-11          0
6       1    x    p    b     0 5.826215e-11          0
7       2    u    o    a     1 5.000000e-01          0
8       2    x    o    c     0 2.143345e-11          0
9       1    w    m    b     0 5.826215e-11          0
10      1    w    n    a     0 1.170226e-09          0
11      1    w    p    b     0 5.826215e-11          0
12      2    w    o    c     1 1.000000e+00          1
13      2    u    o    a     0 5.000000e-01          0
14      2    u    m    c     0 2.143345e-11          0
15      1    w    n    a     0 1.170226e-09          0
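The loop in the answer can also be written without explicit row bookkeeping: split() the new data by card, predict each piece with the model of the same name, and let unsplit() restore the original row order. This is a sketch under assumptions: newAa is a hypothetical stand-in for the question's DataOPEN (four rows of the toy data without delay), the data are the hard-coded rows from the output above, and every card in newAa must have a fitted model, since ModelFit[[lvl]] would otherwise be NULL and predict() would fail.

```r
# Toy data: the 15 rows printed in the output above (hard-coded, no sample())
Aa <- data.frame(
  amount = c(1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
  cat1   = c("u", "x", "x", "w", "v", "x", "u", "x", "w", "w", "w", "w", "u", "u", "w"),
  cat2   = rep(c("p", "o", "o", "m", "n"), 3),
  card   = c("a", "b", "c", "a", "c", "b", "a", "c", "b", "a", "b", "c", "a", "c", "a"),
  delay  = rep(c(0, 1, 0, 0, 0), 3)
)

# One glm per card, as in the answer (names of the list are the card values)
ModelFit <- sapply(unique(Aa$card), function(x)
  glm(delay ~ amount + cat1 + cat2, family = "binomial", data = subset(Aa, card == x)),
  simplify = FALSE, USE.NAMES = TRUE)

# newAa (hypothetical): new rows to score, deliberately not sorted by card
newAa <- Aa[c(12, 2, 7, 3), c("amount", "cat1", "cat2", "card")]

# Predict each card's rows with the model named after that card ...
chunks <- split(newAa, newAa$card)
preds  <- lapply(names(chunks), function(lvl)
  predict(ModelFit[[lvl]], newdata = chunks[[lvl]], type = "response"))

# ... and put the predictions back in newAa's row order
newAa$pred <- unsplit(preds, newAa$card)
```

split() and unsplit() both order the groups by the levels of card, so the list of predictions lines up with the right rows without matching on row names.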

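Thanks Marc! It took me a while to check my approach, because for prediction I set any new levels of the "cat2" variable in the new data set to "NA". It works fine on a small data set; let me check it on my large one too! Cheers! – corps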
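The workaround for new cat2 levels described in the comment above can be sketched like this. predict() on a glm stops with a "new level" error when newdata contains a factor value that was not present in the model's training data; setting such values to NA beforehand gives an NA prediction for those rows instead, because predict.glm() uses na.action = na.pass by default. blank_unseen_levels() and fit_a are hypothetical names; the helper reads the levels the model saw from the xlevels component that glm() stores in the fitted object.

```r
# Toy data: the 15 rows printed in the answer's output (hard-coded, no sample())
Aa <- data.frame(
  amount = c(1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1),
  cat1   = c("u", "x", "x", "w", "v", "x", "u", "x", "w", "w", "w", "w", "u", "u", "w"),
  cat2   = rep(c("p", "o", "o", "m", "n"), 3),
  card   = c("a", "b", "c", "a", "c", "b", "a", "c", "b", "a", "b", "c", "a", "c", "a"),
  delay  = rep(c(0, 1, 0, 0, 0), 3)
)

# fit_a (hypothetical name): the per-card model for card "a"
fit_a <- glm(delay ~ amount + cat1 + cat2, family = binomial, data = subset(Aa, card == "a"))

# blank_unseen_levels (hypothetical helper): for every factor/character predictor
# recorded in fit$xlevels, set values the model never saw to NA
blank_unseen_levels <- function(fit, newdata) {
  for (v in names(fit$xlevels)) {
    unseen <- !(newdata[[v]] %in% fit$xlevels[[v]])
    newdata[[v]][unseen] <- NA
  }
  newdata
}

# "q" never occurs in the training data, so predict(fit_a, newdat) would stop
newdat <- data.frame(amount = c(2, 1), cat1 = c("u", "w"), cat2 = c("o", "q"))

# After blanking, the row with "q" gets an NA prediction instead of an error
p <- predict(fit_a, newdata = blank_unseen_levels(fit_a, newdat), type = "response")
```

Rows with NA predictions can then be dropped or handled separately, which matches the comment's observation that the approach works without erroring on new levels.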