2012-02-15

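R summary function

Here is what I have so far, but the last part is not correct: the summary is not what it should be. I am having difficulty getting the correct summary of my data in R.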

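The goal is to fit a model for π = probability of frequent drinking, using the four Myers-Briggs scales as predictors. Can someone point me in the right direction?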
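Written out, this is a logistic regression. A sketch of the model, assuming treatment coding with E, S, T and J as the reference levels (the factor levels printed further down); the indicator names x_I, x_N, x_F, x_P are introduced here only for illustration:

```latex
\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_I + \beta_2 x_N + \beta_3 x_F + \beta_4 x_P,
\qquad \pi = \Pr(\text{Drink} = \text{Often})
```

Here x_I = 1 for an introvert and 0 for an extravert, and likewise x_N (intuition vs. sensing), x_F (feeling vs. thinking) and x_P (perceiving vs. judging). In R's output these four slopes appear as EII, SNN, TFF and JPP. Drink is the response, not a predictor.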

> data(MBdrink) 
> MBdrink 
   EI SN TF JP  Drink Count
1   E  S  T  J  Often    10
2   E  S  T  P  Often     8
3   E  S  F  J  Often     5
4   E  S  F  P  Often     7
5   E  S  T  J Rarely    67
6   E  S  T  P Rarely    34
7   E  S  F  J Rarely   101
8   E  S  F  P Rarely    72
9   E  N  T  J  Often     3
10  E  N  T  P  Often     2
11  E  N  F  J  Often     4
12  E  N  F  P  Often    15
13  E  N  T  J Rarely    20
14  E  N  T  P Rarely    16
15  E  N  F  J Rarely    27
16  E  N  F  P Rarely    65
17  I  S  T  J  Often    17
18  I  S  T  P  Often     3
19  I  S  F  J  Often     6
20  I  S  F  P  Often     4
21  I  S  T  J Rarely   123
22  I  S  T  P Rarely    49
23  I  S  F  J Rarely   132
24  I  S  F  P Rarely   102
25  I  N  T  J  Often     1
26  I  N  T  P  Often     5
27  I  N  F  J  Often     1
28  I  N  F  P  Often     6
29  I  N  T  J Rarely    12
30  I  N  T  P Rarely    30
31  I  N  F  J Rarely    30
32  I  N  F  P Rarely    73

> summary(MBdrink) 
 EI     SN     TF     JP       Drink          Count
 E:16   S:16   T:16   J:16   Rarely:16   Min.   :  1.00
 I:16   N:16   F:16   P:16   Often :16   1st Qu.:  5.00
                                         Median : 15.50
                                         Mean   : 32.81
                                         3rd Qu.: 53.00
                                         Max.   :132.00

> MBdrink<-transform(MBdrink, EI=as.factor(EI)) 
> MBdrink<-transform(MBdrink, SN=as.factor(SN)) 
> MBdrink<-transform(MBdrink, TF=as.factor(TF)) 
> MBdrink<-transform(MBdrink, JP=as.factor(JP)) 

> levels(MBdrink$EI) 
[1] "E" "I" 
> levels(MBdrink$SN) 
[1] "S" "N" 
> levels(MBdrink$TF) 
[1] "T" "F" 
> levels(MBdrink$JP) 
[1] "J" "P" 

> MBdrink.fit<- 
+ glm((Count>0)~EI+SN+TF+JP+Drink,family=binomial,data=MBdrink) 
> summary(MBdrink.fit) 

Call: 
glm(formula = (Count > 0) ~ EI + SN + TF + JP + Drink, family = binomial, 
data = MBdrink) 

Deviance Residuals: 
      Min        1Q    Median        3Q       Max
3.971e-06 3.971e-06 3.971e-06 3.971e-06 3.971e-06

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.557e+01  9.353e+04       0        1
EII         -4.602e-10  7.637e+04       0        1
SNN         -4.602e-10  7.637e+04       0        1
TFF         -4.602e-10  7.637e+04       0        1
JPP         -4.602e-10  7.637e+04       0        1
DrinkOften   4.602e-10  7.637e+04       0        1

(Dispersion parameter for binomial family taken to be 1) 

Null deviance: 0.0000e+00 on 31 degrees of freedom 
Residual deviance: 5.0463e-10 on 26 degrees of freedom 
AIC: 12 

Number of Fisher Scoring iterations: 24 

Thanks!

Answer

Count>0 is always TRUE: you are trying to predict a constant variable, hence the strange results.
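To see the problem concretely, here is a minimal sketch that takes only the Count column from the question's printout and evaluates the response used in the question's glm() call:

```r
# Cell counts from the Count column of the question's MBdrink table (rows 1-32)
Count <- c(10, 8, 5, 7, 67, 34, 101, 72, 3, 2, 4, 15, 20, 16, 27, 65,
           17, 3, 6, 4, 123, 49, 132, 102, 1, 5, 1, 6, 12, 30, 30, 73)

# The response used in the question: (Count > 0)
y <- Count > 0

# Every cell contains at least one person, so the "outcome" is TRUE for all
# 32 rows: a constant response carries no information about EI, SN, TF or JP.
table(y)
all(y)
```

With a response that is always TRUE, the maximum-likelihood intercept drifts toward +∞, which is why the question's output shows an intercept of about 25.6, standard errors near 10^4 and 24 Fisher scoring iterations.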

For the regression you need the raw, non-aggregated data. If you want to predict the Drink column, it should not be among the predictors.
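A minimal base-R sketch of that advice, applied to the question's own counts: each row is repeated Count times with rep() (an alternative to the ddply()-based expansion in this answer), and Drink is used only as the response. The names people, fit_raw and types are illustrative.

```r
# Rebuild the question's MBdrink table (same 32 rows, same factor levels)
MBdrink <- data.frame(
  EI    = factor(rep(c("E", "I"), each = 16), levels = c("E", "I")),
  SN    = factor(rep(rep(c("S", "N"), each = 8), 2), levels = c("S", "N")),
  TF    = factor(rep(rep(c("T", "F"), each = 2), 8), levels = c("T", "F")),
  JP    = factor(rep(c("J", "P"), 16), levels = c("J", "P")),
  Drink = factor(rep(rep(c("Often", "Rarely"), each = 4), 4),
                 levels = c("Rarely", "Often")),
  Count = c(10, 8, 5, 7, 67, 34, 101, 72, 3, 2, 4, 15, 20, 16, 27, 65,
            17, 3, 6, 4, 123, 49, 132, 102, 1, 5, 1, 6, 12, 30, 30, 73)
)

# Dis-aggregate: repeat row i Count[i] times, giving one row per person
people <- MBdrink[rep(seq_len(nrow(MBdrink)), times = MBdrink$Count),
                  c("EI", "SN", "TF", "JP", "Drink")]
rownames(people) <- NULL

# Drink is the response only. Its first level ("Rarely") counts as failure,
# so glm() models pi = P(Drink == "Often").
fit_raw <- glm(Drink ~ EI + SN + TF + JP, family = binomial(), data = people)
summary(fit_raw)

# Fitted pi for each of the 16 personality types
types <- unique(people[, c("EI", "SN", "TF", "JP")])
cbind(types, pi_often = predict(fit_raw, newdata = types, type = "response"))
```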

# Sample data (simulated; the seed only makes the example reproducible)
set.seed(1)
n <- 100 
MBdrink <- data.frame(
    EI=sample(c("E","I"), n, replace=TRUE), 
    SN=sample(c("S","N"), n, replace=TRUE), 
    TF=sample(c("T","F"), n, replace=TRUE), 
    JP=sample(c("J","P"), n, replace=TRUE), 
    Drink=factor(sample(c("Rarely","Often"), n, prob=c(.2,.8), replace=TRUE), levels=c("Rarely", "Often")), 
    Count=rpois(n,5) 
) 
library(plyr) 
MBdrink <- ddply(MBdrink, c("EI","SN","TF","JP","Drink"), summarize, Count=sum(Count)) 
# dis-aggregate the data 
d <- ddply(MBdrink, "Count", function (u) 
    do.call(rbind, replicate(unique(u$Count), u, simplify=FALSE))) 
# Run the regression you want 
r <- glm( 
    Drink ~ EI + SN + TF + JP, 
    data=d, 
    family=binomial(link="logit") # Logistic regression 
) 
result <- cbind(d, Probability=predict(r, type="response")) 
result <- unique(result) 
result <- result[order(result$Probability),] 
result 

I don't understand why you say you cannot use the aggregated data. summary(glm(cbind((Drink == "Often") * Count, (Drink != "Often") * Count) ~ EI + SN + TF + JP, data = MBdrink, family = binomial())). – 2012-02-16 01:17:56
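The comment's aggregated formulation can be checked against the one-row-per-person fit. A sketch, assuming the question's table is rebuilt as below; fit_agg, fit_raw and people are illustrative names, and only the coefficient estimates and residual degrees of freedom are compared:

```r
# Rebuild the question's MBdrink table (same 32 rows, same factor levels)
MBdrink <- data.frame(
  EI    = factor(rep(c("E", "I"), each = 16), levels = c("E", "I")),
  SN    = factor(rep(rep(c("S", "N"), each = 8), 2), levels = c("S", "N")),
  TF    = factor(rep(rep(c("T", "F"), each = 2), 8), levels = c("T", "F")),
  JP    = factor(rep(c("J", "P"), 16), levels = c("J", "P")),
  Drink = factor(rep(rep(c("Often", "Rarely"), each = 4), 4),
                 levels = c("Rarely", "Often")),
  Count = c(10, 8, 5, 7, 67, 34, 101, 72, 3, 2, 4, 15, 20, 16, 27, 65,
            17, 3, 6, 4, 123, 49, 132, 102, 1, 5, 1, 6, 12, 30, 30, 73)
)

# Grouped binomial response from the comment:
#   successes = Count on "Often" rows, failures = Count on "Rarely" rows
fit_agg <- glm(cbind((Drink == "Often") * Count, (Drink != "Often") * Count) ~
                 EI + SN + TF + JP,
               data = MBdrink, family = binomial())

# Reference: the same model on the expanded one-row-per-person data
people <- MBdrink[rep(seq_len(nrow(MBdrink)), times = MBdrink$Count), ]
fit_raw <- glm(Drink ~ EI + SN + TF + JP, data = people, family = binomial())

# Compare the estimates side by side (they differ only by IRLS round-off)
cbind(aggregated = coef(fit_agg), expanded = coef(fit_raw))
all.equal(coef(fit_agg), coef(fit_raw), tolerance = 1e-4)
```

Each row contributes Count successes (Often) or Count failures (Rarely), so the binomial likelihood is the same as for the expanded data and the estimates agree. The residual degrees of freedom do not (27 for the 32 grouped rows versus 1045 for the 1050 people), so goodness-of-fit statements based on them need care.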

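I usually avoid aggregated data because aggregation often discards some information (that is not the case here). There is a simpler solution, using weights: 'summary(glm(Drink ~ EI + SN + TF + JP, data = MBdrink, family = binomial(), weights = Count))'. – 2012-02-16 01:29:25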
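The weights version can be checked the same way. A minimal sketch, rebuilding the question's table and comparing against the expanded data; fit_w, fit_raw and people are illustrative names:

```r
# Rebuild the question's MBdrink table (same 32 rows, same factor levels)
MBdrink <- data.frame(
  EI    = factor(rep(c("E", "I"), each = 16), levels = c("E", "I")),
  SN    = factor(rep(rep(c("S", "N"), each = 8), 2), levels = c("S", "N")),
  TF    = factor(rep(rep(c("T", "F"), each = 2), 8), levels = c("T", "F")),
  JP    = factor(rep(c("J", "P"), 16), levels = c("J", "P")),
  Drink = factor(rep(rep(c("Often", "Rarely"), each = 4), 4),
                 levels = c("Rarely", "Often")),
  Count = c(10, 8, 5, 7, 67, 34, 101, 72, 3, 2, 4, 15, 20, 16, 27, 65,
            17, 3, 6, 4, 123, 49, 132, 102, 1, 5, 1, 6, 12, 30, 30, 73)
)

# One row per (type, Drink) cell, weighted by the number of people in it
fit_w <- glm(Drink ~ EI + SN + TF + JP, data = MBdrink,
             family = binomial(), weights = Count)
summary(fit_w)

# Reference: the expanded one-row-per-person fit
people <- MBdrink[rep(seq_len(nrow(MBdrink)), times = MBdrink$Count), ]
fit_raw <- glm(Drink ~ EI + SN + TF + JP, data = people, family = binomial())

# Compare estimates, residual deviance and residual degrees of freedom
all.equal(coef(fit_w), coef(fit_raw), tolerance = 1e-4)
all.equal(deviance(fit_w), deviance(fit_raw), tolerance = 1e-6)
c(weighted = df.residual(fit_w), expanded = df.residual(fit_raw))  # 27 vs 1045
```

With a 0/1 response, weights = Count makes each grouped row count as Count identical observations, so the estimates, standard errors and residual deviance match the expanded fit; only the residual degrees of freedom differ, because glm() counts the 32 weighted rows rather than the 1050 people.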