2016-08-19

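Problem with predictor variables in the BradleyTerry2 package

I want to use the BradleyTerry2 package in R 3.3.1 to include contest-specific variables in my data analysis (I also tried R 2.11.1 to compare against an older version of BradleyTerry2). The problem I am facing is that my predictor variables are not taken into account properly. The example below illustrates the problem using the CEMS data.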

    library(BradleyTerry2)
    data("CEMS", package = "BradleyTerry2")

    CEMS.BTmodel_01 <- BTm(outcome = cbind(win1.adj, win2.adj),
                           player1 = school1,
                           player2 = school2,
                           formula = ~ .. + WOR[student] * LAT[..],
                           refcat = "Stockholm",
                           data = CEMS)
    summary(CEMS.BTmodel_01)

With this model we get AIC = 5837.4 and an estimate of 0.85771 for the LAT[..] * WOR[student] interaction.

Now suppose I add a new school (Toulouse, LAT = 1) at the top of the list:

    Toulouse  <- c(1,0,0,0,0,0,0)
    Barcelona <- c(0,1,0,0,0,0,0)
    London    <- c(0,0,1,0,0,0,0)
    Milano    <- c(0,0,0,1,0,0,0)
    Paris     <- c(0,0,0,0,1,0,0)
    St.Gallen <- c(0,0,0,0,0,1,0)
    Stockholm <- c(0,0,0,0,0,0,1)
    LAT       <- c(1,1,0,1,1,0,0)
    schools <- data.frame(Toulouse, Barcelona, London, Milano, Paris, St.Gallen, Stockholm, LAT)
    rownames(schools) <- c("Toulouse", "Barcelona", "London", "Milano", "Paris", "St.Gallen", "Stockholm")
    CEMS$schools <- schools

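I would expect the same results from the analysis, since the new school does not appear in the dataset. But I actually get AIC = 5855.8 and an estimate of 0.13199 for the LAT[..] * WOR[student] interaction.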

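After playing around with the data, it looks like the row names of my predictor table (here, the school names) are not used to match the predictors to my comparison data (here, the paired comparisons made by European students). Instead, what matters is the order of the rows.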

What am I doing wrong?

Answer


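The rows of CEMS$schools should match the levels of the school1 and school2 factors. The rownames of CEMS$schools are not actually used in the code: the first row should correspond to the first level, the second row to the second level, and so on. So you need to update the levels of school1 and school2: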

    CEMS$preferences <-
        within(CEMS$preferences, {
            school1 <- factor(school1, rownames(CEMS$schools))
            school2 <- factor(school2, rownames(CEMS$schools))
        })

    CEMS.BTmodel_02 <- BTm(outcome = cbind(win1.adj, win2.adj),
                           player1 = school1,
                           player2 = school2,
                           formula = ~ .. + WOR[student] * LAT[..],
                           refcat = "Stockholm",
                           data = CEMS)

Now the two models are the same, as expected:

    > CEMS.BTmodel_01
    Bradley Terry model fit by glm.fit

    Call: BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2,
        formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm",
        data = CEMS)

    Coefficients [contrasts: ..=contr.treatment ]:
                  ..Barcelona                 ..London                 ..Milano
                       0.5044                   1.6037                   0.3538
                      ..Paris              ..St.Gallen          WOR[student]yes
                       0.8741                   0.5268                       NA
                      LAT[..]  WOR[student]yes:LAT[..]
                           NA                   0.8577
    Degrees of Freedom: 4454 Total (i.e. Null); 4448 Residual
      (91 observations deleted due to missingness)
    Null Deviance:     5499
    Residual Deviance: 4912     AIC: 5837

    > CEMS.BTmodel_02
    Bradley Terry model fit by glm.fit

    Call: BTm(outcome = cbind(win1.adj, win2.adj), player1 = school1, player2 = school2,
        formula = ~.. + WOR[student] * LAT[..], refcat = "Stockholm",
        data = CEMS)

    Coefficients [contrasts: ..=contr.treatment ]:
                   ..Toulouse              ..Barcelona                 ..London
                           NA                   0.5044                   1.6037
                     ..Milano                  ..Paris              ..St.Gallen
                       0.3538                   0.8741                   0.5268
              WOR[student]yes                  LAT[..]  WOR[student]yes:LAT[..]
                           NA                       NA                   0.8577
    Degrees of Freedom: 4454 Total (i.e. Null); 4448 Residual
      (91 observations deleted due to missingness)
    Null Deviance:     5499
    Residual Deviance: 4912     AIC: 5837
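
To see why the row order matters, here is a minimal base-R sketch. It is a toy illustration, not the package's internal code: it only relies on the fact that subsetting with a factor uses the factor's integer codes (positions) rather than its labels. The vectors `school` and `LAT` are made-up stand-ins for the CEMS objects.

```r
# Toy illustration (not BradleyTerry2 internals): subsetting with a factor
# uses the factor's integer codes, i.e. positions, not its labels.
school <- factor(c("London", "Paris", "Barcelona"),
                 levels = c("Barcelona", "London", "Milano", "Paris",
                            "St.Gallen", "Stockholm"))

# Covariate values with Toulouse prepended, as in the question
LAT <- c(Toulouse = 1, Barcelona = 1, London = 0, Milano = 1,
         Paris = 1, St.Gallen = 0, Stockholm = 0)

# Indexing by the factor picks positions 2, 4, 1: the wrong schools
by_position <- unname(LAT[school])

# Indexing by label picks London, Paris, Barcelona: the intended schools
by_name <- unname(LAT[as.character(school)])

# Re-levelling the factor to follow the covariate order realigns the codes
school_fixed <- factor(school, levels = names(LAT))
realigned <- unname(LAT[school_fixed])
```

In this sketch `by_position` and `by_name` disagree, while `realigned` agrees with `by_name`. Re-levelling keeps every observation's label and only adds the unused Toulouse level, which is consistent with the `NA` coefficient for `..Toulouse` in the second fit.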

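Great, it works well and the results make much more sense now. I also realized that the same approach has to be applied to the other covariate matrices (the "students" matrix in the CEMS example).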
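
One way to guard against this for any covariate table is a small alignment check before fitting. The sketch below uses toy data and a hypothetical helper, `rows_match_levels()`, which is not part of BradleyTerry2. It also assumes the covariate table carries meaningful row names, which may not hold for every dataset.

```r
# Hypothetical helper (not part of BradleyTerry2): TRUE when row i of
# `covariates` is named after level i of the factor `f`.
rows_match_levels <- function(f, covariates) {
  identical(levels(f), rownames(covariates))
}

# Toy stand-ins for a subject factor and its covariate table
student <- factor(c("s2", "s1", "s2"), levels = c("s1", "s2"))
students <- data.frame(WOR = c("yes", "no", "no"),
                       row.names = c("s3", "s1", "s2"))

# The table has an extra first row, so rows and levels are misaligned
ok_before <- rows_match_levels(student, students)

# Same fix as for school1/school2: take the levels from the row names
student <- factor(student, levels = rownames(students))
ok_after <- rows_match_levels(student, students)
```

Here `ok_before` is `FALSE` and `ok_after` is `TRUE`, and the labels stored in `student` are unchanged by the re-levelling.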