2016-04-22 65 views
1

Finding the adjusted R-squared for multiple regression when the problem has 10 variables

Find the model that, using exactly two variables, gives the largest adjusted R-squared value.

fit(i,j)=lm(y~xi+xj,data=data) 

where xi and xj can be any of the given variables x1, x2, ..., x10.

For example, I want to compare the adjusted R-squared values of

fit(1,2)=lm(y~x1+x2,data=data) 

fit(1,3)=lm(y~x1+x3,data=data) 

. . .

fit(9,10)=lm(y~x9+x10,data=data) 

Is there a way to compare all of the results using a 'for loop'?

+1

Try [this](http://stackoverflow.com/questions/4951442/formula-with-dynamic-number-of-variables) or [this](http://stackoverflow.com/questions/13302323/looping-through-covariates-in-regression-using-r). – Laterow

+0

Please remove the rstudio tag. It is not relevant to your question. – lmo

Answers

1

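Assuming your outcome variable is called outcome and your data frame is called df, we can first write a custom function that returns the adjusted R-squared. We then apply it to every pair of predictor columns with the combn function. Note that for this to work you need to convert outcome to numeric if it is a factor: df$outcome <- as.numeric(as.character(df$outcome))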

# Adjusted R-squared of the two-predictor model y ~ x + z
R.squared <- function(y, x, z){ 
    summary(lm(y ~ x + z))$adj.r.squared 
} 
# Keep only the predictor columns so that outcome is never used as a regressor
predictors <- df[, -1] 
# Returns a numeric vector with one adjusted R-squared per pair of predictors
combn(ncol(predictors), 2, function(i) R.squared(df$outcome, predictors[, i[1]], predictors[, i[2]])) 

This gives 45 results, which is the correct number of pairs (10C2 = 45).
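To go from the 45 values to the actual answer (which pair is best), it can help to keep the column names attached to each result. The following is only a sketch under stated assumptions: the data frame is simulated stand-in data rather than the data shown under DATA, and the names `pairs`, `adj_r2` and `best` are illustrative.

```r
# Simulated stand-in data (an assumption, not the data from this answer):
# 20 observations, predictors X1..X10, and a numeric outcome
set.seed(1)
df <- as.data.frame(matrix(rnorm(20 * 10), nrow = 20))
names(df) <- paste0("X", 1:10)
df$outcome <- 2 * df$X3 - df$X7 + rnorm(20, sd = 0.5)

# All pairs of predictor names: a 2 x 45 character matrix
predictors <- setdiff(names(df), "outcome")
pairs <- combn(predictors, 2)

# Fit outcome ~ Xi + Xj for each column of pairs and keep the adjusted R-squared
adj_r2 <- apply(pairs, 2, function(p) {
  summary(lm(reformulate(p, response = "outcome"), data = df))$adj.r.squared
})
names(adj_r2) <- apply(pairs, 2, paste, collapse = " + ")

# Label of the pair with the largest adjusted R-squared
best <- names(adj_r2)[which.max(adj_r2)]
best
```

Working with names instead of column positions avoids any confusion about whether index 1 refers to the outcome or to the first predictor.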

DATA

dput(df) 
structure(list(outcome = structure(c(2L, 1L, 1L, 2L), .Label = c("0", 
"1"), class = "factor"), X1 = c(-0.086580111257948, 1.3225244296403, 
0.63970203781302, 1.17478656505647), X2 = c(0.116290308776141, 
-2.93084636363391, 0.67750806223535, 1.11777194347258), X3 = c(1.38404752146435, 
1.2839408555363, -0.976479813387477, 0.990836347961829), X4 = c(-1.53428156591653, 
-1.81700160188474, 0.35563308328848, 0.863904683601422), X5 = c(-0.0805126064587461, 
-0.962480324796481, 0.112310964386636, -0.257651852496691), X6 = c(1.48342629539586, 
0.677600299153581, -0.718621221409107, -0.547872283010696), X7 = c(1.52752065946695, 
-0.039941426401065, 0.384087275444754, 2.23916461213194), X8 = c(1.753974300534, 
1.22050988486485, 2.61512874217525, 1.76150083091101), X9 = c(-0.786009592713507, 
-0.176356977987529, 0.0947058204731415, 0.127134850846526), X10 = c(0.510517865869084, 
-1.24821415198133, 0.963011806720543, 0.307956641660821)), .Names = c("outcome", 
"X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10"), row.names = c(NA, 
-4L), class = "data.frame") 
0

You can do

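set.seed(42) 
# Simulated data: 10 observations of y and the predictors x1..x10
data <- as.data.frame(matrix(rnorm(110), 10, 11)) 
names(data) <- c("y", paste0("x", 1:10)) 

# Adjusted R-squared of the model y ~ xi + xj fitted on dat
fit.R2 <- function(i, j, dat) summary(lm(as.formula(paste0("y ~ x", i, " + x", j)), data=dat))$adj.r.squared 

n <- 10 
i <- 1:(n-1) 
# All index pairs with I < J: (1,2), (1,3), ..., (9,10)
result <- data.frame(I=rep(i, n-i), J=unlist(sapply(2:n, ':', to=n))) 
# One adjusted R-squared per row of result
result$R2 <- apply(result, 1, function(ij) fit.R2(ij["I"], ij["J"], data))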
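
Since the question asks specifically about a for loop, the same search can also be written as an explicit nested loop that tracks the best pair as it goes. This is a minimal sketch, assuming the same simulated layout as above (a column y and predictors x1 to x10); the names `best`, `best_pair` and `count` are illustrative.

```r
# Same simulated data layout as in the answer: y plus x1..x10
set.seed(42)
data <- as.data.frame(matrix(rnorm(110), 10, 11))
names(data) <- c("y", paste0("x", 1:10))

n <- 10
best <- -Inf                               # largest adjusted R-squared so far
best_pair <- c(NA_integer_, NA_integer_)   # indices (i, j) of that model
count <- 0                                 # number of models fitted

for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    # Build and fit y ~ xi + xj
    f <- as.formula(paste0("y ~ x", i, " + x", j))
    r2 <- summary(lm(f, data = data))$adj.r.squared
    count <- count + 1
    # Keep the pair if it beats the best seen so far
    if (r2 > best) {
      best <- r2
      best_pair <- c(i, j)
    }
  }
}

best_pair
best
```

With the result data frame built above, result[which.max(result$R2), ] should pick out the same pair without an explicit loop.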