2015-10-14

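R: Using a pasted formula in sapply

I'm trying to identify correlated explanatory variables and eliminate them. I used sapply to apply the regression to the variables I'm interested in and manually removed the variables with VIF > 10. However, when I try to reproduce this so I can quickly run several VIFs, I can't get my regression script to run with a pasted formula object containing the names I want to keep. See below: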

regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100))) 
colnames(regressiondata) <- c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5") 
vifs1_model <- sapply(regressiondata[,indep_variables],function(x) vif(lm(x~var1+var2+var3+var4+var5, 
                     data = regressiondata, 
                     na.action=na.exclude))) 
vifs1 <- rowMeans(vifs1_model) 
formula_variables <- paste(names(vifs1),collapse="+") 
final_model <- t(round(sapply(regressiondata[,indep_variables], 
      function(x) lm(x ~ formula_variables,data=regressiondata,na.action=na.exclude)$coef),2)) 
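When I run final_model, I get this error:

Error in t(round(sapply(regressiondata[, indep_variables], function(x) lm(x ~ : error in evaluating the argument 'x' in selecting a method for function 't': Error in model.frame.default(formula = x ~ formula_variables, data = regressiondata, : variable lengths differ (found for 'formula_variables')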

Answers


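I think you have a couple of problems:

  1. You are using sapply over the data frame, when it looks like you just want to sapply over a vector of the independent variable names.
  2. Your final nested call to lm seems to be mixing expressions and strings.

I walk through it below. Your code refers to some objects that don't exist, so I've added a few lines for the ones I think you left out.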
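To see point 2 in isolation, here is a small sketch on hypothetical toy data (toy, y, var1 and var2 are illustrative names, not from the question). Written literally, x ~ formula_variables asks for a single variable called formula_variables, which is a length-1 character vector, and that mismatch in length is what the error message reports.

```r
# Toy data; the names toy, y, var1, var2 are hypothetical.
set.seed(1)
toy <- data.frame(y = rnorm(10), var1 = rnorm(10), var2 = rnorm(10))
formula_variables <- paste(c("var1", "var2"), collapse = "+")  # the string "var1+var2"

# Written literally, the right-hand side is the single variable
# `formula_variables` (a length-1 character vector), not var1 and var2.
bad <- y ~ formula_variables
print(all.vars(bad))

# model.frame() then sees a length-10 response and a length-1 "predictor" and stops.
bad_err <- tryCatch(lm(bad, data = toy), error = function(e) e)
print(conditionMessage(bad_err))

# Pasting the whole formula as a string and converting it gives the intended terms.
good <- as.formula(paste("y", "~", formula_variables))
print(all.vars(good))
```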

library(car) # for vif() 
regressiondata <- data.frame(matrix(ncol=9,nrow=100,runif(900,1,100))) 
colnames(regressiondata) <- c("indep1", 
           "indep2", 
           "indep3", 
           "indep4", 
           "var1", 
           "var2", 
           "var3", 
           "var4", 
           "var5") 

indep_variables <- names(regressiondata)[1:4] # object did not exist 

I broke out the anonymous function for clarity:

f1 <- function(x) { 
    vif(lm(x~var1+var2+var3+var4+var5, 
     data = regressiondata, 
     na.action=na.exclude)) 
} 

Now your regression:

vifs1_model <- sapply(regressiondata[,indep_variables], f1) 
vifs1 <- rowMeans(vifs1_model) 
formula_variables <- paste(names(vifs1),collapse="+") 
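Since the question filters on VIF > 10 by hand, it may help to see what that number is. This is a base-R sketch assuming the usual definition VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; for models whose terms are all single numeric columns, as here, that should agree with what car::vif() reports. vif_by_hand, toy and the cut-off of 10 (taken from the question) are illustrative.

```r
# Base-R sketch of the variance inflation factor: VIF_j = 1 / (1 - R_j^2),
# where R_j^2 is from regressing predictor j on the other predictors.
# vif_by_hand, toy and the cut-off of 10 are illustrative assumptions.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

set.seed(42)
toy <- data.frame(var1 = rnorm(100), var2 = rnorm(100))
toy$var3 <- toy$var1 + rnorm(100, sd = 0.05)  # var3 is nearly a copy of var1

vifs <- vif_by_hand(toy, c("var1", "var2", "var3"))
print(round(vifs, 1))

# Keep only predictors at or below the cut-off, then paste them into a formula string
keep <- names(vifs)[vifs <= 10]
formula_variables <- paste(keep, collapse = "+")
```

Dropping every predictor above the cut-off at once, as this does, removes both members of a collinear pair; a common alternative is to drop only the predictor with the largest VIF and recompute.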

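I also turned the coefficient-pulling function into a named function, and passed lm the whole formula as a character vector (string):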

getCoefs <- function(x) { 
    lm(paste(x, "~", formula_variables), data=regressiondata, 
    na.action=na.exclude)$coef 
} 
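An alternative to pasting a string is base R's reformulate(), which builds the formula object directly from a character vector of term names plus a response name. A minimal sketch on hypothetical toy data; getCoefs2 and keep_vars are illustrative names standing in for getCoefs and the pasted names(vifs1).

```r
# Toy data with the same naming scheme as the question (values are random).
set.seed(7)
regressiondata <- data.frame(matrix(runif(600, 1, 100), ncol = 6))
names(regressiondata) <- c("indep1", "indep2", "var1", "var2", "var3", "var4")
indep_variables <- c("indep1", "indep2")
keep_vars <- c("var1", "var2", "var3")  # hypothetical: predictors that survived a VIF filter

getCoefs2 <- function(response) {
  # reformulate(c("var1", "var2", "var3"), response = "indep1") gives indep1 ~ var1 + var2 + var3
  f <- reformulate(keep_vars, response = response)
  coef(lm(f, data = regressiondata, na.action = na.exclude))
}

# sapply over the names: one column per response; transpose so responses are rows
final_model <- t(round(sapply(indep_variables, getCoefs2), 2))
print(final_model)
```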

Now just sapply over the vector of names, then transpose and round:

final_model <- sapply(indep_variables, getCoefs) 
final_model <- t(round(final_model ,2)) 

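Here's a way of doing it with dplyr. Most of the work is done by the sub_regression function, which runs the regression, filters the explanatory variables by VIF, and then redoes the regression with the ones that remain.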

library(dplyr) 
library(tidyr) 
library(magrittr) 
library(car) 

# Fit the full model, drop predictors with VIF > 10, refit on the survivors
sub_regression = function(sub_data_frame) 
    lm(independent_value ~ var1+var2+var3+var4+var5, 
    data = sub_data_frame, 
    na.action="na.exclude") %>% 
    vif %>%                                  # named vector of VIFs
    Filter(function(x) x <= 10, .) %>%       # keep predictors with VIF <= 10
    names %>% 
    paste(collapse = " + ") %>% 
    paste("independent_value ~ ", .) %>%     # rebuild the formula as a string
    as.formula %>% 
    lm(. , sub_data_frame, na.action="na.exclude") %>% 
    coefficients %>% 
    round(3) %>% 
    as.list %>% 
    data.frame(check.names = FALSE)          # one-row data frame for do()

matrix(ncol=9,nrow=100,runif(900,1,100)) %>% 
    data.frame %>% 
    setNames(c("indep1","indep2","indep3","indep4","var1","var2","var3","var4","var5")) %>% 
    gather(independent_variable, independent_value, 
     indep1, indep2, indep3, indep4) %>% 
    group_by(independent_variable) %>% 
    do(sub_regression(.))
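If dplyr, tidyr or car are not available, the same fit, filter-by-VIF, refit loop can be sketched in base R. This assumes the standard VIF definition computed by hand as 1/(1 - R_j^2) rather than calling car::vif(), and it returns a list with one coefficient vector per response instead of a data frame; refit_after_vif and its threshold argument are hypothetical.

```r
# Hypothetical helper: fit, filter predictors by VIF, refit, return rounded coefficients.
refit_after_vif <- function(data, response, predictors, threshold = 10) {
  # VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing predictor j on the others
  vifs <- sapply(predictors, function(p) {
    r2 <- summary(lm(reformulate(setdiff(predictors, p), response = p), data = data))$r.squared
    1 / (1 - r2)
  })
  kept <- names(vifs)[vifs <= threshold]
  if (length(kept) == 0) kept <- "1"  # intercept-only model if every predictor is dropped
  round(coef(lm(reformulate(kept, response = response), data = data,
                na.action = na.exclude)), 3)
}

set.seed(3)
regressiondata <- data.frame(matrix(runif(900, 1, 100), ncol = 9))
names(regressiondata) <- c(paste0("indep", 1:4), paste0("var", 1:5))
predictors <- paste0("var", 1:5)

# One list element per response, like the groups produced by group_by() + do()
responses <- paste0("indep", 1:4)
results <- lapply(setNames(responses, responses),
                  function(r) refit_after_vif(regressiondata, r, predictors))
print(results)
```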