2017-06-16 93 views

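For loop for sequentially adjusted regression in R

Most questions about traditional loops in R are answered with functions that need less code and are usually more flexible.

However, and please correct me if I am wrong, I feel that loops still have the upper hand when the order of iteration matters.

In my case, I want to build sequentially and cumulatively adjusted logistic regression models, storing the ORs/CIs along with a column showing what is being adjusted for. This is my expected output: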

Model  OR  CI 

Biomarker 
+Age 
+Sex 
+Smoking 

Here is what I have done:

df1 <- subset(df, select = c(age_cat, is_female, smoking_category, 
           bmi_calc, has_diabetes, sbp_mean, 
           alcohol_category, highest_education, 
           occupation, household_income)) 
model <- data.frame(NULL) 

for (i in seq_along(df1)) { 

    model <- exp((cbind(OR = coef(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
         family=binomial, data=df)), 
      confint(glm(as.formula(paste("istroke ~ log2(hscrp_mgl)", i, sep = "+")), 
         family=binomial, data=df))))) 


} 

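My outcome variable is stroke (istroke, 0 or 1). My exposure of interest is a biomarker (hscrp_mgl). I know I am making a fundamental mistake somewhere. I have looked at other SO posts, but most of them do not deal with sequentially and cumulatively adjusted regression models.

Please let me know if this is a duplicate, or if anything is unclear.

Edit

My original dataset df contains all the variables in df1, my outcome variable, and then some. Here is a reproducible sample of it: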

age_cat is_female smoking_category bmi_calc has_diabetes  sbp_mean istroke 
(59,69]  0   4   19.6   0    103.5   0 
(59,69]  1   1   19.1   0     138   0 
(29,59]  0   4   26.8   0    155.5   0 
(29,59]  0   1   23.1   0     130   1 
(29,59]  1   1   22.7   0     126   1 
(59,69]  0   4    25   0    182.5   0 
(29,59]  1   1    20   0     96   1 
(29,59]  1   2    23.9   0    134.5   0 
(59,69]  0   4    24.4   0    160.5   1 

Edit: a more reproducible example:

df <- data.frame(age = c(50, 60, 50, 40, 70, 90, 30), 
      gender = c(0, 1, 1, 0, 1, 1, 1), 
      smoke = c(4, 3, 2, 1, 4, 3, 4), 
      BMI = c(19, 20, 21, 22, 23, 24, 25), 
      SBP = c(100, 120, 140, 110, 120, 130, 120), 
      diab = c(0, 1, 1, 1, 0, 1, 1), 
      stroke = c(0, 1, 0, 0, 1, 1, 1)) 
dput(df) 
structure(list(age = c(50, 60, 50, 40, 70, 90, 30), gender = c(0, 
1, 1, 0, 1, 1, 1), smoke = c(4, 3, 2, 1, 4, 3, 4), BMI = c(19, 
20, 21, 22, 23, 24, 25), SBP = c(100, 120, 140, 110, 120, 130, 
120), diab = c(0, 1, 1, 1, 0, 1, 1), stroke = c(0, 1, 0, 0, 1, 
1, 1)), .Names = c("age", "gender", "smoke", "BMI", "SBP", "diab", 
"stroke"), row.names = c(NA, -7L), class = "data.frame") 

Could you please provide a reproducible example of df? – OmaymaS

@OmaymaS, please see the edit. – Mak

Is this what you want? Just as a starting point. – OmaymaS

Answers

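Actually, lapply is probably a better approach than for in your case, because it can return a collection of data frames for a final row bind, rather than growing the model object on every iteration.

The example below randomizes hscrp_mgl because it is not in the posted data, so ignore the results but consider the process. Also, the confidence interval is split into low and high values in separate columns.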

set.seed(456) 
df <- data.frame(hscrp_mgl = abs(rnorm(250)), 
       age = sample(100, 1000, replace=TRUE), 
       gender = sample(0:1, 1000, replace=TRUE), 
       smoke = sample(1:4, 1000, replace=TRUE), 
       BMI = sample(19:25, 1000, replace=TRUE), 
       SBP = sample(c(100, 120, 140, 110, 120, 130, 120), 
           1000, replace=TRUE), 
       diab = sample(0:1, 1000, replace=TRUE), 
       stroke = sample(0:1, 1000, replace=TRUE)) 

# ITERATE THROUGH COLUMN NUMBERS (SKIPPING FIRST AND LAST) 
# NOTE: 3:ncol(df)-1 PARSES AS (3:ncol(df))-1, SO THE RANGE IS WRITTEN OUT EXPLICITLY 
modeldfs <- lapply(2:(ncol(df) - 1), function(i) { 
    strf <- paste("stroke ~ log2(hscrp_mgl)", 
       paste(names(df)[2:i], collapse = "+"), sep = "+") 
    print(strf) 

    # FIT DYNAMIC CUMULATIVE FORMULA USING names() TO PASS IN COLUMN NAME 
    fit <- glm(as.formula(strf), family=binomial, data=df) 

    # BIND MODEL STATS 
    data.frame(OR = exp(coef(fit)[i+1]), 
      CI_2.5 = exp(confint(fit)[i+1,1]), 
      CI_97.5 = exp(confint(fit)[i+1,2])) 
}) 

model <- do.call(rbind, modeldfs) 
model 

Output

[1] "stroke ~ log2(hscrp_mgl)+age" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
[1] "stroke ~ log2(hscrp_mgl)+age+gender+smoke+BMI+SBP+diab" 
# Waiting for profiling to be done... 
# Waiting for profiling to be done... 
# > model <- do.call(rbind, modeldfs) 
# > model 
      OR CI_2.5 CI_97.5 
age 1.003285 0.9989043 1.007701 
gender 1.067117 0.8318796 1.369055 
smoke 1.005926 0.9005196 1.123717 
BMI 1.011281 0.9505659 1.075928 
SBP 1.003252 0.9929368 1.013692 
diab 1.139586 0.8880643 1.462925 
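
The table above reports the OR of the most recently added covariate. If the table should instead report the biomarker's own OR at each adjustment step, as in the question's expected output (crude model first, then +age, +gender, and so on), a minimal base-R sketch could look like the following. The simulated data, the helper name `fit_one`, and the `Model` labels are assumptions made for illustration only; `confint()` is called once per fit and its progress message is suppressed.

```r
set.seed(456)
n <- 500
# Simulated data (assumption: stands in for the real dataset)
df <- data.frame(hscrp_mgl = abs(rnorm(n)) + 0.1,
                 age       = sample(30:90, n, replace = TRUE),
                 gender    = sample(0:1, n, replace = TRUE),
                 smoke     = sample(1:4, n, replace = TRUE),
                 stroke    = sample(0:1, n, replace = TRUE))

covars   <- c("age", "gender", "smoke")
exposure <- "log2(hscrp_mgl)"

# Cumulative right-hand sides: exposure alone, then +age, then +age+gender, ...
rhs <- Reduce(function(acc, v) paste(acc, v, sep = " + "),
              covars, accumulate = TRUE, init = exposure)

# Hypothetical helper: fit one model, return the exposure's OR and 95% CI
fit_one <- function(r, label) {
  fit <- glm(as.formula(paste("stroke ~", r)), family = binomial, data = df)
  ci  <- suppressMessages(confint(fit))
  data.frame(Model   = label,
             OR      = unname(exp(coef(fit)[exposure])),
             CI_2.5  = unname(exp(ci[exposure, 1])),
             CI_97.5 = unname(exp(ci[exposure, 2])))
}

labels <- c("Biomarker", paste0("+", covars))
model  <- do.call(rbind, Map(fit_one, rhs, labels))
rownames(model) <- NULL
model
```

Each row then describes the same coefficient under a progressively larger adjustment set, which is what makes the rows comparable.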

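Thanks @Parfait. However, perhaps it was not clear from the post: I want **cumulatively adjusted** ORs and CIs. So the first iteration might be stroke ~ hscrp (the crude model), but the next iteration should give the ORs for stroke ~ hscrp + age, then the ORs for stroke ~ hscrp + age + gender, and so on. Hence my need for a traditional loop rather than a function, since the order of sequential and cumulative iteration matters here. – Mak

You can actually still use 'lapply', with 'paste(..., collapse)' over a dynamic range of column names in the formula. See the edit, where the formulas are printed out. – Parfait

Thanks so much @Parfait! This looks right on the money. I will check it when I am back in my department on Monday and let you know how it goes! PS: it is very elegant! – Mak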

0

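I did not have a data frame with hscrp_mgl to reproduce the results and make sure they are what you want, but you can try the following:

Get the names of all the features you want to use in the iteration (everything except the outcome and the exposure):

x <- setdiff(names(df), c("stroke", "hscrp_mgl")) 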

Using purrr::map

Create a data frame with the feature names in the first column, then use purrr::map inside mutate to compute the required values.

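library(dplyr)   # provides mutate(), %>% and data_frame() 
library(purrr) 

# data_frame() is deprecated in current tibble releases; tibble() is the drop-in replacement 
model <- data_frame(Model = x) %>% 
    mutate(OR = map(Model, ~coef(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
            family=binomial, data=df))), 
     CI = map(Model, ~confint(glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
            family=binomial, data=df)))) 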

You will get something like this:

# A tibble: 6 × 3 
    Model  OR   CI 
    <chr> <list>  <list> 
1 age <dbl [3]> <dbl [3 × 2]> 
2 gender <dbl [3]> <dbl [3 × 2]> 
3 smoke <dbl [3]> <dbl [3 × 2]> 
4 BMI <dbl [3]> <dbl [3 × 2]> 
5 SBP <dbl [3]> <dbl [3 × 2]> 
6 diab <dbl [3]> <dbl [3 × 2]> 

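Using purrr::map and broom

You can also use broom functions to extract the required data from the models, as follows:

  • Add the fitted models as a column
  • Use tidy to get the coefficients, then mutate to add the OR
  • Get the confidence intervals with confint_tidy and add the CI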

model2 <- data_frame(Model = x) %>% 
    mutate(model_details = map(Model, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
            family=binomial, data=df))) %>% 
    mutate(OR = map(model_details, broom::tidy), 
     CI = map(model_details, broom::confint_tidy)) 
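
As far as I know, `confint_tidy()` has been deprecated in more recent broom releases, and `tidy()` can return the intervals itself. The following is only a sketch on simulated data (the data frame and the object names `fit`, `res`, `row` are assumptions), using the `conf.int` and `exponentiate` arguments of `tidy()` for glm fits to get the OR and its CI in one call.

```r
library(broom)

set.seed(1)
n <- 300
# Simulated data (assumption: stands in for the real dataset)
df <- data.frame(hscrp_mgl = abs(rnorm(n)) + 0.1,
                 age       = sample(30:90, n, replace = TRUE),
                 stroke    = sample(0:1, n, replace = TRUE))

fit <- glm(stroke ~ log2(hscrp_mgl) + age, family = binomial, data = df)

# exponentiate = TRUE puts estimates on the OR scale;
# conf.int = TRUE adds conf.low / conf.high columns
res <- suppressMessages(tidy(fit, conf.int = TRUE, exponentiate = TRUE))

# Keep only the exposure row
row <- res[res$term == "log2(hscrp_mgl)",
           c("term", "estimate", "conf.low", "conf.high")]
row
```

The same call can be dropped into the `map()` step above in place of the separate `tidy` and `confint_tidy` columns.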

Cumulative adjustment

For cumulative adjustment, you can try the following:

# Covariate names in the order they should be added (outcome and exposure excluded) 
cnames <- setdiff(names(df), c("stroke", "hscrp_mgl")) 

model <- data_frame(Model = cnames) %>% 
    mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+"))) %>% 
    mutate(model_details = map(Model_adjust, ~glm(as.formula(paste("stroke ~ log2(hscrp_mgl)", .x, sep = "+")), 
             family=binomial, data=df))) %>% 
    mutate(OR = map(model_details, broom::tidy), 
     CI = map(model_details, broom::confint_tidy)) 

The extra step adds a column listing the variables included so far; the subsequent steps then use Model_adjust to fit the models:

model <- data_frame(Model = cnames) %>% 
    mutate(Model_adjust = map2_chr(Model, seq_along(Model), ~paste(cnames[1:.y], collapse = "+"))) 

    # A tibble: 6 × 2 
     Model     Model_adjust 
     <chr>       <chr> 
    1 age       age 
    2 gender     age+gender 
    3 smoke    age+gender+smoke 
    4 BMI   age+gender+smoke+BMI 
    5 SBP  age+gender+smoke+BMI+SBP 
    6 diab age+gender+smoke+BMI+SBP+diab 
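
For readers who prefer to avoid purrr, the same cumulative strings can be built in base R. This is a minimal sketch, assuming the covariate names from the example data frame; `model_adjust` and `formulas` are illustrative names, not part of the answer above.

```r
cnames <- c("age", "gender", "smoke", "BMI", "SBP", "diab")

# The k-th entry joins the first k names, mirroring the Model_adjust column
model_adjust <- vapply(seq_along(cnames),
                       function(k) paste(cnames[seq_len(k)], collapse = "+"),
                       character(1))

# Turn each string into a formula that always includes the exposure
formulas <- lapply(model_adjust, function(rhs) {
  as.formula(paste("stroke ~ log2(hscrp_mgl)", rhs, sep = "+"))
})

model_adjust
formulas[[2]]
```

Each element of `formulas` can then be passed to `glm(..., family = binomial, data = df)` in order.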

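Thanks for your reply @OmaymaS. Does this give me the ORs for the individual associations, such as stroke ~ hscrp + age and stroke ~ hscrp + gender? Or are the variables adjusted cumulatively, such as stroke ~ hscrp + age, then stroke ~ hscrp + age + gender, and so on? I want the latter: the ORs and CIs in table format for sequential and cumulative adjustments of the model. – Mak

@Mak Check the added section on cumulative adjustment. – OmaymaS

Thank you @OmaymaS. Although I find the broom package very useful, I think Parfait's approach suits my purpose better. – Mak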