2017-08-11
0

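Functional programming principles for performing multiple statistical calculations

I want to apply several statistical calculations, including reliability measures (e.g. ICC or the coefficient of variation). Although I can compute each of them individually, I am not yet familiar with the R functional programming practices that would let me run multiple calculations directly without much code duplication.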

Consider, for example, the following data.frame, which contains repeated measurements (T1, T2) of five different variables (Var1, ... Var5):

set.seed(123) 
df = data.frame(matrix(rnorm(100), nrow=10)) 
names(df) <- c("T1.Var1", "T1.Var2", "T1.Var3", "T1.Var4", "T1.Var5", 
       "T2.Var1", "T2.Var2", "T2.Var3", "T2.Var4", "T2.Var5") 

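If I want to compute the intraclass correlation coefficient between the two repeated measurements of each variable, I can: 1) create a function that returns the ICC value together with its lower and upper bounds: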

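library(psych)  # provides ICC()

calcula_ICC <- function(a, b) {
    # ICC() expects one column per measurement occasion
    fit <- ICC(matrix(c(a, b), ncol = 2))
    # Row 3 of $results is ICC3; columns 2, 7 and 8 hold the
    # estimate and its lower and upper confidence bounds
    icc <- fit$results[[2]][3]
    lo <- fit$results[[7]][3]
    up <- fit$results[[8]][3]
    round(c(icc, lo, up), 2)
}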

and 2) apply it to each corresponding pair of variables as follows:

calcula_ICC(df$T1.Var1, df$T2.Var1) 
calcula_ICC(df$T1.Var2, df$T2.Var2) 
calcula_ICC(df$T1.Var3, df$T2.Var3) 
calcula_ICC(df$T1.Var4, df$T2.Var4) 
calcula_ICC(df$T1.Var5, df$T2.Var5) 

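Then I would carry out other, similar statistical calculations for each variable, such as the coefficient of variation or the standard error between the repeated measurements.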

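But how can I apply some functional programming principles here? For example, how can I create a function that takes each pair of corresponding variables at T1 and T2, together with the desired function, as arguments?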

+0

Take a look at broom (https://cran.r-project.org/web/packages/broom/vignettes/broom.html)

+1

If you convert your data to a tidy format, this problem becomes much easier to solve: https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns

Answers

1

The functional programming approach is to use mapply. No "tidying" required:

result = mapply(calcula_ICC, df[, 1:5], df[, 6:10], USE.NAMES=FALSE) 

colnames(result) = paste0('Var', 1:5) 

# Better than setting rownames here is to have calcula_ICC() return a named vector 
rownames(result) = c('icc','lo','up') 

result
#       Var1  Var2  Var3  Var4  Var5
# icc   0.09  0.08 -0.37 -0.23 -0.17
# lo   -0.54 -0.55 -0.80 -0.73 -0.70
# up    0.66  0.65  0.29  0.43  0.48

(Note that the result is a matrix.)
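The question also asks for a function that accepts the statistic itself as an argument. One possible way to generalize the mapply idea is sketched below. The helper name `apply_paired`, its prefix arguments, and the example statistic `diff_stats` are hypothetical and not part of the original answer. The statistic uses only base R, so the sketch runs without psych. Any two-argument function such as calcula_ICC could presumably be passed in the same way.

```r
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

# Hypothetical helper: calls stat_fun(t1, t2) for every matching T1/T2 column pair
apply_paired <- function(data, stat_fun, t1_prefix = "T1.", t2_prefix = "T2.") {
    t1_cols <- names(data)[startsWith(names(data), t1_prefix)]
    vars <- sub(t1_prefix, "", t1_cols, fixed = TRUE)
    t2_cols <- paste0(t2_prefix, vars)
    stopifnot(all(t2_cols %in% names(data)))
    # Map() is mapply() without simplification: one result per variable
    res <- Map(stat_fun, data[t1_cols], data[t2_cols])
    names(res) <- vars
    # One row per variable, one column per value returned by stat_fun
    do.call(rbind, res)
}

# Example statistic (an assumption for illustration): returns a named vector
diff_stats <- function(t1, t2) {
    c(mean_diff = mean(t2 - t1), sd_diff = sd(t2 - t1))
}

res <- apply_paired(df, diff_stats)
```

Because the statistic returns a named vector, the row and column names of the result come for free, which is the point made in the code comment of the answer above.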

0

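There are many ways to do this and I don't have time to post them all, but I may come back and add an lapply solution, since the apply functions are very important in R.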
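For reference, an lapply-based version might look roughly like the sketch below. It is not the solution this answer's author had in mind, only one plausible shape for it. The list `stats` and its two base-R statistics are assumptions chosen so that the example runs without extra packages.

```r
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

vars <- paste0("Var", 1:5)

# Named list of two-argument statistics (illustrative choices)
stats <- list(
    cor = function(t1, t2) cor(t1, t2),
    mean_diff = function(t1, t2) mean(t2) - mean(t1)
)

# Outer lapply: one pass per statistic; inner vapply: one value per variable
results <- lapply(stats, function(f) {
    vapply(vars, function(v) {
        f(df[[paste0("T1.", v)]], df[[paste0("T2.", v)]])
    }, numeric(1))
})

# One row per variable, one column per statistic
summary_table <- as.data.frame(results)
```

Adding another calculation then only requires adding another entry to `stats`, provided it returns a single number per variable.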

Using dplyr and tidyr

Here is a dplyr and tidyr solution that may help:

library(dplyr)
library(tidyr)
library(psych)  # provides ICC()

# let's have a function for each value you want eventually
# (row 3 of $results is ICC3; columns 2, 7 and 8 are the
# estimate, lower bound and upper bound)
GetICC <- function(x, y) {
    ICC(matrix(c(x, y), ncol = 2))$results[[2]][3]
}

GetICCLo <- function(x, y) {
    ICC(matrix(c(x, y), ncol = 2))$results[[7]][3]
}

GetICCUp <- function(x, y) {
    ICC(matrix(c(x, y), ncol = 2))$results[[8]][3]
}

# tidy up your data, take a look at what this looks like 
mydata <- df %>% 
    mutate(id = row_number()) %>% 
    gather(key = time, value = value, -id) %>% 
    separate(time, c("Time", "Var")) %>% 
    spread(key = Time, value = value) 

# group by variable, then run your functions 
# notice I added mean difference between the two 
# times as an example of how you can extend this 
# to include whatever summaries you need 
myresults <- mydata %>% 
    group_by(Var) %>% 
    summarize(icc = GetICC(T1, T2), 
      icc_lo = GetICCLo(T1, T2), 
      icc_up = GetICCUp(T1, T2), 
      mean_diff = mean(T2) - mean(T1)) 

This works well as long as everything you pass to summarize is aggregated/computed at the same level.
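In recent tidyr versions (1.0.0 and later), gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshaping could be sketched as follows, assuming such a tidyr version is installed. The summaries use cor() and a mean difference instead of the ICC helpers so that the example does not depend on psych. That substitution is for illustration only.

```r
library(dplyr)
library(tidyr)

set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

# Long format first (one row per id/Time/Var), then one column per time point
mydata <- df %>%
    mutate(id = row_number()) %>%
    pivot_longer(cols = -id,
                 names_to = c("Time", "Var"),
                 names_sep = "\\.",
                 values_to = "value") %>%
    pivot_wider(names_from = Time, values_from = value)

# Every summary is computed once per variable
myresults <- mydata %>%
    group_by(Var) %>%
    summarize(r = cor(T1, T2),
              mean_diff = mean(T2) - mean(T1))
```

The resulting `mydata` has the columns id, Var, T1 and T2, so any function of the form f(T1, T2) can be added as a further argument to summarize.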