2017-10-06 64 views

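How can I run t-tests on all possible pairs of groups, for several features, with the fewest lines of code? (ddply: all pairwise t-tests across multiple factors)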

My example:
3 features: 1, 2, 3
4 groups: A, B, C, D

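Goal: for each feature, test every pair of groups:

Tests for feature 1 (AB, AC, AD, BC, BD, CD)
Tests for feature 2 (AB, AC, AD, BC, BD, CD)
Tests for feature 3 (AB, AC, AD, BC, BD, CD)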
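As a quick check on the size of the task, the pairs can be enumerated with `combn`. This is only a small sketch; the variable names `groups`, `features`, `pair_matrix`, `pair_labels`, and `n_tests` are illustrative and do not come from the original post.

```r
# Illustrative names; only the labels A-D and features 1-3 come from the question.
groups <- c("A", "B", "C", "D")
features <- 1:3

# combn() returns one column per unordered pair: choose(4, 2) = 6 columns.
pair_matrix <- combn(groups, 2)
pair_labels <- apply(pair_matrix, 2, paste, collapse = "")
print(pair_labels)
# [1] "AB" "AC" "AD" "BC" "BD" "CD"

# Total number of t-tests: 6 pairs for each of the 3 features.
n_tests <- length(pair_labels) * length(features)
print(n_tests)
# [1] 18
```

So any solution should return 18 p-values, six per feature.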

At the moment I use ddply with an lapply inside:

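library(plyr)

# 4 groups of 10 observations each; data.frame() recycles them across the 3 features
groupVector <- c(rep("A",10),rep("B",10),rep("C",10),rep("D",10))
featureVector <- rep(1:3,each=40)

mydata <- data.frame(feature=featureVector,group=groupVector,value=rnorm(120,0,1))

ddply(mydata,.(feature),function(x){
    # all unordered pairs of groups within this feature
    grid <- combn(as.character(unique(x$group)),2, simplify = FALSE)
    df <- lapply(grid,function(p){
        # keep only the two groups being compared, then run a Welch t-test
        sub <- subset(x,group %in% p)
        pval <- t.test(value ~ group, data = sub)$p.value
        data.frame(groupA=p[1],groupB=p[2],pval=pval)
    })
    # stack the per-pair rows into one data frame for this feature
    res <- do.call("rbind",df)
    return(res)
})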

Answer


Here is my take, although whether it counts as 'good' is debatable:

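library(tidyverse)

# one data frame per feature
split.data <- split(mydata, mydata$feature)
# one column per unordered pair of groups
pairs <- as.data.frame(matrix(combn(as.character(unique(mydata$group)), 2), nrow=2), stringsAsFactors = FALSE)
# outer map_df loops over features, inner map_df loops over group pairs
map_df(split.data, function(x) map_df(pairs, function(y) tibble(groupA = y[1], groupB = y[2],
    pval = t.test(value ~ group, data = x, subset = which(x$group %in% y))$p.value)), .id="feature")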

Output

# A tibble: 18 x 4
#    feature groupA groupB pval
#    <chr> <chr> <chr> <dbl>
# 1  1  A  B 0.28452419 
# 2  1  A  C 0.65114472 
# 3  1  A  D 0.77746420 
# 4  1  B  C 0.42546791 
# 5  1  B  D 0.39876582 
# 6  1  C  D 0.88079645 
# 7  2  A  B 0.57843592 
# 8  2  A  C 0.30726571 
# 9  2  A  D 0.55457986 
# 10  2  B  C 0.74871464 
# 11  2  B  D 0.24017130 
# 12  2  C  D 0.04252878 
# 13  3  A  B 0.01355117 
# 14  3  A  C 0.08746756 
# 15  3  A  D 0.24527519 
# 16  3  B  C 0.15130684 
# 17  3  B  D 0.09172577 
# 18  3  C  D 0.64206517
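
Another possible route, not part of the answer above, is base R's `pairwise.t.test`, which runs every pairwise comparison in one call. The sketch below rebuilds the example data with a seed added for reproducibility (an assumption; the original post sets none), and the name `pvals_by_feature` is illustrative. Setting `pool.sd = FALSE` and `p.adjust.method = "none"` is meant to reproduce the unadjusted Welch p-values that separate `t.test` calls give; by default `pairwise.t.test` pools the standard deviation and applies the Holm adjustment.

```r
# Rebuild the example data from the question (seed added here for reproducibility).
set.seed(1)
mydata <- data.frame(
  feature = rep(1:3, each = 40),
  group   = rep(rep(c("A", "B", "C", "D"), each = 10), times = 3),
  value   = rnorm(120, 0, 1)
)

# For each feature, run all pairwise Welch t-tests without p-value adjustment,
# which should match calling t.test() separately on each pair.
pvals_by_feature <- lapply(split(mydata, mydata$feature), function(x) {
  pairwise.t.test(x$value, x$group,
                  p.adjust.method = "none", pool.sd = FALSE)$p.value
})

# Each element is a 3 x 3 matrix of p-values, filled on and below the diagonal
# (rows B, C, D; columns A, B, C) and NA elsewhere.
print(pvals_by_feature[["1"]])
```

The result is a list of matrices rather than one long table, so it suits quick inspection more than further processing; the ddply and map_df versions are more convenient when a single data frame is needed.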