2017-09-14 325 views
-1

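How do I run multiple wilcox.test calls in R? I have this matrix and want to run a Wilcoxon test in R (control vs. case) on it, but I don't know how to pass the matrix in correctly.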

gene.name cont1 cont2 cont3 case1 case2 case3 
A   10 2  3  21  18  8 
B   14 8  7  12  34  22 
C   16 9  19  21  2  8 
D   32 81  17  29  43  25 
.. 
+0

What have you tried? [The gCMAP package for R and its documentation](https://bioconductor.org/packages/release/bioc/html/gcmAP.html) may help. – Argalatyr

+0

I did this before, but with a t-test. Now I need a Wilcoxon test because my data are non-parametric. – Behmah

Answers

3

You can try:

# load your data 
d <- read.table(text="gene.name cont1 cont2 cont3 case1 case2 case3 
A   10 2  3  21  18  8 
B   14 8  7  12  34  22 
C   16 9  19  21  2  8 
B   32 81  17  29  43  25", header=TRUE) 

library(tidyverse) 
# transform to long format using tidyr and dplyr (both included in tidyverse) 
# as_tibble() replaces the deprecated as.tbl() 
dlong <- as_tibble(d) %>% 
    gather(key, value,-gene.name) %>% 
    mutate(group=ifelse(grepl("cont",key), "control", "case")) 
# plot the data 
dlong %>% 
    ggplot(aes(x=group, y=value)) + 
    geom_boxplot() 

# run the test 
dlong %>% 
    with(., wilcox.test(value ~ group)) 

Wilcoxon rank sum test with continuity correction 

data: value by group 
W = 94.5, p-value = 0.2034 
alternative hypothesis: true location shift is not equal to 0 

Edit

# since you didn't clarify how to handle the duplicate B, I assume 
# it's a typo and changed the second B to D 
library(ggpubr) 
dlong <- as_tibble(d) %>% 
    mutate(gene.name=LETTERS[1:4]) %>% 
    gather(key, value,-gene.name) %>% 
    mutate(group=ifelse(grepl("cont",key), "control", "case")) 

# plot the boxplot with Wilcoxon p-values using ggpubr 
dlong %>% 
    ggplot(aes(x=gene.name, y=value, fill=group)) + 
    geom_boxplot() + 
    stat_compare_means(method= "wilcox.test") 

# get the p-value for each gene 
dlong %>% 
    group_by(gene.name) %>% 
    summarise(p=wilcox.test(value~group)$p.value) 
# A tibble: 4 x 2 
    gene.name  p 
     <chr> <dbl> 
1   A 0.2 
2   B 0.2 
3   C 0.7 
4   D 1.0 

Or try a base R approach using apply.

res <- apply(d[,-1], 1, function(x){ 
    wilcox.test(x ~ c(1,1,1,2,2,2))$p.value 
}) 
cbind.data.frame(Genes=as.character(d$gene.name), p=res, BH=p.adjust(res, method = "BH")) 
  Genes   p        BH 
1     A 0.2 0.4000000 
2     B 0.2 0.4000000 
3     C 0.7 0.9333333 
4     B 1.0 1.0000000 
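
To go from per-gene p-values to a list of genes below a cutoff, the row-wise approach above could be extended roughly as follows. This is only a sketch: it assumes the second B is really D (as in the edit above), that control and case columns can be recognised by their `cont`/`case` name prefixes, and that the cutoff `alpha` and the names `res`, `sig_genes` and `min_p` are illustrative rather than anything from the original post.

```r
# Example data from the question (second B treated as D; this is an assumption)
d <- read.table(text = "gene.name cont1 cont2 cont3 case1 case2 case3
A 10 2 3 21 18 8
B 14 8 7 12 34 22
C 16 9 19 21 2 8
D 32 81 17 29 43 25", header = TRUE, stringsAsFactors = FALSE)

# Split the columns by name prefix (assumes the cont*/case* naming scheme)
cont <- as.matrix(d[, grep("^cont", names(d))])
case <- as.matrix(d[, grep("^case", names(d))])

# One two-sided Wilcoxon rank-sum (Mann-Whitney U) test per row
p <- vapply(seq_len(nrow(d)), function(i) {
  wilcox.test(cont[i, ], case[i, ])$p.value
}, numeric(1))

# Per-gene table with raw and Benjamini-Hochberg adjusted p-values
res <- data.frame(gene = d$gene.name, p = p, BH = p.adjust(p, method = "BH"))
print(res)

# Genes passing the (hypothetical) cutoff
alpha <- 0.05
sig_genes <- res$gene[res$p < alpha]
print(sig_genes)

# Smallest two-sided exact p-value reachable with this design when there are no ties
min_p <- 2 / choose(ncol(cont) + ncol(case), ncol(cont))
print(min_p)
```

With only three controls and three cases per gene, the smallest exact two-sided p-value the test can return is 2/20 = 0.1, so `sig_genes` stays empty at a 0.05 cutoff no matter how well the groups separate. More replicates per group would be needed before a per-gene filter at that level can select anything.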
+0

Thanks, Jimbou. But I need a list of genes with a p-value for each gene (row by row), or a list of the genes that pass p-value < 0.05. – Behmah

+0

Can someone please help? All I want is a simple U test, but I don't know how to pass in my matrix. – Behmah

+0

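@Behmah First you have to explain why B occurs twice and how it should be handled: as independent measurements, as repeated measurements, or as its own group? – Jimbou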