2017-03-01

I am using ggplot2 to create grouped boxplots with a scatterplot overlay. I would like each scatterplot point to be grouped with its corresponding boxplot.

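However, I would also like the scatterplot points to use different symbols. I seem to be able to either group the scatterplot points with their boxplots or give the points different symbols, but not both at the same time. Below is some example code to illustrate what is happening: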

library(scales) 
library(ggplot2) 

# Generates Data frame to plot 
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24)) 
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20)) 
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5)) 
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900), 
      rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000)) 
value <- sqrt(value*value)  # absolute value (keeps values non-negative for the log axis)
Tdata <- cbind(Gene, Clone, variable) 
Tdata <- data.frame(Tdata) 
Tdata <- cbind(Tdata,value) 

# Creates the Plot of All Data 
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape 
# and I'd like them to each have different shapes.       
ln_clr <- "black" 
bk_clr <- "white" 
point_shapes <- c(0,15,1,16,2,17) 
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4") 

lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
       size = 0.7, coef = 4) + 
    geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
        alpha = 1, colour = ln_clr) + 
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
       pch=15) 


lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000), 
        labels = trans_format("log10", math_format(10^.x))) 

ggsave("Scatter Grouped-Wrong Symbols.png") 

#************************************************************************************************************************************* 
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes 
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25, 
       size = 0.7, coef = 4) + 
    geom_boxplot(coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3, 
        alpha = 1, colour = ln_clr) + 
    geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7, 
       aes(shape=Clone)) 


lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000), 
        labels = trans_format("log10", math_format(10^.x))) 

ggsave("Scatter Ungrouped-Right Symbols.png") 

If anyone has any suggestions, I would be very grateful.

Thanks, Nathan

Answer

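To get the boxplots to appear, the shape aesthetic needs to be inside geom_point rather than in the main call to ggplot. When the shape aesthetic is in the main ggplot call, it applies to every geom, including geom_boxplot. Applying shape=Clone there makes geom_boxplot create a separate boxplot for each level of Clone. Because each combination of variable and Clone (within a given Gene) has only one row of data, no boxplots are drawn.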

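It seems counterintuitive to me that the shape aesthetic affects geom_boxplot, but perhaps there is a reason I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem, because shape is then applied only to the points.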
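
The one-row-per-group point can be checked directly. The following is a minimal sketch in base R that rebuilds only the labels of the question's data (the names rows_per_box and rows_per_split are made up for this illustration) and counts how many rows fall into each group with and without Clone taking part in the grouping:

```r
# Rebuild the question's design; the values are not needed to count rows
Tdata <- data.frame(
  Gene     = rep(paste0("Gene", LETTERS[1:5]), each = 24),
  Clone    = rep(paste0("D", 1:6), 20),
  variable = rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)
)

# Rows per box when only x (variable) and fill (Gene) define the groups
rows_per_box <- table(interaction(Tdata$variable, Tdata$Gene))
print(range(rows_per_box))

# Rows per group once Clone also splits the data
rows_per_split <- table(interaction(Tdata$variable, Tdata$Gene, Tdata$Clone))
print(range(rows_per_split))
```

Each box is built from six rows when grouped by variable and Gene, but only a single row is left per group once Clone is added, which is too little to summarise as a box.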

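Then, to get the points to line up with the correct boxplot, we need to group by Gene. I also added theme_classic to make the plot easier to read (although it is still very busy):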

ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) + 
    stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) + 
    geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7, 
      aes(shape=Clone, group=Gene)) + 
    scale_fill_manual(values=blue_cols) + labs(y="Fold Change") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) + 
    theme_classic() 
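
To see why the points need the same grouping and the same dodge width as the boxes, it helps to work out where each dodged group lands. This is only a sketch of the arithmetic, assuming groups are spread evenly across the dodge width; dodge_offsets is a hypothetical helper written for this illustration, not a ggplot2 function:

```r
# Hypothetical helper: horizontal offset of each dodged group from the
# centre of its x position, assuming an even spread across dodge_width
dodge_offsets <- function(n_groups, dodge_width) {
  idx <- seq_len(n_groups)
  dodge_width * ((idx - 0.5) / n_groups - 0.5)
}

genes <- paste0("Gene", LETTERS[1:5])

# Five Gene groups dodged across a total width of 0.85, as in the plot above
box_offsets <- setNames(dodge_offsets(length(genes), 0.85), genes)
print(round(box_offsets, 3))

# A different dodge width (0.5, chosen only for contrast) moves the outer
# groups away from the centres of their boxes
mismatched <- setNames(dodge_offsets(length(genes), 0.5), genes)
print(round(mismatched - box_offsets, 3))
```

With five groups and a width of 0.85 the centres sit 0.17 apart. Points dodged by Gene with the same width share those centres, while any other grouping or width puts them somewhere else.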

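I think the plot would be easier to understand if you facet by Gene and put variable on the x-axis. Putting time on the x-axis seems more intuitive, and faceting frees up the colour aesthetic for the points. With six different clones it is still difficult (at least for me) to tell the point markers apart, but this looks cleaner to me than the previous version.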

library(dplyr) 

ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)), 
     aes(x=gsub("Day","",variable), y=value)) + 
    stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) + 
    geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) + 
    theme_classic() + 
    facet_grid(. ~ Gene) + 
    labs(y = "Fold Change", x="Day") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) 

If you really need to keep the points, it might be better to separate the boxplots and the points with some manual dodging:

set.seed(10) 
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)), 
         Gene = gsub("Gene","Gene ", Gene)), 
     aes(x=Day - 2, y=value, group=Day)) + 
    stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) + 
    geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) + 
    geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21, 
      position=position_jitter(width=1, height=0)) + 
    theme_classic() + 
    facet_grid(. ~ Gene) + 
    labs(y="Fold Change", x="Day") + 
    expand_limits(y=c(0.01,10^5)) + 
    scale_y_log10(expand=c(0, 0), breaks=10^(-2:5), 
       labels=trans_format("log10", math_format(10^.x))) 
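
The manual dodge above works because Day is numeric, so the boxes and points can be shifted by fixed amounts. As a rough check of the layout, assuming the jitter spreads points by up to its width on either side of their centre, the horizontal extents can be tabulated like this (all variable names here are illustrative):

```r
days <- seq(10, 40, 10)

# Boxes are centred at Day - 2 and drawn with width = 4
box_left  <- (days - 2) - 4 / 2
box_right <- (days - 2) + 4 / 2

# Points are centred at Day + 2 and jittered by up to 1 unit either side
pts_left  <- (days + 2) - 1
pts_right <- (days + 2) + 1

layout <- data.frame(Day = days, box_left, box_right, pts_left, pts_right)
print(layout)

# Gap between a box and its own points, and between points and the next box
gap_within  <- pts_left - box_right
gap_between <- box_left[-1] - pts_right[-length(days)]
print(gap_within)
print(gap_between)
```

Both gaps are positive, so each cloud of points sits just to the right of its own box and stays clear of the next day's box.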

One more thing: for future reference, you can simplify your data creation code:

Gene = rep(paste0("Gene",LETTERS[1:5]), each=24) 
Clone = rep(paste0("D",1:6), 20) 
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5) 
value = abs(rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24), 
       sd=rep(c(0.5,8,900,9000,3000), each=24)))  # abs() mirrors sqrt(value*value) in the question

Tdata = data.frame(Gene, Clone, variable, value) 
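
As a sanity check, the compact constructions can be compared with the long-hand ones from the question. This is a sketch in base R; the _long and _short names are made up for the comparison, and the seed value is arbitrary:

```r
# Long-hand label vectors, as in the question
gene_long  <- c(rep("GeneA", 24), rep("GeneB", 24), rep("GeneC", 24),
                rep("GeneD", 24), rep("GeneE", 24))
clone_long <- rep(c("D1", "D2", "D3", "D4", "D5", "D6"), 20)
var_long   <- rep(c(rep("Day10", 6), rep("Day20", 6),
                    rep("Day30", 6), rep("Day40", 6)), 5)

# Compact versions, as in the answer
gene_short  <- rep(paste0("Gene", LETTERS[1:5]), each = 24)
clone_short <- rep(paste0("D", 1:6), 20)
var_short   <- rep(rep(paste0("Day", seq(10, 40, 10)), each = 6), 5)

labels_match <- identical(gene_long, gene_short) &&
  identical(clone_long, clone_short) &&
  identical(var_long, var_short)

means <- c(0.5, 10, 1000, 25000, 8000)
sds   <- c(0.5, 8, 900, 9000, 3000)

# Five separate rnorm() calls versus one vectorised call, from the same seed
set.seed(1)
value_long <- unlist(Map(function(m, s) rnorm(24, mean = m, sd = s), means, sds))
set.seed(1)
value_short <- rnorm(24 * 5, mean = rep(means, each = 24), sd = rep(sds, each = 24))

values_match <- isTRUE(all.equal(value_long, value_short))
print(c(labels = labels_match, values = values_match))
```

The label vectors come out identical, and because rnorm recycles its mean and sd arguments element by element, the single vectorised call draws the same values as the five separate calls when started from the same seed.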

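This is perhaps the best, most thorough, and most clearly expressed answer I have ever seen. Thank you so much for your help. Everything you wrote was very helpful. If I could somehow give you more credit than just an upvote, I would. Thanks. – Nathan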

Thanks, Nathan! I appreciate the kind words. – eipi10