2017-06-26 59 views
1

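R ggplot2: stacked barplot by percentage for several categorical variables

This is a simple question, but I'm having trouble understanding the format that ggplot2 requires.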

I have the following data.table in R:

print(dt) 
    ID  category  A B C  totalABC                                                           
1: 10  group1  1 3 0  4                                                           
2: 11  group1  1 11 1  13                                                           
3: 12  group2  15 20 2  37                                                           
4: 13  group2  6 12 2  20                                                           
5: 14  group2  17 83 6  106 
... 

My goal is to create a proportional stacked bar chart, as in this example: https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs

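Here the percentage is X/totalABC, where X is any one of the category_type values A, B, or C. I would also like the percentages to be computed per category, i.e. the x-axis values should be group1, group2, and so on.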

As a concrete example, group1 has 4 + 13 = 17 elements in total.

The percentages would be percent_A = 2/17 ≈ 11.8%, percent_B = 14/17 ≈ 82.4%, percent_C = 1/17 ≈ 5.9%.
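The group1 arithmetic above can be checked with a few lines of base R. This is only a sketch: the `group1` data frame is a hand-copied subset of the example table, not an object from the question.

```r
# Rows of the example table that belong to group1 (values copied from the question)
group1 <- data.frame(A = c(1, 1), B = c(3, 11), C = c(0, 1))

# Sum each column, then divide by the group total (4 + 13 = 17)
counts <- colSums(group1)
percent <- 100 * counts / sum(counts)

round(percent, 1)
# A = 11.8, B = 82.4, C = 5.9
```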

The correct ggplot2 solution seems to be:

library(ggplot2) 
pp = ggplot(dt, aes(x=category, y=percentage, fill=category_type)) +                                                        
      geom_bar(position="dodge", stat="identity") 

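My confusion: how would I create a single percentage column that corresponds to three categorical values?

If the above is incorrect, how should I format my data.table in order to create the stacked barplot?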

+0

Use `position = "fill"` instead of `position = "dodge"`.

Answers

1

Here is a solution:

library(data.table)
library(ggplot2)
library(dplyr)  # provides %>%

melt(dt, measure.vars = c("A", "B", "C"),
     variable.name = "groups", value.name = "nobs") %>%
  ggplot(aes(x = category, y = nobs, fill = groups)) +
  geom_bar(stat = "identity", position = "fill")
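To see what the reshaping step produces, and roughly what `position = "fill"` does with it, the shares can also be computed by hand. The sketch below rebuilds the example table from the values in the question; the names `long`, `shares`, and `share` are illustrative and not part of the answer.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  ID = 10:14,
  category = c("group1", "group1", "group2", "group2", "group2"),
  A = c(1, 1, 15, 6, 17),
  B = c(3, 11, 20, 12, 83),
  C = c(0, 1, 2, 2, 6)
)

# Wide to long: one row per (ID, variable) pair
long <- melt(dt, id.vars = c("ID", "category"),
             variable.name = "groups", value.name = "nobs")

# Sum the counts per category and variable, then convert to shares per category
shares <- long[, .(nobs = sum(nobs)), by = .(category, groups)]
shares[, share := nobs / sum(nobs), by = category]
print(shares)
```

The `share` column is the single percentage column the question asks about: one value per combination of category and variable, with each category summing to 1.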
1

You can use the following code:

library(data.table)
library(dplyr)
library(ggplot2)

dt[, -1] %>%                                        # drop the ID column
  group_by(category) %>%                            # group by category
  summarise_all(sum) %>%                            # sum A, B, C and totalABC within each category
  as.data.table() %>%                               # back to a data.table so melt() dispatches correctly
  melt(id.vars = c("category", "totalABC")) %>%     # one row per (category, variable)
  ggplot(aes(x = category, y = value / totalABC, fill = variable)) +  # define x and y
  geom_bar(stat = "identity", position = "fill") +  # stacked bars, each scaled to 100%
  scale_y_continuous(labels = scales::percent)      # show the y axis as percentages

This will plot a 100% stacked bar chart with one bar per category (image not included).

Data:

dt <- structure(list(ID = 10:14, category = structure(c(1L, 1L, 2L, 
    2L, 2L), .Label = c("group1", "group2"), class = "factor"), A = c(1L, 
    1L, 15L, 6L, 17L), B = c(3L, 11L, 20L, 12L, 83L), C = c(0L, 1L, 
    2L, 2L, 6L), totalABC = c(4L, 13L, 37L, 20L, 106L)), .Names = c("ID", 
    "category", "A", "B", "C", "totalABC"), row.names = c(NA, -5L 
    ), class = "data.frame")
data.table::setDT(dt)  # the .internal.selfref pointer printed by dput() cannot be parsed, so convert explicitly

What if you want to stick with your own plotting code?

In that case, you can use this to get the percentages:

df <- dt[, -1] %>%                                # drop the ID column
  group_by(category) %>%                          # group by category
  summarise_all(sum) %>%                          # sum each variable within each category
  as.data.table() %>%                             # back to a data.table so melt() dispatches correctly
  melt(id.vars = c("category", "totalABC")) %>%   # one row per (category, variable)
  mutate(percentage = value * 100 / totalABC)     # percentage of the category total

But you need to modify the ggplot call to get the stacked bars correctly:

# variable is the column carrying category_type
# position = "dodge" plots the bars next to each other,
# while position = "fill" stacks them and scales each bar to 100%
ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(position = "fill", stat = "identity")
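The difference between the position settings can be compared directly. The following is a minimal sketch, assuming a hand-built long-format `df` that holds the per-category sums from the example data; `p_dodge`, `p_stack`, and `p_fill` are illustrative names, and `geom_col()` is used as shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Per-category sums of A, B, C from the example data, in long format
df <- data.frame(
  category = rep(c("group1", "group2"), each = 3),
  variable = rep(c("A", "B", "C"), times = 2),
  value = c(2, 14, 1, 38, 115, 10)
)
# Percentage of each category total
df$percentage <- 100 * df$value / ave(df$value, df$category, FUN = sum)

# "dodge": bars side by side, heights are the percentages themselves
p_dodge <- ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_col(position = "dodge")

# "stack": bars on top of each other; each category already sums to 100
p_stack <- ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_col(position = "stack")

# "fill": like "stack", but every bar is rescaled to span 0 to 1
p_fill <- ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_col(position = "fill")
```

Printing `p_stack` and `p_fill` should give the same picture apart from the y-axis scale (0 to 100 versus 0 to 1), because the percentages already sum to 100 within each category.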
+1

Thank you for the explanatory comments! – ShanZhengYang

+0

Oh, I had defined the data.table as `dt`, not `df`. Just noting this to keep things consistent for future readers. – ShanZhengYang

+0

In this case, what does `fill = variable` mean? – ShanZhengYang