2017-10-10 173 views

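I want to create an R bar plot that shows several different variables as separate columns, all in one chart. So far I have only managed a 2x2 plot with the following code: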

barplot(table(y = cut$Gender,x = cut$Education)) 

Even then, Gender is stacked on top of Education.

Respondents Gender and Education level

The kind of chart I want looks like this: enter image description here

My sample dataset:

structure(list(Gender = c("Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Male", 
"Female", "Male", "Female", "Male", "Male", "Male", "Male"), 
    Age = c("45-54 yrs", "35-44 yrs", "25-34 yrs", "25-34 yrs", 
    "25-34 yrs", "45-54 yrs", "25-34 yrs", "25-34 yrs", "25-34 yrs", 
    "35-44 yrs", "18-24 yrs", "25-34 yrs", "25-34 yrs", "55-64 yrs", 
    "35-44 yrs", "35-44 yrs", "35-44 yrs", "45-54 yrs", "35-44 yrs", 
    "45-54 yrs"), Employment = c("Civil servant", "Private sector", 
    "Private sector", "Private sector", "Trader", "Civil servant", 
    "Private sector", "Private sector", "Private sector", "Civil servant", 
    "Student", "Student", "Civil servant", "Retired", "Self-employed", 
    "Private sector", "Civil servant", "Civil servant", "Private sector", 
    "Private sector"), Marriage = c("Married", "Married", "Married", 
    "Married", "Single, never married", "Married", "Married", 
    "Married", "Married", "Married", "Single, never married", 
    "Single, never married", "Married", "Married", "Married", 
    "Married", "Married", "Married", "Married", "Married"), Education = c("Advanced degree", 
    "Advanced degree", "Bachelor's degree", "Bachelor's degree", 
    "Secondary education", "Advanced degree", "Bachelor's degree", 
    "Bachelor's degree", "Secondary education", "Secondary education", 
    "Secondary education", "Secondary education", "Advanced degree", 
    "Bachelor's degree", "Basic education", "Advanced degree", 
    "Advanced degree", "Advanced degree", "Advanced degree", 
    "Advanced degree"), Residence = c("Ashanti", "Ashanti", "Ashanti", 
    "Ashanti", "Ashanti", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Central", "Central", "Eastern", "Greater Accra", "Greater Accra", 
    "Greater Accra", "Greater Accra", "Greater Accra"), Experience = c("Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never")), .Names = c("Gender", 
"Age", "Employment", "Marriage", "Education", "Residence", "Experience" 
), row.names = c(NA, 20L), class = "data.frame") 

Answers

1

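Here is one approach:

First convert the data to long format. There are two options: melt from the reshape package or gather from tidyr. Here I will use the tidyverse library, which loads many useful packages.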

library(tidyverse) 

df %>% 
     gather(variable, value) 
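As a rough illustration of what "long format" means here, the sketch below reshapes a small made-up data frame (toy is hypothetical, not the dataset from the question). It uses pivot_longer, the newer tidyr function that supersedes gather; the rows come out in a different order than with gather, but the counts per variable are the same.

```r
library(tidyr)

# Hypothetical toy data standing in for the question's data frame
toy <- data.frame(
  Gender    = c("Male", "Female", "Male"),
  Education = c("Advanced degree", "Bachelor's degree", "Advanced degree"),
  stringsAsFactors = FALSE
)

# One row per (original row, column) pair: 3 rows x 2 columns = 6 rows
long <- pivot_longer(toy, cols = everything(),
                     names_to = "variable", values_to = "value")

# Counts per variable/value pair, which is what the stacked bars will display
counts <- table(long$variable, long$value)
print(long)
print(counts)
```

Each original column becomes one bar on the x axis, and each distinct value becomes one segment of that bar.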

Then draw the bar plot with ggplot2:

ggplot()+ 
    geom_bar(aes(x = variable, fill = value), color = "black" , position = "stack", show.legend = FALSE) 

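To add text annotations, add a geom_text layer. The label positions are determined by stat = "count", which computes a special variable ..count... Since this is a fairly crowded plot, we can adjust the labels with vjust = 1: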

geom_text(stat = "count", aes(x = variable, label = value, 
           y = ..count.., 
           group = value), 
      position = "stack", vjust = 1) 

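To put percentages on the y axis you would normally use y = (..count..)/sum(..count..), but here sum(..count..) is the sum of the counts across all variables, so that does not work. The simplest fix is to label the axis manually: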

scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"), 
        breaks = c(0, 5, 10, 15, 20)) 
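The arithmetic behind this can be checked in plain R. The sketch below assumes the sample data from the question (20 respondents, 7 columns, no missing values); the variable names are made up for illustration.

```r
n_rows <- 20   # respondents in the sample data
n_vars <- 7    # columns, each of which becomes one bar

# In long format every respondent contributes one row per variable,
# so the counts over the whole plot add up to 20 * 7 = 140
total_count <- n_rows * n_vars

# A full bar holds all 20 respondents; dividing by the grand total
# gives about 0.143 instead of 1
full_bar_naive <- n_rows / total_count

# Multiplying by the number of variables rescales a full bar back to 1
full_bar_scaled <- n_rows / total_count * n_vars

# Manual axis labels: each count expressed as a percentage of 20
breaks <- c(0, 5, 10, 15, 20)
labels <- paste0(breaks / n_rows * 100, "%")
print(labels)
```

The manual labels only stay correct while every bar has exactly 20 observations; with a different number of rows the breaks would need to change accordingly.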

All together it looks like this:

library(tidyverse) 

df %>% 
    gather(variable, value) %>% 
    ggplot()+ 
    geom_bar(aes(x = variable, fill = value), 
      color = "black", 
      position = "stack", show.legend = FALSE)+ 
    geom_text(stat = "count", 
      aes(x = variable, 
       label = value, 
       y = ..count.., 
       group = value), 
      position = "stack", vjust = 1) + 
scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"), 
        breaks = c(0, 5, 10, 15, 20)) 

enter image description here

Another option is y = ..count../sum(..count..)*7, because there are 7 variables:

df %>% 
    gather(variable, value) %>% 
    ggplot()+ 
    geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value), color = "black", position = "stack", show.legend = FALSE)+ 
    geom_text(stat = "count", aes(x = variable, label = value, y = ..count../sum(..count..)*7, group = value), position = "stack", vjust = 1)+ 
    scale_y_continuous(labels = scales::percent)+ 
    ylab("") 
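A different technique, not the one used in this answer, is to compute the share within each variable before plotting, so the plot does not depend on the hard-coded factor 7. The sketch below is a minimal version under that assumption; toy and pct are hypothetical names, and toy is made-up data rather than the question's dataset.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data standing in for the question's data frame
toy <- data.frame(
  Gender   = c("Male", "Male", "Female", "Male"),
  Marriage = c("Married", "Single", "Married", "Married"),
  stringsAsFactors = FALSE
)

pct <- toy %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  count(variable, value) %>%       # one row per variable/value pair
  group_by(variable) %>%
  mutate(share = n / sum(n)) %>%   # shares sum to 1 within each variable
  ungroup()

# geom_col plots the precomputed shares and stacks them by default
p <- ggplot(pct, aes(x = variable, y = share, fill = value)) +
  geom_col(color = "black", show.legend = FALSE) +
  geom_text(aes(label = value, group = value),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  ylab("")
```

Printing p draws the chart. Because the shares are computed per variable, this version should also cope with columns that contain different numbers of non-missing values, which the *7 trick does not.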

This produces the same plot.

To break the labels across lines, you can use gsub with a negative lookahead:

df %>% 
    gather(variable, value) %>% 
    mutate(label = gsub(" (?!yrs)", "\n", value, perl = TRUE)) %>% 
    ggplot()+ 
    geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value), color = "black", position = "stack", show.legend = FALSE)+ 
    geom_text(stat = "count", aes(x = variable, label = label, y = ..count../sum(..count..)*7, group = value), position = "stack", vjust = 1)+ 
    scale_y_continuous(labels = scales::percent)+ 
    ylab("") 

enter image description here
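To see what the pattern " (?!yrs)" does on its own, here is a small base R check on a few values taken from the sample data. The names x and out are just for illustration.

```r
# A few values from the sample data
x <- c("25-34 yrs", "Single, never married", "Private sector")

# Replace every space with a newline, except a space directly followed by "yrs".
# Lookaheads need the Perl-compatible regex engine, hence perl = TRUE.
out <- gsub(" (?!yrs)", "\n", x, perl = TRUE)

# cat() prints the real line breaks, with a blank line between the values
cat(out, sep = "\n\n")
```

The age ranges keep their space because it is followed by "yrs", while the other labels are split at every space.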


Thanks, the mutate step even adds a conditional line break. Is there a way to replace the count frequency (0 to 20) with percentages (e.g. 0-100%)? – Masssly


Just add 'scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"), breaks = c(0, 5, 10, 15, 20))'. There are other ways, but I believe this is the simplest for the current example. – missuse
