2016-12-01

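Join values when creating a boxplot

I have a table of 983 obs. of 27 variables. The data can be provided if needed, but I don't believe it is necessary, as the cross-table below should summarise it well enough: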

Kjønn  Antall  <NA>   e   f    g   s   ug
Sex    Count   <NA>   w   d    m   s   um
k         282     2  26   5   41      208
m         701    11  56   4  148   2  480

Abbreviations (with English translations):

e[nkemann], f[raskilt], g[ift], s[eparert], ug[ift] 
w[idow(er)], d[ivorced], m[arried], s[eparated], u[n]m[arried] 

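I'd like to create a variable-width boxplot to show the distribution of these individuals, but as the table shows, the NAs, the divorced and the separated would be such small groups that the plot would be hardly readable (and not meaningful). How can I join these groups, to create a plot showing e, f+s, g and ug?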

My current code:

library(ggplot2)

# The basis for the boxplot
dBox_SexAge <- ggplot(data = tblHoved) +
    geom_boxplot(
        mapping = aes(colour = KJONN, x = KJONN, y = 1875 - FAAR),
        notch = TRUE,
        lwd = .5, fatten = .125,
        varwidth = TRUE
    )

# Create the final boxplot 
dBox_SexAgeMStat <- dBox_SexAge + 
    facet_grid(SIVST ~ .) + 
    coord_flip() 

# Run it 
dBox_SexAgeMStat 

The current plot, in which I'd like to group f and s:

Possible duplicate of [R replace all particular values in a data frame](http://stackoverflow.com/questions/19503266/r-replace-all-particular-values-in-a-data-frame)

Answer

Create a sample data frame:

tblHoved <- data.frame(FAAR = rnorm(10),
                       SIVST = rep(c("e", "f", "g", "s", "ug"), 2),
                       stringsAsFactors = FALSE)
tblHoved
#           FAAR SIVST
# 1   0.22499630     e
# 2   1.10236362     f
# 3   0.02220001     g
# 4   0.19062022     s
# 5   0.05103136    ug
# 6   0.09280887     e
# 7  -0.70574835     f
# 8   0.39331232     g
# 9   0.24817094     s
# 10  0.66631994    ug

Merge f and s:

tblHoved$SIVST[tblHoved$SIVST %in% c("f", "s")] <- "f+s"
tblHoved
#           FAAR SIVST
# 1   0.22499630     e
# 2   1.10236362   f+s
# 3   0.02220001     g
# 4   0.19062022   f+s
# 5   0.05103136    ug
# 6   0.09280887     e
# 7  -0.70574835   f+s
# 8   0.39331232     g
# 9   0.24817094   f+s
# 10  0.66631994    ug
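
If you would rather keep the original SIVST column untouched, one possible variant is to recode through a named lookup vector into a new column. This is only a sketch in base R. The sample values, the `grp` lookup and the `SIVST_grp` column name are made up for illustration and are not from the question's data.

```r
# Hypothetical sample data; only the column names follow the question
tblHoved <- data.frame(
  FAAR  = c(1801, 1812, 1823, 1834, 1845, 1856,
            1803, 1814, 1825, 1836, 1847, 1858),
  KJONN = rep(c("k", "m"), 6),
  SIVST = c("e", "f", "g", "s", "ug", NA,
            "e", "f", "g", "s", "ug", NA),
  stringsAsFactors = FALSE
)

# Lookup vector: names are the old codes, values are the merged groups
# (illustrative name, not part of the original code)
grp <- c(e = "e", f = "f+s", g = "g", s = "f+s", ug = "ug")

# Index the lookup by the old codes; NA in SIVST stays NA
tblHoved$SIVST_grp <- unname(grp[tblHoved$SIVST])

# Check the merged group sizes, including the NAs
table(tblHoved$SIVST_grp, useNA = "ifany")

# Drop the rows without a marital status before plotting
tblPlot <- tblHoved[!is.na(tblHoved$SIVST_grp), ]
```

With a column like this, the facet line in the question's code would presumably become `facet_grid(SIVST_grp ~ .)`, and passing `tblPlot` as the data keeps the NA rows out of the plot. Keeping SIVST intact makes it easy to try other groupings later.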