2015-02-09 65 views

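Label the percentage difference between two points on a scatter plot

I have a simple scatter plot that shows how sales differ between years across different ranges.

So, when the range is ">$400M", sales are X in 2013 and Y in 2014.

I would like to add annotations at certain points showing the percentage difference from 2013 to 2014. Is this possible?

Here is the dput: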

structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L, 
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M", 
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", 
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M", 
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", 
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5, 
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5, 
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417, 
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143, 
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333, 
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379, 
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723, 
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286, 
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year", 
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA, 
-44L)) 

Here is how I currently generate the chart:

library(ggplot2)
library(ggthemes)  # theme_tufte()
library(scales)    # dollar

orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M", 
       "$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
       "$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M") 

myDF = transform(myDF, Range = factor(Range, levels = orderlist)) 

myChart <- ggplot(myDF, aes(x = Range, y = Avg_TOTALS)) + 
      geom_point(aes(color = factor(Year))) + 
      theme_tufte() + 
      theme(axis.text.x= element_text(angle = 90, hjust = 0)) + 
      labs(x = "Range", y = "Sales by Range", title = "MyChart")+ 
      scale_y_continuous(breaks = c(50000000, 100000000, 200000000, 
             300000000,400000000, 500000000), 
           labels = dollar) 

This gives me:

(image: MyChart)

which leads me to my question:

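How would I add the percentage difference between each pair of points, using 2013 as the base year? Also, several sales ranges have data in only one of the two years. Can the percentage labels be skipped for those, so that a range is labelled only when data exists in both years?

Thanks for your help!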

Answer


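Here is one way. I think there are better ways, but this is the best my tired brain can manage right now, so I hope you don't mind. Let me briefly walk through the code. I followed your code to build the plot. Then I extracted the data that ggplot is using, which I call foo. I created a master data frame to handle the missing data points and joined it to foo. The dplyr part then computes the proportion of 2014 sales relative to 2013 for each range. Using that result in annotate, I assigned the labels you wanted. I hope this helps. Zzz...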
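As a toy illustration of the master-grid-plus-join idea described above, here is a minimal sketch. It is not the answer's code: the data frame `toy` and its values are made up, and it uses base R `merge()` instead of dplyr. It only shows why a range that is missing a year ends up with `NA` rather than a misleading ratio.

```r
# Toy data (made-up values): range "C" has no 2014 observation
toy <- data.frame(
  year  = c("2013", "2013", "2013", "2014", "2014"),
  range = c("A", "B", "C", "A", "B"),
  sales = c(100, 200, 50, 80, 250),
  stringsAsFactors = FALSE
)

# Master grid: every year/range combination, whether observed or not
master <- expand.grid(year = c("2013", "2014"), range = c("A", "B", "C"),
                      stringsAsFactors = FALSE)

# Left join onto the grid; unobserved combinations get sales = NA
full <- merge(master, toy, by = c("year", "range"), all.x = TRUE)

# Pull out one named vector per year, then take the 2014/2013 ratio per range
s13 <- with(full[full$year == "2013", ], setNames(sales, range))
s14 <- with(full[full$year == "2014", ], setNames(sales, range))
prop <- round(s14[names(s13)] / s13, 2)
prop
```

Ranges "A" and "B" get a ratio, while "C" comes out as `NA`, so it can be filtered out before labelling.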

DATA

mydf <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), Range = structure(c(8L, 9L, 10L, 11L, 12L, 13L, 
14L, 16L, 17L, 18L, 19L, 20L, 21L, 23L, 24L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 26L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 
20L, 21L, 23L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 26L), .Label = c("$40M", 
"$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", 
"$111-120M", "$121-130M", "$131-140M", "$141-150M", "$151-160M", 
"$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
"$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", 
"$351-375M", "$376-400M", ">$400M"), class = "factor"), Avg_TOTALS = c(44732492.5, 
42902206, 47355762, 49604750.6666667, 51132411, 51943986, 54798652.5, 
61313778.5, 68577392, 74457422.6666667, 84805802.5, 96762417, 
99355792, 172956681, 189815908, 31762600.8571429, 33042576.2857143, 
34964083.8, 34349980.2, 35193407, 36049038.6666667, 42039793.3333333, 
486133671, 35996925, 35496337.5, 39139472.5, 36993568.5, 39570379, 
40139421.5, 43835119, 51358298.5, 53024160, 61185564, 67726723, 
71481251, 89873814, 27746650.1428571, 27633867, 29855703.5714286, 
29655265.2, 31163788.8, 29240507, 33810795.25, 192756973)), .Names = c("Year", 
"Range", "Avg_TOTALS"), class = "data.frame", row.names = c(NA, 
-44L)) 


library(ggplot2)
library(scales)  # dollar

orderlist = c("$40M", "$50M", "$60M", "$70M", "$71-80M", "$81-90M", "$91-100M", "$101-110M", "$111-120M", "$121-130M", 
      "$131-140M", "$141-150M", "$151-160M", "$161-170M", "$171-180M", "$181-190M", "$191-200M", "$200-225M", 
      "$226-250M", "$251-275M", "$276-300M", "$301-325M", "$326-350M", "$351-375M", "$376-400M", ">$400M") 

mydf = transform(mydf, Range = factor(Range, levels = orderlist)) 

g <- ggplot(mydf, aes(x = Range, y = Avg_TOTALS)) + 
    geom_point(aes(color = factor(Year))) + 
    #theme_tufte() + 
    theme(axis.text.x= element_text(angle = 90, hjust = 0))+ 
    labs(x="Range", y = "Sales by Range", title = "MyChart")+ 
    scale_y_continuous(breaks = c(50000000, 100000000, 200000000, 300000000,400000000, 500000000), labels = dollar) 

library(dplyr) 

foo <- ggplot_build(g)$data[[1]] %>% 
     arrange(group) %>% 
     mutate(year = c(rep("2013", times = 23), rep("2014", times = 21))) 


master <- expand.grid(year = c("2013", "2014"), group = 1:24) 

full_join(master, foo, by = c("year", c("group" = "x"))) %>% 
group_by(group) %>% 
mutate(prop = round(order_by(year, y/first(y)), 2)) %>% 
summarise(y = first(y), prop = min(prop, na.rm = FALSE)) -> txt 

g + annotate("text", x = txt$group, y = txt$y + 15000000, label = txt$prop) 
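As a possible alternative to reading positions back out of `ggplot_build()`, the labels can be computed from the data frame itself and drawn with `geom_text()`. The sketch below is an illustration under stated assumptions, not the method used above: `toy` is a small made-up data frame that borrows the question's column names, `merge()` acts as an inner join so that ranges missing a year are skipped, and `label_df` is a hypothetical helper frame. It prints a signed percentage change with 2013 as the base year.

```r
library(ggplot2)

# Toy data with the question's column names (values are made up);
# "$60M" has no 2014 row, so it should get no label
toy <- data.frame(
  Year       = c(2013L, 2013L, 2013L, 2014L, 2014L),
  Range      = factor(c("$40M", "$50M", "$60M", "$40M", "$50M"),
                      levels = c("$40M", "$50M", "$60M")),
  Avg_TOTALS = c(100, 200, 50, 80, 250)
)

# Inner join of the two years on Range: unmatched ranges are dropped
wide <- merge(toy[toy$Year == 2013L, c("Range", "Avg_TOTALS")],
              toy[toy$Year == 2014L, c("Range", "Avg_TOTALS")],
              by = "Range", suffixes = c("_2013", "_2014"))

label_df <- data.frame(
  Range = wide$Range,
  # anchor the label at the higher of the two points (vjust lifts it above)
  y     = pmax(wide$Avg_TOTALS_2013, wide$Avg_TOTALS_2014),
  # percentage change relative to 2013, e.g. "-20%" or "+25%"
  label = sprintf("%+.0f%%",
                  100 * (wide$Avg_TOTALS_2014 / wide$Avg_TOTALS_2013 - 1))
)

p <- ggplot(toy, aes(x = Range, y = Avg_TOTALS)) +
  geom_point(aes(color = factor(Year))) +
  geom_text(data = label_df, aes(x = Range, y = y, label = label),
            vjust = -1, size = 3, inherit.aes = FALSE)
p
```

With the real data, the same steps should apply with `mydf` in place of `toy`. Because the labels come from the data rather than from the built plot, they do not depend on the row order of the layer data.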


This is perfect, thanks! – datahappy 2015-02-09 21:23:15

@datahappy You're welcome. :) – jazzurro 2015-02-10 00:31:04