2017-09-17

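Heatmap-style plot showing the modal value of each grid region (via stat_summary_2d?)

I have a set of 2D points with class labels, and I want to show which class dominates each cell of a grid laid over the 2D plane.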

I thought I could use stat_summary_2d to pick the most common value, as below, but I get three different plots that should be identical apart from their legend labels.

Am I misusing stat_summary_2d? Is there a better way to produce this kind of plot?

library(ggplot2) 
set.seed(12345) 
x = runif(1000) 
y = runif(1000) 
lab = rep(c("red", "blue", "green", "yellow"), 250) 

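# levels= keeps each label on its own points; labels= would rename the alphabetically sorted levels
df = data.frame(x=x, y=y, lab=factor(lab, levels=c("red", "blue", "green", "yellow")))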
df$val = as.numeric(df$lab) 

#Attempt 1 
ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=lab), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) 

#Attempt 2 
ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=val), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) 

#Attempt 3 
ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=as.numeric(lab)), 
        fun=function(z) names(which.max(table(z))),
        binwidth=.1)

Answer


Add group = 1 to Attempt 1 and its panel will show the same distribution as the other two attempts.

Specify the fill palette appropriately and all three look the same:

library(ggplot2) 

#Attempt 1 
p1 <- ggplot(df, aes(x=x, y=y, group = 1)) + 
    stat_summary_2d(aes(z=lab), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) + 
    scale_fill_manual(values = c("red" = "red", 
           "blue" = "blue", 
           "green" = "green", 
           "yellow" = "yellow"), 
        breaks = c("red", "blue", "green", "yellow")) + 
    ggtitle("Attempt 1") + theme(legend.position = "bottom") 

#Attempt 2 
p2 <- ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=val), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) + 
    scale_fill_manual(values = c("red", "blue", "green", "yellow")) + 
    ggtitle("Attempt 2") + theme(legend.position = "bottom") 

#Attempt 3 
p3 <- ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=as.numeric(lab)), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) + 
    scale_fill_manual(values = c("red", "blue", "green", "yellow")) + 
    ggtitle("Attempt 3") + theme(legend.position = "bottom") 

gridExtra::grid.arrange(p1, p2, p3, nrow = 1) 

[Image: combined plot of Attempts 1, 2 and 3]

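Explanation: if you inspect the underlying data of the first plot, you will find it has 379 rows, each corresponding to one tile in the heatmap. If we count the distinct colours present in each bin and add these counts up over all bins, we also get 379, so each bin actually holds several stacked tiles, one per colour. (By contrast, the underlying data of the second and third plots has 100 rows each.)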

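From this we know that ggplot has interpreted each factor level in "lab" as a separate group and run stat_summary_2d() separately for each level. (By default, ggplot2 groups a layer by the interaction of all discrete variables mapped in it, and z = lab is a factor.) Within a single group every point carries the same label, so each per-group tile trivially reports its own colour. Adding the aesthetic mapping group = 1 forces all levels to be considered together.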
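To check this grouping explanation without opening the data viewer, the sketch below rebuilds a data frame like the one in the question (using levels= in factor(), which only affects naming, not grouping), builds Attempt 1 with and without group = 1, and compares the computed layer data. It relies only on layer_data() and the group column that ggplot2 keeps in built layer data; mode_of, p_default and p_grouped are names made up for this example.

```r
library(ggplot2)

# Data similar to the question's example
set.seed(12345)
df <- data.frame(x = runif(1000), y = runif(1000),
                 lab = factor(rep(c("red", "blue", "green", "yellow"), 250),
                              levels = c("red", "blue", "green", "yellow")))

# Hypothetical helper: most frequent value, as in the attempts above
mode_of <- function(z) names(which.max(table(z)))

# Attempt 1 as posted (implicit grouping by the factor mapped to z)
p_default <- ggplot(df, aes(x = x, y = y)) +
  stat_summary_2d(aes(z = lab), fun = mode_of, binwidth = 0.1)

# Attempt 1 with all points forced into a single group
p_grouped <- ggplot(df, aes(x = x, y = y, group = 1)) +
  stat_summary_2d(aes(z = lab), fun = mode_of, binwidth = 0.1)

d_default <- layer_data(p_default)
d_grouped <- layer_data(p_grouped)

print(length(unique(d_default$group)))  # one group per level of lab
print(length(unique(d_grouped$group)))  # a single group
print(c(default = nrow(d_default), grouped = nrow(d_grouped)))
```

With the default grouping the row count should be clearly larger than with group = 1, because every bin contributes one row per colour present; the grouped version has one row per non-empty bin, like Attempts 2 and 3. Exact counts depend on the data.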

p1.original <- ggplot(df, aes(x=x, y=y)) + 
    stat_summary_2d(aes(z=lab), 
        fun=function(z) names(which.max(table(z))), 
        binwidth=.1) 

View(layer_data(p1.original))
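Since the question also asks for a better way, one alternative (not part of the answer above) is to compute the modal class per cell yourself and draw the result with geom_tile(), which sidesteps the grouping issue entirely. This is a sketch under assumptions: cells are taken as [k * binwidth, (k + 1) * binwidth) via floor(), which may put boundary points in a different cell than stat_summary_2d's own binning, and ties go to the first factor level, as with which.max(). The names binwidth, mode_label, cells and p_mode are hypothetical.

```r
library(ggplot2)

# Data similar to the question's example
set.seed(12345)
df <- data.frame(x = runif(1000), y = runif(1000),
                 lab = factor(rep(c("red", "blue", "green", "yellow"), 250),
                              levels = c("red", "blue", "green", "yellow")))

binwidth <- 0.1  # same cell size as in the attempts above

# Assign every point to a grid cell by flooring its coordinates
df$xbin <- floor(df$x / binwidth)
df$ybin <- floor(df$y / binwidth)

# Most frequent label in a cell; which.max() breaks ties in favour of the first level
mode_label <- function(z) levels(z)[which.max(table(z))]

# One row per non-empty cell, holding its modal label
cells <- aggregate(lab ~ xbin + ybin, data = df, FUN = mode_label)
cells$lab <- factor(cells$lab, levels = levels(df$lab))

# Draw each cell as a tile centred in its bin
p_mode <- ggplot(cells, aes(x = (xbin + 0.5) * binwidth,
                            y = (ybin + 0.5) * binwidth,
                            fill = lab)) +
  geom_tile() +
  scale_fill_manual(values = c(red = "red", blue = "blue",
                               green = "green", yellow = "yellow")) +
  labs(x = "x", y = "y", fill = "modal class")

print(p_mode)
```

Because the summary is computed before plotting, there is exactly one tile per cell and the legend uses the class names directly; the trade-off is that binning is now your responsibility rather than ggplot2's.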