Q

Filtering a ggplot2 density plot by number of observations

r
ggplot2

2011-05-20 59 views 3 likes

3

Is it possible to filter out, within a ggplot2 call, the subsets of the data that have only a few observations?

For example, take the following plot: qplot(price,data=diamonds,geom="density",colour=cut)

Density plot

The plot is a bit busy, and I would like to exclude the cut values with few observations, i.e.

> xtabs(~cut,diamonds) 
cut 
     Fair      Good Very Good   Premium     Ideal 
     1610      4906     12082     13791     21551 

the Fair and Good levels of the cut factor.

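I would like a solution that works for an arbitrary dataset and, if possible, lets me select not only by a threshold number of observations but also, for example, the top 3.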

2011-05-20 James

A

Answers

9

library(ggplot2)  # ggplot(), diamonds
library(plyr)     # count(), arrange(), desc(), .()

ggplot(subset(diamonds, cut %in% arrange(count(diamonds, .(cut)), desc(freq))[1:3,]$cut), 
    aes(price, colour=cut)) + 
    geom_density() + facet_grid(~cut)

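count tallies the occurrences of each distinct value and returns them as a data.frame.
arrange sorts a data.frame by the specified columns.
desc makes the sort run in descending order.
Finally, %in% keeps only the rows whose cut is among the top 3.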
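
The same steps can be unpacked one at a time, which may be easier to adapt to another dataset. This is only a sketch: it assumes the plyr and ggplot2 packages are installed, and the intermediate names (counts, ranked, top3, top3_data) are illustrative, not part of the original answer. It also adds droplevels() so that unused levels do not appear in the legend.

```r
library(ggplot2)  # provides the diamonds data and ggplot()
library(plyr)     # provides count(), arrange(), desc()

# One row per cut level, with the number of observations in column 'freq'
counts <- count(diamonds, "cut")

# Most frequent levels first
ranked <- arrange(counts, desc(freq))

# The three most frequent cut levels
top3 <- ranked$cut[1:3]

# Keep only those rows, then drop the now-empty factor levels
top3_data <- droplevels(subset(diamonds, cut %in% top3))

p <- ggplot(top3_data, aes(price, colour = cut)) + geom_density()
print(p)
```

Changing 1:3 to another range selects a different number of levels.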

2011-05-20 14:43:42 kohske

+0

+1 for using built-in functions – 2011-05-20 14:45:21

+0

It is missing droplevels(), but I like the use of %in%. +1 – 2011-05-20 14:46:03

+0

Yes. So if you want to drop the unused factor levels from the legend, droplevels(subset(...)) is the right way. Thanks. – kohske 2011-05-20 14:48:10

1

## Top 3 cuts 
library(ggplot2)
tmp <- names(sort(summary(diamonds$cut), decreasing = T))[1:3] 
# %in% (not ==) so that tmp is not recycled element-wise across the rows
tmp <- droplevels(subset(diamonds, cut %in% tmp)) 
ggplot(tmp, aes(price, color=cut)) + geom_density()


But have you considered faceting?

ggplot(diamonds, aes(price, color=cut)) + geom_density() + facet_grid(~cut)


2011-05-20 14:25:59

+0

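Thanks Brandon, but the data I am using has a lot of factor levels, so I really want a way to select only the most populous ones; otherwise space and clarity become a problem. – James 2011-05-20 14:29:27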

+0

In your question you wrote top 3, but you specified Fair and Good. If you meant the latter, remove decreasing = T from my solution and change [1:3] to [1:2] – 2011-05-20 14:50:16

+0

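Because I put the xtabs output in the middle of a sentence, my meaning was not clear, but I do want to exclude Fair and Good. Your new solution works as expected, thanks! – James 2011-05-20 14:58:01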

2

This seems to call for writing your own subsetting function, perhaps something like this:

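mySubset <- function(dat, largestK = 3, thresh = NULL) { 
    tbl <- sort(table(dat))   # counts per distinct value, ascending 
    if (is.null(thresh)) { 
        # keep rows whose value is among the largestK most frequent 
        return(dat %in% tail(names(tbl), largestK)) 
    } else { 
        # keep rows whose value occurs at least thresh times 
        return(dat %in% names(tbl)[tbl >= thresh]) 
    } 
}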

It could be used in a ggplot call like this:

ggplot(diamonds[mySubset(diamonds$cut),],...)

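This code does not drop unused factor levels, so be careful about that. Unless I absolutely need them ordered, I usually leave categorical variables as characters.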
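
For the threshold case from the question, a minimal sketch that also drops the unused levels might look like the following. The helper name keep_frequent, its argument min_n, and the cut-off of 5000 are illustrative assumptions and not part of the answer above; only base R and ggplot2 are used.

```r
library(ggplot2)  # provides the diamonds data and ggplot()

# Hypothetical helper: TRUE for elements whose value occurs at least min_n times
keep_frequent <- function(x, min_n) {
  counts <- table(x)
  x %in% names(counts)[counts >= min_n]
}

# Keep cut levels with at least 5000 observations (excludes Fair and Good),
# then drop the empty levels so they do not clutter the legend
common <- droplevels(diamonds[keep_frequent(diamonds$cut, 5000), ])

p <- ggplot(common, aes(price, colour = cut)) + geom_density()
print(p)
```

Because the helper takes a plain vector, it works the same way on character columns, where no levels need dropping.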

2011-05-20 14:33:24 joran

+0

Thanks, this works as expected. You can drop the unused levels by re-factoring cut in the colour call. – James 2011-05-20 14:56:01

3

Here is my take. First write a function that returns the categories with the most observations.

firstx <- function (category, data, x = 1:3) { 
    tab <- xtabs(~category, data) 

    dimnames(tab)$category[order(tab, decreasing = TRUE)[x]] 
} 

#Then use subset to subset the data and droplevels to drop unused levels 
#so they don't clutter the legend. 
ggplot(droplevels(subset(diamonds, cut %in% firstx(cut, diamonds))), 
     aes(price, color = cut)) + geom_density()

I hope that helps.

2011-05-20 14:38:20

+0

Thanks, this works as expected. – James 2011-05-20 14:58:23
