2016-07-22

dplyr: summarise, mutate and rank within groups

When I run the following query on the mtcars dataset, I get the result below.

library(dplyr)

mtcars %>%
    group_by(cyl, gear) %>%
    summarise(total_cnt = n(), totalwt = sum(wt)) %>%
    arrange(cyl, gear, desc(total_cnt), desc(totalwt)) %>%
    mutate(rank = dense_rank(desc(total_cnt))) %>%
    arrange(rank)

    cyl  gear total_cnt totalwt  rank
  <dbl> <dbl>     <int>   <dbl> <int>
1     4     4         8  19.025     1
2     6     4         4  12.375     1
3     8     3        12  49.249     1
4     4     5         2   3.653     2
5     6     3         2   6.675     2
6     8     5         2   6.740     2
7     4     3         1   2.465     3
8     6     5         1   2.770     3
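Why does rank restart at 1 for every cyl in the output above? summarise() drops only the last grouping variable (gear), so its result is still grouped by cyl, and the mutate() that follows runs dense_rank() once per cyl. A minimal check, assuming a recent dplyr; the object names by_cyl_gear, within_cyl and across_all are illustrative only:

```r
library(dplyr)

# The question's summary, stopped before mutate()
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(total_cnt = n(), totalwt = sum(wt))

# summarise() peeled off gear, so the result is still grouped by cyl
group_vars(by_cyl_gear)

# dense_rank() therefore restarts inside each cyl group
within_cyl <- by_cyl_gear %>%
  mutate(rank = dense_rank(desc(total_cnt)))

# After ungroup(), the same call ranks all 8 rows against each other
across_all <- by_cyl_gear %>%
  ungroup() %>%
  mutate(rank = dense_rank(desc(total_cnt)))
```

With the cyl grouping still in place, cyl 4, 6 and 8 each get their own rank 1, which is the pattern in the printed result.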

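Now, within each rank group, I want to sub-rank the observations by totalwt, so that the row with the largest totalwt in each rank group gets subrank 1. The final output should look like this: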

    cyl  gear total_cnt totalwt  rank subrank
  <dbl> <dbl>     <int>   <dbl> <int>   <int>
1     4     4         8  19.025     1       2
2     6     4         4  12.375     1       3
3     8     3        12  49.249     1       1
4     4     5         2   3.653     2       3
5     6     3         2   6.675     2       2
6     8     5         2   6.740     2       1
7     4     3         1   2.465     3       2
8     6     5         1   2.770     3       1

Finally, I want to keep only the top row of each rank group, i.e. the rows where subrank == 1, so the output would be:

    cyl  gear total_cnt totalwt  rank subrank
  <dbl> <dbl>     <int>   <dbl> <int>   <int>
3     8     3        12  49.249     1       1
6     8     5         2   6.740     2       1
8     6     5         1   2.770     3       1

Answer


If 'mtcars1' is the output of the OP's code, we can group by 'rank' and then use rank() on -totalwt to create 'subrank':

# rank(-totalwt) gives subrank 1 to the heaviest row in each 'rank' group
mtcars2 <- mtcars1 %>%
    group_by(rank) %>%
    mutate(subrank = rank(-totalwt))
mtcars2
#     cyl  gear total_cnt totalwt  rank subrank
#   <dbl> <dbl>     <int>   <dbl> <int>   <dbl>
# 1     4     4         8  19.025     1       2
# 2     6     4         4  12.375     1       3
# 3     8     3        12  49.249     1       1
# 4     4     5         2   3.653     2       3
# 5     6     3         2   6.675     2       2
# 6     8     5         2   6.740     2       1
# 7     4     3         1   2.465     3       2
# 8     6     5         1   2.770     3       1
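The answer assumes mtcars1 already exists. A self-contained sketch that rebuilds it from the question's pipeline and then adds 'subrank' follows. The min_rank() variant (stored in the hypothetical mtcars2_int) is an extra, not part of the answer, for anyone who wants an integer column like the one in the question's desired output:

```r
library(dplyr)

# mtcars1: the question's pipeline, assigned to a name
mtcars1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(total_cnt = n(), totalwt = sum(wt)) %>%
  arrange(cyl, gear, desc(total_cnt), desc(totalwt)) %>%
  mutate(rank = dense_rank(desc(total_cnt))) %>%
  arrange(rank)

# group_by(rank) replaces the cyl grouping left over from summarise();
# rank(-totalwt) numbers rows from heaviest (1) to lightest in each group
mtcars2 <- mtcars1 %>%
  group_by(rank) %>%
  mutate(subrank = rank(-totalwt))

# Extra (not in the answer): min_rank() returns an integer subrank
mtcars2_int <- mtcars1 %>%
  group_by(rank) %>%
  mutate(subrank = min_rank(desc(totalwt)))
```

rank() averages ties (two equal weights would both get 1.5), while min_rank() gives both the lower rank; with the distinct totalwt values here, both produce the same numbers.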

After creating 'subrank', we filter the rows where 'subrank' is 1:

mtcars2 %>%
    filter(subrank == 1)
#     cyl  gear total_cnt totalwt  rank subrank
#   <dbl> <dbl>     <int>   <dbl> <int>   <dbl>
# 1     8     3        12  49.249     1       1
# 2     8     5         2   6.740     2       1
# 3     6     5         1   2.770     3       1
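If only the top row of each rank group is needed, the subrank column can be skipped. A sketch of a one-step alternative follows; it uses the values printed in the question's first table as a stand-in for mtcars1, and the filter(totalwt == max(totalwt)) approach is a swap-in, not the answer's method:

```r
library(dplyr)

# Stand-in for mtcars1, typed in from the question's first table
mtcars1 <- data.frame(
  cyl       = c(4, 6, 8, 4, 6, 8, 4, 6),
  gear      = c(4, 4, 3, 5, 3, 5, 3, 5),
  total_cnt = c(8L, 4L, 12L, 2L, 2L, 2L, 1L, 1L),
  totalwt   = c(19.025, 12.375, 49.249, 3.653, 6.675, 6.740, 2.465, 2.770),
  rank      = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)
)

# Within each rank group, keep the row(s) with the largest totalwt
top1 <- mtcars1 %>%
  group_by(rank) %>%
  filter(totalwt == max(totalwt)) %>%
  ungroup()

top1
```

Unlike subrank == 1 built with rank(), which would drop a group entirely if its two heaviest rows tied (both would get 1.5), this keeps every tied row. In dplyr 1.0.0 or later, slice_max(totalwt, n = 1) after group_by(rank) is another option.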

Thanks a lot, it works.