2017-07-06

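Slice a data.frame and join it back to the original data.frame seamlessly

I sliced a data.frame by dropping the 50% of rows with the lowest revenue within each id, and now I want to join the result back to the original data.frame so that I can compare the totals before and after slicing.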

I have a solution, but I am looking for something more elegant.

require(dplyr) 

> #creating my data.frame with revenue for id and subid  
> df <- data.frame(id = gl(n = 2, k= 5, length = 10), 
+     subid = gl(n = 6, k = 2, length = 10), 
+     rev = rnorm(10, 100, 15)) 
> df 
    id subid  rev 
1 1  1 102.80694 
2 1  1 77.88691 
3 1  2 122.71019 
4 1  2 67.13475 
5 1  3 93.21146 
6 2  3 91.48368 
7 2  4 103.05535 
8 2  4 82.27343 
9 2  5 106.03651 
10 2  5 81.14182 
> 
> #keep only subid with 50% highest turnover within each id 
> df_sliced <- df %>% 
+  arrange(id, desc(rev)) %>% 
+  group_by(id) %>% 
+  slice(seq(n()*0.5)) %>% 
+  group_by(id) %>% 
+  summarise(rev_sliced = sum(rev)) 
> 
> df_sliced 
Source: local data frame [2 x 2] 

     id rev_sliced 
    (fctr)  (dbl) 
1  1 225.5171 
2  2 209.0919 
> 
> #now I want to join back and compare my sliced result with result before slice. 
> df_desired <- df %>% 
+ group_by(id) %>% 
+ summarise(rev = sum(rev)) %>% 
+ cbind(df_sliced) #this will obviously also give me two columns with id. Desired result is with only one column for id. 
> 
> df_desired 
    id  rev id rev_sliced 
1 1 463.7503 1 225.5171 
2 2 463.9908 2 209.0919 

I have not worked out how to do this with a join, nor how to keep everything in a single chain.

Answer


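For the sliced sum, you can sum the rev values above the 50% quantile (the median) within each id, as shown below. That lets you compute both totals in the same summarise() call, with no join needed: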

df %>% 
    group_by(id) %>% 
    summarise(rev_sliced = sum(rev[rev > quantile(rev, 0.5)]), 
       rev = sum(rev)) 

# A tibble: 2 x 3 
#  id rev_sliced  rev 
# <int>  <dbl> <dbl> 
#1  1 225.5171 463.7502 
#2  2 209.0919 463.9908 
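
If you would still like to see what a join-based version looks like, here is a minimal sketch for comparison. It assumes the rev values printed in the question (typed in by hand, so they are rounded), and the names df_joined and df_quantile are only illustrative. It uses left_join() by id instead of cbind(), so id appears only once, and then checks the result against the single-summarise approach.

```r
library(dplyr)

# Data copied from the question's printed output (rev rounded as shown there).
df <- data.frame(
  id    = gl(n = 2, k = 5, length = 10),
  subid = gl(n = 6, k = 2, length = 10),
  rev   = c(102.80694, 77.88691, 122.71019, 67.13475, 93.21146,
            91.48368, 103.05535, 82.27343, 106.03651, 81.14182)
)

# Join-based variant: total per id, then left_join the sliced totals by id.
df_joined <- df %>%
  group_by(id) %>%
  summarise(rev = sum(rev)) %>%
  left_join(
    df %>%
      arrange(id, desc(rev)) %>%
      group_by(id) %>%
      slice(seq_len(floor(n() * 0.5))) %>%  # top half of the rows per id
      summarise(rev_sliced = sum(rev)),
    by = "id"
  )

# Single-summarise variant from this answer, with the same column order.
df_quantile <- df %>%
  group_by(id) %>%
  summarise(rev = sum(rev),
            rev_sliced = sum(rev[rev > quantile(rev, 0.5)]))

# Compare the two results (within all.equal's numeric tolerance).
all.equal(as.data.frame(df_joined), as.data.frame(df_quantile))
```

One caveat, as far as I can tell: the two variants agree on this data, but they can differ when several rows tie at the median, because rev > quantile(rev, 0.5) drops every tied row while slice() always keeps a fixed number of rows.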

That easy, nicely done. Thanks!