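Slice a data.frame and seamlessly join back to the old data frame

I sliced a data.frame by removing the 50% of rows with the lowest revenue within each id, and now I want to join the result back to the original data.frame so that I can compare the totals after slicing with the totals before slicing.
I have a solution, but I am looking for something more elegant.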
> library(dplyr)
> #creating my data.frame with revenue for id and subid
> df <- data.frame(id = gl(n = 2, k= 5, length = 10),
+ subid = gl(n = 6, k = 2, length = 10),
+ rev = rnorm(10, 100, 15))
> df
id subid rev
1 1 1 102.80694
2 1 1 77.88691
3 1 2 122.71019
4 1 2 67.13475
5 1 3 93.21146
6 2 3 91.48368
7 2 4 103.05535
8 2 4 82.27343
9 2 5 106.03651
10 2 5 81.14182
>
> #keep only the 50% of rows with the highest revenue within each id (rounded down)
> df_sliced <- df %>%
+ arrange(id, desc(rev)) %>%
+ group_by(id) %>%
+ slice(seq(n()*0.5)) %>%
+ group_by(id) %>%
+ summarise(rev_sliced = sum(rev))
>
> df_sliced
Source: local data frame [2 x 2]
id rev_sliced
(fctr) (dbl)
1 1 225.5171
2 2 209.0919
>
> #now I want to join back and compare my sliced result with result before slice.
> df_desired <- df %>%
+ group_by(id) %>%
+ summarise(rev = sum(rev)) %>%
+ cbind(df_sliced) #this will obviously also give me two columns with id. Desired result is with only one column for id.
>
> df_desired
id rev id rev_sliced
1 1 463.7503 1 225.5171
2 2 463.9908 2 209.0919
I have not figured out how to do this with a join, nor how to keep everything in a single chain.
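For illustration, here is a minimal sketch of two ways this could be written as one chain. It is not necessarily the approach that was eventually accepted. The names `df_joined` and `df_single` are made up for this example, `set.seed()` is added only for reproducibility (so the numbers will differ from the output above), and `seq_len(floor(...))` is an assumption about rounding down when a group has an odd number of rows.

```r
library(dplyr)

# Same structure as the data in the question; the seed is an addition.
set.seed(1)
df <- data.frame(id = gl(n = 2, k = 5, length = 10),
                 subid = gl(n = 6, k = 2, length = 10),
                 rev = rnorm(10, 100, 15))

# Option 1: total revenue per id, left-joined to the total of the top 50% of rows.
# Joining by "id" keeps a single id column, unlike cbind().
df_joined <- df %>%
  group_by(id) %>%
  summarise(rev = sum(rev)) %>%
  left_join(
    df %>%
      arrange(id, desc(rev)) %>%
      group_by(id) %>%
      slice(seq_len(floor(n() * 0.5))) %>%
      summarise(rev_sliced = sum(rev)),
    by = "id"
  )

# Option 2: no join at all, both sums computed in one summarise().
# rev_sliced must come first, because later expressions in summarise()
# would see the already-summed rev.
df_single <- df %>%
  group_by(id) %>%
  summarise(rev_sliced = sum(head(sort(rev, decreasing = TRUE),
                                  floor(n() * 0.5))),
            rev = sum(rev))
```

Both results have one row per id with `rev` and `rev_sliced` side by side. Option 2 avoids the join entirely, while option 1 stays closer to the original slicing code.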
That easy, and it works great. Thanks!