Aggregating rows in a data.frame by assignment columns

I have this example data.frame:

set.seed(1) 
df <- data.frame(id = letters[1:10], a = sample(100,10), b = sample(100,10), 
       aggregate_with = c(rep(NA,6),"y","b","b","e"), aggregate_order = c(rep(NA,6),"a,b","a,b","b,a","a,b")) 

> df
   id  a  b aggregate_with aggregate_order
1   a 27 21           <NA>            <NA>
2   b 37 18           <NA>            <NA>
3   c 57 68           <NA>            <NA>
4   d 89 38           <NA>            <NA>
5   e 20 74           <NA>            <NA>
6   f 86 48           <NA>            <NA>
7   g 97 98              y             a,b
8   h 62 93              b             a,b
9   i 58 35              b             b,a
10  j  6 71              e             a,b
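
One caveat on reproducibility: the default algorithm behind sample() changed in R 3.6.0, so set.seed(1) on a recent R version will probably not give the values printed above. A minimal sketch that rebuilds the same data.frame directly from the printed values, assuming plain character columns (stringsAsFactors = FALSE is my choice here, not part of the original example):

```r
# Rebuild df from the printed values so the example does not depend
# on the random number generator of a particular R version.
df <- data.frame(
  id = letters[1:10],
  a  = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b  = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with  = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE  # assumption: character columns, not factors
)
```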

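I want to aggregate the rows whose aggregate_with value matches the id value of another row (that is, a row is merged only when its aggregate_with value names a different row's id). The function I want to apply sums their a and b values according to the assignment in the aggregate_order column: with "a,b" the merged row's a is added to a and its b to b, and with "b,a" the two are swapped. The id, aggregate_with and aggregate_order of the aggregated row should keep the values of the row indicated by the aggregate_with column.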

This is what the resulting data.frame should look like:

> aggregated.df
  id   a   b aggregate_with aggregate_order
1  a  27  21           <NA>            <NA>
2  b 134 169           <NA>            <NA>
3  c  57  68           <NA>            <NA>
4  d  89  38           <NA>            <NA>
5  e  26 145           <NA>            <NA>
6  f  86  48           <NA>            <NA>
7  g  97  98              y             a,b

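As you can see, column a in row 2 of aggregated.df is the sum of column a of rows 2 and 8 and column b of row 9 in df, and vice versa for column b. Columns a and b in row 5 of aggregated.df are the sums of a and b of rows 5 and 10 in df. Although row 7 in df has an aggregate_with value, that value does not exist as an id in df, so the row is not aggregated.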
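
The arithmetic for rows 2 and 5 can be checked by hand. A small sketch, with the values copied from the printed df (the vector and variable names are only for illustration):

```r
# a and b values of the rows involved, copied from the printed df
a <- c(b = 37, e = 20, h = 62, i = 58, j = 6)
b <- c(b = 18, e = 74, h = 93, i = 35, j = 71)

# Row "b": its own values, plus h in "a,b" order, plus i in "b,a" order (swapped)
agg_b_a <- a[["b"]] + a[["h"]] + b[["i"]]  # 37 + 62 + 35 = 134
agg_b_b <- b[["b"]] + b[["h"]] + a[["i"]]  # 18 + 93 + 58 = 169

# Row "e": its own values, plus j in "a,b" order
agg_e_a <- a[["e"]] + a[["j"]]             # 20 + 6  = 26
agg_e_b <- b[["e"]] + b[["j"]]             # 74 + 71 = 145
```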

Source: 2016-02-29, user1701545

Loops, but I think there is a more elegant solution. – user1701545

You should edit the question to include what you already have, so that people don't spend a lot of effort getting to where you already are. – alistaire

I'm using the data.table package.

library(data.table) 
dt <- as.data.table(df) 

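# Lookup table: each row re-labelled with the id it should be merged into
dt2 <- dt[, list(id = aggregate_with, a, b, aggregate_order)]
# Swap a and b for rows whose aggregate_order is not 'a,b' (i.e. 'b,a')
dt2[, c('a', 'b') := list(ifelse(aggregate_order == 'a,b', a, b), ifelse(aggregate_order == 'a,b', b, a))]
setkey(dt2, id)

# For every row of dt, pull in the dt2 rows that should be merged into it
res <- dt2[dt]

# Replace NAs (rows with nothing to merge) with 0, then add the merged sums to each row's own values
for (j in c('a', 'b')) set(res, which(is.na(res[[j]])), j, 0)
# Drop rows that were merged into another row; the result contains only id, a and b
res[!aggregate_with %in% id, list(a = sum(a) + i.a[1], b = sum(b) + i.b[1]), by = id]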
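
For comparison, the same steps (swap according to aggregate_order, sum per target row, add to the target, drop the merged rows) can be sketched in base R without an explicit loop. This is not the data.table code above but an alternative under stated assumptions: the input is rebuilt from the values printed in the question with character columns, no row names itself in aggregate_with, and merges are not chained. The helper names target, merged, swap, contrib and sums are only for illustration.

```r
# Input rebuilt from the values printed in the question
df <- data.frame(
  id = letters[1:10],
  a  = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b  = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with  = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE
)

# Row index each row should be merged into; NA when no such id exists
target <- match(df$aggregate_with, df$id)
merged <- !is.na(target)

# Contribution of each row, with a and b swapped for the "b,a" order
swap    <- merged & df$aggregate_order == "b,a"
contrib <- cbind(a = ifelse(swap, df$b, df$a),
                 b = ifelse(swap, df$a, df$b))[merged, , drop = FALSE]

# Sum the contributions per target row; row names hold the target indices
sums <- rowsum(contrib, group = target[merged])
idx  <- as.integer(rownames(sums))

# Add the sums to the target rows, then drop the rows that were merged
aggregated.df <- df
aggregated.df$a[idx] <- aggregated.df$a[idx] + sums[, "a"]
aggregated.df$b[idx] <- aggregated.df$b[idx] + sums[, "b"]
aggregated.df <- aggregated.df[!merged, ]
```

Unlike the data.table result, this version keeps the aggregate_with and aggregate_order columns of the target rows, which is what the question asks for.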

Source: 2016-02-29 08:31:19
