2016-10-04

data.table: sum with multiple conditions in the BY statement

Here is some toy data:

df = data.frame(ID = c(1,1,1,2,2,2,2,3,3), 
       food = c("bacon","bacon","bacon","bacon","bacon","cheese","sausage","avocado","ham"), 
       enjoyment = c(20,20,20,20,20,20,20,20,20)) 

which gives

ID food enjoyment 
1 1 bacon  20 
2 1 bacon  20 
3 1 bacon  20 
4 2 bacon  20 
5 2 bacon  20 
6 2 cheese  20 
7 2 sausage  20 
8 3 avocado  20 
9 3  ham  20 

What I want to do is, for each person (ID), sum their enjoyment of bacon and cheese only.

So far my code is

library(data.table) 
setDT(df) 
df[,id_enjoyment_sum := sum(enjoyment), by =.(ID,food == "bacon"|food == "cheese")] 

which gives

ID food enjoyment id_enjoyment_sum 
1: 1 bacon  20    60 
2: 1 bacon  20    60 
3: 1 bacon  20    60 
4: 2 bacon  20    60 
5: 2 bacon  20    60 
6: 2 cheese  20    60 
7: 2 sausage  20    20 
8: 3 avocado  20    40 
9: 3  ham  20    40 

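This already does part of what I want, but for each person it also sums their enjoyment of foods that are neither bacon nor cheese. Note that ID 3 eats neither bacon nor cheese, yet my code still sums his enjoyment of the foods he did eat.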

Ideally, the code would produce

ID food enjoyment id_enjoyment_sum 
1: 1 bacon  20    60 
2: 1 bacon  20    60 
3: 1 bacon  20    60 
4: 2 bacon  20    60 
5: 2 bacon  20    60 
6: 2 cheese  20    60 
7: 2 sausage  20    60 
8: 3 avocado  20    0 
9: 3  ham  20    0 

So my question is: how do I build the BY clause so that, for each ID, it sums only the enjoyment of bacon and cheese?


I think you want 'df[food %in% c("bacon", "cheese"), s := sum(enjoyment), by = ID]'. I suggest going through the vignettes, which clarify the typical syntax patterns: https://github.com/Rdatatable/data.table/wiki/Getting-started – Frank
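A minimal sketch of what this suggestion does on the toy data, assuming data.table is installed and the data is rebuilt as a fresh data.table named `df`. Because the filter sits in `i`, `:=` writes only to the bacon/cheese rows; when `s` is a new column, the rows outside the filter are left as NA rather than 0.

```r
library(data.table)

# Rebuild the toy data from the question as a data.table
df <- data.table(ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 food = c("bacon", "bacon", "bacon", "bacon", "bacon",
                          "cheese", "sausage", "avocado", "ham"),
                 enjoyment = c(20, 20, 20, 20, 20, 20, 20, 20, 20))

# Filter in i, aggregate by ID: only the matching rows receive a value
df[food %in% c("bacon", "cheese"), s := sum(enjoyment), by = ID]

# Rows 7 to 9 (sausage, avocado, ham) are not in i, so s stays NA there
print(df)
```

So the per-ID totals for bacon and cheese are correct, but the sausage row and ID 3's rows get NA instead of the desired 60 and 0.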


'df[, s := sum(enjoyment[food %in% c("bacon", "cheese")]), by = ID]' computes the expected result – HubertL
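A sketch of why this works, assuming the toy data is rebuilt as a data.table named `df`. Here the condition moves into `j`: every row of each ID group is kept, but only that group's bacon/cheese enjoyment values are summed. For an ID with no matching rows the subset is empty, and `sum()` of an empty numeric vector is 0, which gives ID 3 the desired 0.

```r
library(data.table)

# Rebuild the toy data from the question as a data.table
df <- data.table(ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 food = c("bacon", "bacon", "bacon", "bacon", "bacon",
                          "cheese", "sausage", "avocado", "ham"),
                 enjoyment = c(20, 20, 20, 20, 20, 20, 20, 20, 20))

# Subset enjoyment inside j, per ID group; the group's total goes to all its rows
df[, s := sum(enjoyment[food %in% c("bacon", "cheese")]), by = ID]

print(df)
```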


Thank you both. HubertL's solution worked on my real data; for some reason, Frank's gives the same result as my original solution –

Answer


As a one-liner, I would do this:

# Use enjoyment for bacon/cheese rows and 0 for all other rows, then sum per ID
df[, id_enjoyment_sum := sum(ifelse(food %in% c("bacon", "cheese"), enjoyment, 0)),
   by = .(ID)]
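A quick check on the toy data, assuming it is rebuilt as a data.table named `df`; the column names `via_ifelse` and `via_subset` are illustrative only. Zeroing out the non-matching rows with `ifelse()` before summing should agree with subsetting inside `j` as in HubertL's comment, since both add up the same bacon/cheese values per ID.

```r
library(data.table)

# Rebuild the toy data from the question as a data.table
df <- data.table(ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 food = c("bacon", "bacon", "bacon", "bacon", "bacon",
                          "cheese", "sausage", "avocado", "ham"),
                 enjoyment = c(20, 20, 20, 20, 20, 20, 20, 20, 20))

# Answer's approach: non-matching rows contribute 0 to the per-ID sum
df[, via_ifelse := sum(ifelse(food %in% c("bacon", "cheese"), enjoyment, 0)),
   by = .(ID)]

# Comment's approach: drop non-matching rows from the sum instead
df[, via_subset := sum(enjoyment[food %in% c("bacon", "cheese")]), by = .(ID)]

print(df)
```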

If overwriting the enjoyment column is not a problem, you could consider this:

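# Set enjoyment to 0 for every food other than bacon and cheese (modifies df in place)
df[!food %in% c("bacon", "cheese"), enjoyment := 0]
# Sum the remaining enjoyment per ID
df[, id_enjoyment_sum := sum(enjoyment), by = .(ID)]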
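If the original enjoyment values must be kept, one option (a sketch, not part of the answer) is to run the same two steps on a `copy()`, since `:=` would otherwise modify `df` by reference. The name `tmp` is hypothetical, and the toy data is assumed to be rebuilt as a data.table named `df`.

```r
library(data.table)

# Rebuild the toy data from the question as a data.table
df <- data.table(ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 food = c("bacon", "bacon", "bacon", "bacon", "bacon",
                          "cheese", "sausage", "avocado", "ham"),
                 enjoyment = c(20, 20, 20, 20, 20, 20, 20, 20, 20))

# Deep copy, so the zeroing below does not touch df itself
tmp <- copy(df)
tmp[!food %in% c("bacon", "cheese"), enjoyment := 0]
tmp[, id_enjoyment_sum := sum(enjoyment), by = .(ID)]

# Row order is unchanged, so the totals can be carried back directly
df[, id_enjoyment_sum := tmp$id_enjoyment_sum]

print(df)
```

The copy costs extra memory on large tables; the `ifelse` or subset-in-`j` forms avoid it entirely.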

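When you group by multiple variables, there is one group for each combination of their values, and the aggregation happens within each of those groups. So in your case there is a group of rows for

  1. ID == 1 and (food == "bacon"|food == "cheese") == TRUE,
  2. ID == 2 and (food == "bacon"|food == "cheese") == TRUE,
  3. ID == 2 and (food == "bacon"|food == "cheese") == FALSE, and so on.

Each of these groups is summed separately, which is why the sausage row for ID 2 gets 20 and the rows for ID 3 get 40.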
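To see these groups directly, a sketch assuming the toy data is rebuilt as a data.table named `df`: grouping by ID together with the logical expression from the question, and returning the row count and sum per group. The output names `n_rows`, `group_sum`, `bacon_or_cheese` and `groups` are illustrative only.

```r
library(data.table)

# Rebuild the toy data from the question as a data.table
df <- data.table(ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 food = c("bacon", "bacon", "bacon", "bacon", "bacon",
                          "cheese", "sausage", "avocado", "ham"),
                 enjoyment = c(20, 20, 20, 20, 20, 20, 20, 20, 20))

# One group per (ID, TRUE/FALSE) combination, in order of first appearance
groups <- df[, .(n_rows = .N, group_sum = sum(enjoyment)),
             by = .(ID, bacon_or_cheese = food == "bacon" | food == "cheese")]

print(groups)
```

This yields four groups: (1, TRUE) with sum 60, (2, TRUE) with 60, (2, FALSE) with 20 and (3, FALSE) with 40, matching the values in the question's output.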

Thanks @Frank! –