2016-11-17

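Find differences between groups created with dplyr's group_by, then using stringr or set operations

I would like to accomplish the following with dplyr and stringr if possible, or at least stay within the Tidyverse: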

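I need to group the data by CaseWorker and Client, then compare "Task" and "Task2" to find every category in "Task2" that is not in "Task", along with the total time associated with each of those "Task2" categories.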
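The grouping and comparison described above can be sketched in long format (one row per missing category) with an anti-join: drop every row whose Task2 matches a Task of the same CaseWorker and Client, then total Time. This is one possible technique, not taken from the question or the answer; it assumes dplyr 1.0 or later (for the .groups argument), and the names seen_tasks and missing_long are illustrative.

```r
library(dplyr)

# The question's data, with character (not factor) columns
Df <- data.frame(
  CaseWorker = c(rep("John", 9), rep("Kim", 2)),
  Client     = c(rep("Chris", 9), rep("Eric", 2)),
  Task       = c(rep("Feed cat", 3), rep("Make dinner", 3),
                 rep("Buy groceries", 3), rep("Do homework", 2)),
  Task2      = c("Feed cat", "Iron shirt", "Iron shirt",
                 rep("Do homework", 3), "Make dinner", "Feed cat",
                 "Feed cat", "Do homework", "Iron shirt"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12),
  stringsAsFactors = FALSE
)

# Every Task value seen for each CaseWorker/Client pair
seen_tasks <- distinct(Df, CaseWorker, Client, Task)

missing_long <- Df %>%
  # drop rows whose Task2 is among that pair's Task values
  anti_join(seen_tasks, by = c("CaseWorker", "Client", "Task2" = "Task")) %>%
  # total Time for each remaining (missing) Task2 category
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time), .groups = "drop")

print(missing_long)
```

The named element "Task2" = "Task" in by tells anti_join() to match Df's Task2 against seen_tasks' Task, so no per-group setdiff() call is needed.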

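"Task" can contain categories that are not in "Task2", so I am only interested in the categories in "Task2" that are not in "Task". Ideally, new columns would show the specific entries that are in "Task2" but not in "Task", together with their associated "Time" values.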
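The one-directional comparison described above is what base R's setdiff(x, y) computes: the unique values of x that do not occur in y. A minimal sketch using only Chris's rows from the question's data (the lowercase vector names and the sapply() totals are illustrative, not part of the question):

```r
# Chris's rows from the question's data
task  <- c("Feed cat", "Feed cat", "Feed cat", "Make dinner", "Make dinner",
           "Make dinner", "Buy groceries", "Buy groceries", "Buy groceries")
task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do homework", "Do homework",
           "Do homework", "Make dinner", "Feed cat", "Feed cat")
time  <- c(20, 34, 11, 10, 5, 6, 55, 30, 20)

# setdiff() is asymmetric: only task2 values that never appear in task
new_tasks <- setdiff(task2, task)

# Total time for each new task, as a named numeric vector
totals <- sapply(new_tasks, function(t) sum(time[task2 == t]))

print(new_tasks)
print(totals)
```

Swapping the arguments, setdiff(task, task2), returns "Buy groceries" instead, which is the direction the question wants to ignore.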

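The final result should show four new columns for client Chris: one for "Iron shirt", one for its associated "Time" of 45, one for "Do homework", and one for its "Time" of 21. For client Eric there would be one column for "Iron shirt" and one for its "Time" of 12.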
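The layout described above (a category column plus a time column for each missing category) can be sketched by numbering the missing categories within each CaseWorker/Client pair and widening on that number. This assumes dplyr 1.0 or later and tidyr 1.0 or later for pivot_wider(); the helper column n and the resulting names such as Task2_1 and Time_1 are illustrative choices, not something the question specifies.

```r
library(dplyr)
library(tidyr)

# The question's data, with character (not factor) columns
Df <- data.frame(
  CaseWorker = c(rep("John", 9), rep("Kim", 2)),
  Client     = c(rep("Chris", 9), rep("Eric", 2)),
  Task       = c(rep("Feed cat", 3), rep("Make dinner", 3),
                 rep("Buy groceries", 3), rep("Do homework", 2)),
  Task2      = c("Feed cat", "Iron shirt", "Iron shirt",
                 rep("Do homework", 3), "Make dinner", "Feed cat",
                 "Feed cat", "Do homework", "Iron shirt"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12),
  stringsAsFactors = FALSE
)

wide_pairs <- Df %>%
  group_by(CaseWorker, Client) %>%
  # Task2 values that never occur in Task for this CaseWorker/Client
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  group_by(CaseWorker, Client, Task2) %>%
  # total Time per missing category; result stays grouped by CaseWorker, Client
  summarise(Time = sum(Time), .groups = "drop_last") %>%
  # hypothetical helper column: 1, 2, ... within each CaseWorker/Client pair
  mutate(n = row_number()) %>%
  ungroup() %>%
  # one Task2_<n> and one Time_<n> column per numbered category
  pivot_wider(names_from = n, values_from = c(Task2, Time))

print(wide_pairs)
```

Eric has only one missing category, so his Task2_2 and Time_2 cells come out as NA.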

CaseWorker <- c("John", "John", "John", "John", "John", "John", "John", "John",
                "John", "Kim", "Kim")

Client <- c("Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris",
            "Chris", "Chris", "Eric", "Eric")

Task <- c("Feed cat", "Feed cat", "Feed cat", "Make dinner", "Make dinner",
          "Make dinner", "Buy groceries", "Buy groceries", "Buy groceries",
          "Do homework", "Do homework")

Task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do homework", "Do homework",
           "Do homework", "Make dinner", "Feed cat", "Feed cat", "Do homework",
           "Iron shirt")

Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)

Df <- data.frame(CaseWorker, Client, Task, Task2, Time)

It is not clear how you expect the output columns to look. I used a wide format in the solution I posted below. – akrun

Answer


To get two new columns holding the associated times, we can try

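library(dplyr)
library(tidyr)

Df %>%
    group_by(CaseWorker, Client) %>%
    # keep rows whose Task2 never appears in Task for this CaseWorker/Client
    filter(Task2 %in% setdiff(Task2, Task)) %>%
    # also group by Task2 (same as group_by(Task2, add = TRUE); dplyr 1.0
    # deprecated `add` in favour of `.add`)
    group_by(CaseWorker, Client, Task2) %>%
    # total Time for each missing category
    summarise(Time = sum(Time)) %>%
    # one column per missing category, holding its total Time
    spread(Task2, Time)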
#   CaseWorker Client `Do homework` `Iron shirt`
#       <fctr> <fctr>         <dbl>        <dbl>
# 1       John  Chris            21           45
# 2        Kim   Eric            NA           12
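spread() still works but has since been superseded in tidyr by pivot_wider(). A sketch of the same pipeline for dplyr 1.0 and tidyr 1.0 or later, assuming character rather than factor columns (the data.frame() default since R 4.0); the name res is illustrative:

```r
library(dplyr)
library(tidyr)

# The question's data, with character (not factor) columns
Df <- data.frame(
  CaseWorker = c(rep("John", 9), rep("Kim", 2)),
  Client     = c(rep("Chris", 9), rep("Eric", 2)),
  Task       = c(rep("Feed cat", 3), rep("Make dinner", 3),
                 rep("Buy groceries", 3), rep("Do homework", 2)),
  Task2      = c("Feed cat", "Iron shirt", "Iron shirt",
                 rep("Do homework", 3), "Make dinner", "Feed cat",
                 "Feed cat", "Do homework", "Iron shirt"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12),
  stringsAsFactors = FALSE
)

res <- Df %>%
  group_by(CaseWorker, Client) %>%
  # Task2 values that never occur in Task for this CaseWorker/Client
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time), .groups = "drop") %>%
  # pivot_wider() takes the place of spread(Task2, Time)
  pivot_wider(names_from = Task2, values_from = Time)

print(res)
```

In recent tidyr versions, passing values_fill = 0 to pivot_wider() fills the empty cell with 0 instead of NA.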