2017-04-05 60 views

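How do I create an ID along two columns, allowing repeats?

My data.frame contains logs of individual workers and the time they spent in certain hospital wards. The data.frame is structured as follows: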

 Shift Worker   Ward Duration 
    <fctr> <fctr>   <fctr> <dbl> 
1  R1 Daniel   General 10 
2  R1 Daniel   General 15 
3  R2 Daniel   Anaesth 11 
4  R2 Daniel   Anaesth 13 
5  R2 Daniel   Anaesth 4 
6  R2 Daniel   General 15 
7  R2 Daniel   General 35 
8  R2 Daniel   Anaesth 6 
9  R2 Daniel   Anaesth 6 
10  R1 Caleb   Plastics 10 
11  R1 Caleb   Plastics 9 
12  R1 Caleb   Plastics 10 
13  R1 Caleb   Neuro  9 
14  R1 Caleb   Neuro  9 
15  R1 Caleb   Plastics 10 
16  R1 Caleb   Plastics 10 

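Now I want to add a column containing an ID for each ward within each worker, but I want the ID to be cumulative and to allow repeats: it should increase every time the worker moves to a different ward, even if they return to a ward they visited earlier. My expected output would be: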

 Shift Worker   Ward Duration  ID 
    <fctr> <fctr>   <fctr> <dbl>  <fctr> 
1  R1 Daniel   General 10   1 
2  R1 Daniel   General 15   1 
3  R2 Daniel   Anaesth 11   2 
4  R2 Daniel   Anaesth 13   2 
5  R2 Daniel   Anaesth 4   2 
6  R2 Daniel   General 15   3 
7  R2 Daniel   General 35   3 
8  R2 Daniel   Anaesth 6   4 
9  R2 Daniel   Anaesth 6   4 
10  R1 Caleb   Plastics 10   1 
11  R1 Caleb   Plastics 9   1 
12  R1 Caleb   Plastics 10   1 
13  R1 Caleb   Neuro  9   2 
14  R1 Caleb   Neuro  9   2 
15  R1 Caleb   Plastics 10   3 
16  R1 Caleb   Plastics 10   3 

Note how the ID accumulates. How can I do this?

The reason I want this ID is to pull out the first and last entry for each ward, by shift and worker. My expected output would then be:

 Shift Worker   Ward Duration  ID 
    <fctr> <fctr>   <fctr> <dbl>  <fctr> 
1  R1 Daniel   General 10   1 
2  R1 Daniel   General 15   1 
3  R2 Daniel   Anaesth 11   2 
5  R2 Daniel   Anaesth 4   2 
6  R2 Daniel   General 15   3 
7  R2 Daniel   General 35   3 
8  R2 Daniel   Anaesth 6   4 
9  R2 Daniel   Anaesth 6   4 
10  R1 Caleb   Plastics 10   1 
12  R1 Caleb   Plastics 10   1 
13  R1 Caleb   Neuro  9   2 
14  R1 Caleb   Neuro  9   2 
15  R1 Caleb   Plastics 10   3 
16  R1 Caleb   Plastics 10   3 

Is there a way to do this? Any help would be greatly appreciated.

'library(dplyr); df %>% group_by(Worker) %>% mutate(ID = data.table::rleid(Ward))' or, entirely in data.table, 'library(data.table); setDT(df)[, ID := rleid(Ward), by = Worker][]' – alistaire

Answer

2

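After grouping by 'Worker', we can do this by comparing adjacent elements of 'Ward' (i.e., dropping the first and the last element, respectively) and then taking the cumsum to get the required output: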

library(dplyr) 
df1 %>% 
    group_by(Worker) %>% 
    mutate(ID = cumsum(c(TRUE, Ward[-1] != Ward[-n()]))) 
# Shift Worker  Ward Duration ID 
# <chr> <chr> <chr> <int> <int> 
#1  R1 Daniel General  10  1 
#2  R1 Daniel General  15  1 
#3  R2 Daniel Anaesth  11  2 
#4  R2 Daniel Anaesth  13  2 
#5  R2 Daniel Anaesth  4  2 
#6  R2 Daniel General  15  3 
#7  R2 Daniel General  35  3 
#8  R2 Daniel Anaesth  6  4 
#9  R2 Daniel Anaesth  6  4 
#10 R1 Caleb Plastics  10  1 
#11 R1 Caleb Plastics  9  1 
#12 R1 Caleb Plastics  10  1 
#13 R1 Caleb Neuro  9  2 
#14 R1 Caleb Neuro  9  2 
#15 R1 Caleb Plastics  10  3 
#16 R1 Caleb Plastics  10  3 
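
To see why the comparison followed by cumsum produces run IDs, here is a minimal sketch on a single plain vector, without grouping. The names `ward` and `changed` are illustrative only and do not appear in the original answer.

```r
# A toy sequence of wards for one worker (illustrative data)
ward <- c("General", "General", "Anaesth", "Anaesth", "General")

# TRUE wherever the ward differs from the previous element;
# the first element always starts a new run
changed <- c(TRUE, ward[-1] != ward[-length(ward)])

# Each TRUE bumps the counter, so repeated wards get a fresh ID on return
id <- cumsum(changed)
print(data.frame(ward, changed, id))
```

Because the counter only advances on a change, the second visit to "General" gets a new ID instead of reusing the first one, which is the "allow repeats" behaviour asked for in the question.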

Or, as a base R approach, do a grouped ave and build the run index from rle:

# as.character() guards against factor columns, which rle() rejects
df1$ID <- with(df1, as.integer(ave(as.character(Ward), Worker, FUN = function(x) 
         with(rle(x), rep(seq_along(values), lengths))))) 
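
The question also asks for the first and last entry of each ward visit, which the code above does not cover. The following is a minimal base R sketch, assuming `df1` is rebuilt from the question's sample data with character columns. The helper names `pos`, `size`, and `first_last` are hypothetical and chosen only for illustration.

```r
# Sample data copied from the question (character columns assumed)
df1 <- data.frame(
  Shift    = rep(c("R1", "R2", "R1"), times = c(2, 7, 7)),
  Worker   = rep(c("Daniel", "Caleb"), times = c(9, 7)),
  Ward     = c("General", "General", "Anaesth", "Anaesth", "Anaesth",
               "General", "General", "Anaesth", "Anaesth",
               "Plastics", "Plastics", "Plastics", "Neuro", "Neuro",
               "Plastics", "Plastics"),
  Duration = c(10, 15, 11, 13, 4, 15, 35, 6, 6, 10, 9, 10, 9, 9, 10, 10),
  stringsAsFactors = FALSE
)

# Run ID per worker, as in the base R approach above
df1$ID <- with(df1, as.integer(ave(Ward, Worker, FUN = function(x)
  with(rle(x), rep(seq_along(values), lengths)))))

# Position of each row inside its (Worker, ID) run, and the size of that run
rows <- seq_len(nrow(df1))
pos  <- ave(rows, df1$Worker, df1$ID, FUN = seq_along)
size <- ave(rows, df1$Worker, df1$ID, FUN = length)

# Keep only the first and last row of every run; original row order is kept
first_last <- df1[pos == 1 | pos == size, ]
print(first_last)
```

Runs of length one or two are kept whole, since their first and last rows cover every row. With dplyr, a similar filter would be `group_by(Worker, ID) %>% filter(row_number() %in% c(1, n()))`.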