2017-10-16 89 views

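Following up on this question: Sum if the date difference is smaller than a value. Thanks to @Davis Vaughan, I can now count the number of events that occurred in the previous 12 h. What I need next: sum if the time difference is greater than a value, by id and type.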

library(tidyverse)   # tribble, %>%, mutate, map2_dbl, between
library(lubridate)   # hours()

df <- tribble(
    ~fechayhora,  ~id,  ~tipo, 
    "2017-03-17 08:03:00", "A", "APF", 
    "2017-05-17 10:34:00", "A", "APF", 
    "2017-05-17 12:52:00", "A", "APF", 
    "2017-05-17 08:52:00", "A", "APP", 
    "2017-05-17 10:52:00", "A", "APP", 
    "2017-05-17 10:46:00", "B", "APP", 
    "2017-05-17 14:23:00", "B", "APP", 
    "2017-05-17 17:29:00", "B", "APF" 
) 

df <- df %>% 
    mutate(fechayhora = as.POSIXct(fechayhora), 
     minus_12 = fechayhora - hours(12)) 

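# For each row, count the events in [minus_12, fechayhora];
# subtract 1 so the row does not count itself
df <- df %>% mutate(
    number_of_APF_12h = map2_dbl(.x = fechayhora, 
           .y = minus_12, 
           .f = ~sum(between(df$fechayhora, .y, .x)) - 1)) 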

Then I tried to do the same thing, but grouped by "id" and "tipo" (type). I tried with both a data.table and a data frame, without success:

df=df[,number_of_failures_12h = map2_dbl(.x = fechayhora, 
           .y = minus_12, 
           .f = ~sum(between(df$fechayhora, .y, .x)) - 
1)),by=.(tipo,id)] 

df <- df %>% 
group_by(id,tipo) 
%>% mutate(
    number_of_failure = map2_dbl(.x = fechayhora, 
           .y = minus_12, 
           .f = ~sum(between(df$fechayhora, .y, .x)) - 
1)) %>% 
ungroup() 
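
One likely reason the grouped attempts do not give per-group counts is that `df$fechayhora` always refers to the whole data frame, not to the rows of the current group (the `%>%` at the start of a line in the dplyr attempt is also a syntax error in R). The following is only a minimal sketch of a grouped count, assuming a hypothetical helper named `count_within_window` and timestamps parsed as UTC. It yields a single column counting earlier events of the same id and tipo, not the two n_APP / n_APF columns shown in the expected result.

```r
library(dplyr)

# Same sample data as in the question, built with base R (UTC is an assumption)
df <- data.frame(
  fechayhora = as.POSIXct(c("2017-03-17 08:03:00", "2017-05-17 10:34:00",
                            "2017-05-17 12:52:00", "2017-05-17 08:52:00",
                            "2017-05-17 10:52:00", "2017-05-17 10:46:00",
                            "2017-05-17 14:23:00", "2017-05-17 17:29:00"),
                          tz = "UTC"),
  id   = c("A", "A", "A", "A", "A", "B", "B", "B"),
  tipo = c("APF", "APF", "APF", "APP", "APP", "APP", "APP", "APF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original post): for each timestamp, count
# the other timestamps in the same vector that fall in [t - window, t].
count_within_window <- function(times, window_secs = 12 * 3600) {
  vapply(seq_along(times), function(i) {
    # the "- 1" removes the row itself, which always lies inside its own window
    sum(times >= times[i] - window_secs & times <= times[i]) - 1
  }, numeric(1))
}

res <- df %>%
  group_by(id, tipo) %>%
  # inside a grouped mutate(), `fechayhora` is only the current group's slice
  mutate(n_same_type_12h = count_within_window(fechayhora)) %>%
  ungroup()

print(res)
```

Because the helper receives the column as an argument, it only ever sees the rows of the current group, which is the piece the `df$fechayhora` version was missing.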

Expected result:

fechayhora           id  tipo  n_APP  n_APF
2017-03-17 08:03:00  A   APF   0      0
2017-05-17 10:34:00  A   APF   0      1
2017-05-17 12:52:00  A   APF   0      2
2017-05-17 08:52:00  A   APP   0      2
2017-05-17 10:52:00  A   APP   1      2
2017-05-17 10:46:00  B   APP   0      0
2017-05-17 14:23:00  B   APP   1      0
2017-05-17 17:29:00  B   APF   0      0

Thanks!!

Sorry, there is a lot of guesswork involved here – akrun

If you could tell me what is unclear, I would appreciate it – Martu

How did you get the expected output with dplyr? I could not reproduce it – akrun

Answer

0

There must be a more elegant way, but this should do it:

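# Auxiliary function: count the rows of `group` that come before row `rowid`
# (the group must already be sorted by time), have the given `type`,
# and are no older than `last_12`.
count_failures <- function(group, last_12, rowid, type) { 
    group[seq_len(rowid - 1), ] %>%   # rows 1..(rowid - 1); empty for the first row
    filter(tipo %in% type & fechayhora >= last_12) %>% 
    nrow() 
} 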

# Split df into a list of data frames, one per id
split_by_group <- df %>% 
    group_by(id) %>% 
    do(data = (.)) %>% 
    select(data) %>% 
    map(identity) %>% 
    .[[1]] 

# Within each id: sort by time, then count earlier APP and APF events per row
df_s <- split_by_group %>% 
    map(arrange, fechayhora) %>% 
    map(.f = function(x) { 
    x %>% 
     rowid_to_column() %>% 
     rowwise() %>% 
     mutate(n_APP = count_failures(x, minus_12, rowid, "APP"), 
      n_APF = count_failures(x, minus_12, rowid, "APF")) %>% 
     ungroup() %>% 
     select(-rowid) 
     }) %>% 
    bind_rows() 

Output:

# A tibble: 8 x 6 
      fechayhora id tipo   minus_12 n_APP n_APF 
       <dttm> <chr> <chr>    <dttm> <int> <int> 
1 2017-03-17 08:03:00  A APF 2017-03-16 20:03:00  0  0 
2 2017-05-17 08:52:00  A APP 2017-05-16 20:52:00  0  0 
3 2017-05-17 10:34:00  A APF 2017-05-16 22:34:00  1  0 
4 2017-05-17 10:52:00  A APP 2017-05-16 22:52:00  1  1 
5 2017-05-17 12:52:00  A APF 2017-05-17 00:52:00  2  1 
6 2017-05-17 10:46:00  B APP 2017-05-16 22:46:00  0  0 
7 2017-05-17 14:23:00  B APP 2017-05-17 02:23:00  1  0 
8 2017-05-17 17:29:00  B APF 2017-05-17 05:29:00  2  0 
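
The same counts can probably be reached with less machinery. What follows is a minimal sketch, not the answer's method: it assumes a hypothetical helper named `count_type_before`, groups by id only, and counts strictly earlier events of each type within the previous 12 h. Timestamps are parsed as UTC here, and rows with identical timestamps would not count each other, which can differ from the row-position logic above when there are ties.

```r
library(dplyr)

# Same sample data as in the question, built with base R (UTC is an assumption)
df <- data.frame(
  fechayhora = as.POSIXct(c("2017-03-17 08:03:00", "2017-05-17 10:34:00",
                            "2017-05-17 12:52:00", "2017-05-17 08:52:00",
                            "2017-05-17 10:52:00", "2017-05-17 10:46:00",
                            "2017-05-17 14:23:00", "2017-05-17 17:29:00"),
                          tz = "UTC"),
  id   = c("A", "A", "A", "A", "A", "B", "B", "B"),
  tipo = c("APF", "APF", "APF", "APP", "APP", "APP", "APP", "APF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): for each row, count the
# events of `type` that happened strictly before it and within the window.
count_type_before <- function(times, types, type, window_secs = 12 * 3600) {
  vapply(seq_along(times), function(i) {
    sum(types == type &
        times <  times[i] &
        times >= times[i] - window_secs)
  }, integer(1))
}

res <- df %>%
  group_by(id) %>%   # counts are per id; the event type is handled by the helper
  mutate(n_APP = count_type_before(fechayhora, tipo, "APP"),
         n_APF = count_type_before(fechayhora, tipo, "APF")) %>%
  ungroup()

print(res)
```

This keeps the original row order instead of sorting within each id, so the rows appear in a different order than in the output above.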
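
Hello! Thank you very much for your answer. It is almost there, but there should be two result columns: the number of failures of type APP and the number of failures of type APF. That is, for each row I want to know how many events of each type occurred... – Martu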

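I have edited the answer. Could you check whether the output is what you want? Your expected output and your explanation of what you want do not seem to match... – quartin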

@Martu I had a typo in `group_by`. It was not copied correctly when I edited the answer. It is fixed now. – quartin