Summarize data and keep the date column values

I asked a similar question before and got a great answer, but I need more guidance on summarizing when a date column is involved. Previous question: Summarize and count data in R with dplyr

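Goal:

My new dataset has a Date column that records when each event happened. When I try to apply the suggestion from the other post to this example, I get an error message:

Dataset: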

structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), 
Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40", "08.12.2015 19:40", 
"22.12.2015 08:50", "22.12.2015 08:52", "05.01.2016 13:22", 
"06.01.2016 09:18", "14.02.2016 22:47", 
"20.02.2016 21:27", "01.04.2016 13:52", "24.07.2016 07:03"), 
    StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L), StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L), R2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 1L, 0L), R3 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), R5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 
    0L, 0L, 0L, 0L), R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
    0L, 0L, 0L, 0L), stims = c("0_0", "0_0", "1_0", "1_0", "1_1", 
    "1_1", "1_1", "1_1", "1_1", "1_2", "1_2", "1_2", "2_2")), .Names = c("User", "Date", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", "R7", "stims"), row.names = c(NA, -13L), spec = structure(list(
    cols = structure(list(User = structure(list(), class = c("collector_integer", 
    "collector")), Date = structure(list(), class = c("collector_character", 
    "collector")), StimuliA = structure(list(), class = c("collector_integer", 
    "collector")), StimuliB = structure(list(), class = c("collector_integer", 
    "collector")), R2 = structure(list(), class = c("collector_integer", 
    "collector")), R3 = structure(list(), class = c("collector_integer", 
    "collector")), R4 = structure(list(), class = c("collector_integer", 
    "collector")), R5 = structure(list(), class = c("collector_integer", 
    "collector")), R6 = structure(list(), class = c("collector_integer", 
    "collector")), R7 = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB", 
    "R2", "R3", "R4", "R5", "R6", "R7")), default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df", "tbl", "data.frame"))

Code:

df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep="_"))  
aggregate(. ~ User + stims, data=df, sum) 
Error in Summary.factor(c(12L, 2L), na.rm = FALSE) : 
‘sum’ not meaningful for factors

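Question / expected result: In my result I want to keep the date on which the stimulus occurred (or, if StimuliA and StimuliB are both 0, the first date for that user).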

User  Date              StimuliA  StimuliB  R2  R3  R4  R5  R6  R7
1     25.11.2015 13:59  0         0         1   0   0   0   0   1
1     07.12.2015 08:18  1         0         0   0   0   0   1   0
1     08.12.2015 19:40  0         1         0   2   0   0   1   1
2     05.01.2016 13:22  0         0         0   0   0   0   1   0
2     14.02.2016 22:47  0         1         2   0   0   0   0   0
2     24.07.2016 07:03  1         0         0   0   0   0   0   0

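In this result table, the first row holds the sums of R2-R7 while StimuliA and StimuliB are both still 0. After that, each stimulus row holds the sums of R2-R7 from that stimulus until the next stimulus occurs.

The following was suggested in the previous post, but I could not get it to work: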

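You don't want to work with dates as factors. Use as.Date to convert the dates to a Date variable (there are many posts on SO about this). One approach is then to aggregate the date variable separately by User and stims, much as above, but taking the minimum instead of the sum. Then merge the two resulting data.frames. If that doesn't make sense, it may be worth asking a new question that links to this one and adds the extra issue of the date variable. Also include an example dataset (via dplyr) that contains this variable. – @lmo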
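The quoted suggestion could be sketched roughly as follows in base R. This is only an illustration under a few assumptions: the data is a reduced copy of the question's (only R2 and R7 are kept), the dates are parsed as POSIXct rather than with as.Date so that the time of day survives, and tz = "UTC" as well as the names sums, first_dates and res are choices made here, not part of the original post.

```r
# Reduced copy of the question's data (only R2 and R7 kept, for brevity)
df <- data.frame(
  User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
  StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  R2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L),
  R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Grouping key from the question: running counts of both stimuli
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))

# Parse the date strings (tz = "UTC" is an assumption)
df$Date <- as.POSIXct(df$Date, format = "%d.%m.%Y %H:%M", tz = "UTC")

keys <- df[c("User", "stims")]

# Sum the numeric columns per User/stims; Date is deliberately left out
sums <- aggregate(df[c("StimuliA", "StimuliB", "R2", "R7")], by = keys, FUN = sum)

# Earliest date per User/stims
first_dates <- aggregate(df["Date"], by = keys, FUN = min)

# Merge the two summaries, order by user and date, drop the helper key
res <- merge(first_dates, sums, by = c("User", "stims"))
res <- res[order(res$User, res$Date), names(res) != "stims"]
rownames(res) <- NULL
res
```

Keeping Date out of the call that uses sum is what avoids the error shown earlier, since sum is never applied to the date column.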

Source

2017-07-26 svnnf

Isn't a row missing for 'User == 2'? The one with no stimuli... – Sotos

@Sotos Yes! You're right, I forgot that one. – svnnf

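One idea is to filter all the rows without a stimulus and take the first observation for each user (via slice), then filter all the rows with a stimulus and combine the two with bind_rows, i.e.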

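library(dplyr)

bind_rows(
  # rows without any stimulus (columns 3:4 are StimuliA and StimuliB):
  # keep only the first one per user
  df %>%
    filter(rowSums(.[3:4]) == 0) %>%
    group_by(User) %>%
    slice(1L),
  # rows with a stimulus: keep all of them
  df %>%
    filter(rowSums(.[3:4]) != 0)
) %>%
  arrange(User)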

which gives:

# A tibble: 6 x 11 
# Groups: User [2] 
    User    Date StimuliA StimuliB R2 R3 R4 R5 R6 R7 stims 
    <int>   <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr> 
1  1 25.11.2015 13:59  0  0  1  0  0  0  0  0 0_0 
2  1 07.12.2015 08:18  1  0  0  0  0  0  0  0 1_0 
3  1 08.12.2015 19:40  0  1  0  0  0  0  0  0 1_1 
4  2 05.01.2016 13:22  0  0  0  0  0  0  1  0 1_1 
5  2 14.02.2016 22:47  0  1  0  0  0  0  0  0 1_2 
6  2 24.07.2016 07:03  1  0  0  0  0  0  0  0 2_2

Source

2017-07-26 10:05:53 Sotos

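Thanks @Sotos! As far as I can tell, the values in R2-R7 are not summed in the tibble you posted. Does the code you posted contain everything you ran in your session, or was something else involved? Where did you create stims? (As in the previous post?) – svnnf

@svnnf At this point, you can merge (join) the first, second and last columns with the data.frame from the previous post, using the first and last columns as ID variables. – lmo

@svnnf 'stims' is included in your example. As for 'R2-R7'... how do you get their values? (I just noticed they are different) – Sotos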
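lmo's comment amounts to a join on User and stims. A rough dplyr sketch of that idea follows, assuming dplyr 1.0 or later (for across() and .groups), a reduced copy of the data with only R2 and R7, and tz = "UTC". The names dates, sums and result are made up here, and dates merely stands in for columns 1, 2 and the last column of the tibble above.

```r
library(dplyr)

# Reduced copy of the question's data (only R2 and R7 kept, for brevity)
df <- tibble(
  User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
  StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  R2 = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L),
  R7 = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)
) %>%
  mutate(
    Date  = as.POSIXct(Date, format = "%d.%m.%Y %H:%M", tz = "UTC"),
    stims = paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_")
  )

# ID columns plus the first date of each User/stims block
dates <- df %>%
  group_by(User, stims) %>%
  summarise(Date = min(Date), .groups = "drop")

# Sums of the indicator columns per User/stims block
sums <- df %>%
  group_by(User, stims) %>%
  summarise(across(c(StimuliA, StimuliB, R2, R7), sum), .groups = "drop")

# Join on the two ID variables, then drop the helper key
result <- inner_join(dates, sums, by = c("User", "stims")) %>%
  arrange(User, Date) %>%
  select(-stims)
result
```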

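Here, Date is converted to class POSIXct, which keeps both the date and the time; that is essential for this task. as.Date() would drop the time from the date.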
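To illustrate the difference, here is a small base R comparison using two timestamps from the question that fall on the same day. The variable names and tz = "UTC" are assumptions made for this sketch.

```r
# Two events from the question's data, two minutes apart on the same day
x <- c("22.12.2015 08:50", "22.12.2015 08:52")

# as.Date() reads only the date part; the trailing time is ignored
as_date <- as.Date(x, format = "%d.%m.%Y")

# as.POSIXct() keeps the time of day (tz = "UTC" is an assumption)
as_time <- as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")

as_date[1] == as_date[2]   # TRUE: the two events can no longer be ordered
as_time[1] < as_time[2]    # TRUE: the order within the day is preserved
```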

library(dplyr) 

union_all(
  # stimulus-free rows: keep the earliest one
  # (min(Date) is computed over the whole table, not per user)
  df %>%
    mutate(Date = as.POSIXct(strptime(Date, "%d.%m.%Y %H:%M"))) %>%
    filter(StimuliA == 0, StimuliB == 0, Date == min(Date)),
  # rows where either stimulus occurred
  df %>%
    mutate(Date = as.POSIXct(strptime(Date, "%d.%m.%Y %H:%M"))) %>%
    filter(StimuliA == 1 | StimuliB == 1)
) %>%
  arrange(User, Date) %>%
  select(-stims)

Output:

 User    Date StimuliA StimuliB R2 R3 R4 R5 R6 R7 
    <int>    <dttm> <int> <int> <int> <int> <int> <int> <int> <int> 
    1  1 2015-11-25 13:59:00  0  0  1  0  0  0  0  0 
    2  1 2015-12-07 08:18:00  1  0  0  0  0  0  0  0 
    3  1 2015-12-08 19:40:00  0  1  0  0  0  0  0  0 
    4  2 2016-02-14 22:47:00  0  1  0  0  0  0  0  0 
    5  2 2016-07-24 07:03:00  1  0  0  0  0  0  0  0
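Note that this output has no stimulus-free row for User 2: min(Date) is evaluated over the whole table, so only the overall earliest row passes the first filter. If the first stimulus-free row of each user is wanted, one possible variant groups by User before taking the minimum. This is a sketch on a reduced copy of the data (R columns omitted), with tz = "UTC" and the names parsed and result as assumptions.

```r
library(dplyr)

# Reduced copy of the question's data (R2-R7 omitted for brevity)
df <- tibble(
  User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  Date = c("25.11.2015 13:59", "03.12.2015 09:32", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:40", "22.12.2015 08:50",
           "22.12.2015 08:52", "05.01.2016 13:22", "06.01.2016 09:18",
           "14.02.2016 22:47", "20.02.2016 21:27", "01.04.2016 13:52",
           "24.07.2016 07:03"),
  StimuliA = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
  StimuliB = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)
)

# Parse the dates once (tz = "UTC" is an assumption)
parsed <- df %>%
  mutate(Date = as.POSIXct(Date, format = "%d.%m.%Y %H:%M", tz = "UTC"))

result <- bind_rows(
  # earliest stimulus-free row of each user
  parsed %>%
    filter(StimuliA == 0, StimuliB == 0) %>%
    group_by(User) %>%
    filter(Date == min(Date)) %>%
    ungroup(),
  # every row where a stimulus occurred
  parsed %>%
    filter(StimuliA == 1 | StimuliB == 1)
) %>%
  arrange(User, Date)
result
```

Like the answer's code, this only selects rows; it does not sum R2-R7.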

Source

2017-07-26 10:20:06 Odysseus210

Nicely done! – akrun
