2017-03-15

Merge 2 dataframes with a datetime condition and get pass/fail counts

I have 2 dataframes like this:

DF1

ID <- c("ID001","ID001","ID002","ID003") 
Type <- c("A","A","B","A") 
Measurement <- c("Length","Breadth","Length","Length") 
When <- c("2016-09-09 06:00:13", "2016-09-19 09:13:10", "2016-10-13 11:45:14", "2016-10-29 11:56:00") 

df1 <- data.frame(ID,Type,Measurement,When) 

DF2

ID <- c("ID001","ID001","ID001","ID001","ID001",
        "ID002","ID002","ID002","ID002","ID002")
Type <- c("A","A","A","A","A",
          "B","B","B","B","B")
Measurement <- c("Length","Length","Length","Length","Length",
                 "Length","Length","Length","Length","Length")
Datetime <- c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13", "2016-09-09 21:00:13", "2016-09-09 23:00:13",
              "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14", "2016-10-13 11:55:14", "2016-10-13 21:45:14")
PassFail <- c("Pass","Fail","Pass","Fail","Pass",
              "Fail","Fail","Pass","Pass","Pass")

df2 <- data.frame(ID,Type,Measurement,Datetime,PassFail)

I want to merge these two data frames to get the pass and fail counts, counting only the measurements whose "Datetime" in df2 is later than "When" in df1.

My desired output is:

ID    Type Measurement When                PassCount FailCount
ID001 A    Length      2016-09-09 06:00:13         2         1
ID002 B    Length      2016-10-13 11:45:14         3         0
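A minimal base-R sketch of that condition, using merge() as the title suggests. This is an editorial illustration and not the asker's code. It assumes the timestamps have already been converted to POSIXct and that the text columns are plain character vectors. The names m and res are only illustrative. The code inner-joins on ID, Type and Measurement, keeps the rows where Datetime is later than When, and sums 0/1 pass and fail indicators per df1 row:

```r
# Sample data from the question, with POSIXct timestamps and character columns
df1 <- data.frame(ID = c("ID001", "ID001", "ID002", "ID003"),
                  Type = c("A", "A", "B", "A"),
                  Measurement = c("Length", "Breadth", "Length", "Length"),
                  When = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                                      "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
                                    tz = "UTC"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = rep(c("ID001", "ID002"), each = 5),
                  Type = rep(c("A", "B"), each = 5),
                  Measurement = "Length",
                  Datetime = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                                          "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                                          "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                                          "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                                          "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
                                        tz = "UTC"),
                  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
                               "Fail", "Fail", "Pass", "Pass", "Pass"),
                  stringsAsFactors = FALSE)

# 1. Equi-join on the shared key columns (inner join: unmatched df1 rows drop out)
m <- merge(df1, df2, by = c("ID", "Type", "Measurement"))
# 2. Keep only measurements taken strictly after df1's When
m <- m[m$Datetime > m$When, ]
# 3. Turn PassFail into 0/1 indicators and sum them per df1 row
m$PassCount <- as.integer(m$PassFail == "Pass")
m$FailCount <- as.integer(m$PassFail == "Fail")
res <- aggregate(cbind(PassCount, FailCount) ~ ID + Type + Measurement + When,
                 data = m, FUN = sum)
res
```

merge() first builds every key match before filtering on time. That is fine for a sample this size, but on a large dataset the intermediate table can get big, which is where a non-equi join helps.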

I tried to get this with sqldf:

library(sqldf) 
df3<-sqldf("SELECT L.*, r.Datetime, r.PASSFAIL 
      FROM df1 as L 
      LEFT JOIN df2 as r 
      ON L.ID=r.ID 
      AND L.Type=r.Type 
      AND L.Measurement=r.Measurement 
      WHERE r.Datetime > L.When 
      ORDER BY L.When") 

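I have not managed to get the output. Can someone point me in the right direction? I would also like a fast merge solution, because I want to apply it to a larger dataset.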
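A possible repair of the sqldf attempt, sketched on the sample data only. The query above returns one row per matching measurement because it has no GROUP BY or aggregate. In addition, the WHERE condition on r makes the LEFT JOIN behave like an inner join anyway. This version moves the time condition into the join and sums 0/1 comparisons per df1 row. The timestamps are kept as character strings, which compare chronologically here only because they all share the fixed-width "YYYY-MM-DD HH:MM:SS" format. When is bracket-quoted as a precaution, since WHEN is also an SQL keyword:

```r
library(sqldf)

# Sample data from the question; timestamps stay as fixed-width character strings
df1 <- data.frame(ID = c("ID001", "ID001", "ID002", "ID003"),
                  Type = c("A", "A", "B", "A"),
                  Measurement = c("Length", "Breadth", "Length", "Length"),
                  When = c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                           "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = rep(c("ID001", "ID002"), each = 5),
                  Type = rep(c("A", "B"), each = 5),
                  Measurement = "Length",
                  Datetime = c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                               "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                               "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                               "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                               "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
                  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
                               "Fail", "Fail", "Pass", "Pass", "Pass"),
                  stringsAsFactors = FALSE)

# Time condition inside the join, then one aggregated row per df1 row.
# In SQLite a comparison evaluates to 0 or 1, so SUM() counts the matches.
res <- sqldf("
  SELECT L.ID, L.Type, L.Measurement, L.[When],
         SUM(r.PassFail = 'Pass') AS PassCount,
         SUM(r.PassFail = 'Fail') AS FailCount
  FROM df1 AS L
  JOIN df2 AS r
    ON  L.ID = r.ID
    AND L.Type = r.Type
    AND L.Measurement = r.Measurement
    AND r.Datetime > L.[When]
  GROUP BY L.ID, L.Type, L.Measurement, L.[When]
  ORDER BY L.[When]")
res
```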

Please use a datetime format rather than factors. – Frank
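A small sketch of Frank's point, using two timestamps taken from the question. Before R 4.0.0, data.frame() stored strings as factors by default. Ordering comparisons such as > are not meaningful for unordered factors, so R returns NA with a warning. After conversion to POSIXct, the comparison is done on actual times:

```r
# Two timestamps from the question, stored as factors
# (what data.frame() did with strings by default before R 4.0.0)
later   <- factor("2016-09-09 09:00:13")
earlier <- factor("2016-09-09 06:00:13")

# '>' is not meaningful for unordered factors: R warns and returns NA
cmp_factor <- suppressWarnings(later > earlier)

# After converting to POSIXct the comparison is done on actual times
cmp_time <- as.POSIXct(as.character(later), tz = "UTC") >
  as.POSIXct(as.character(earlier), tz = "UTC")

print(cmp_factor)  # NA
print(cmp_time)    # TRUE
```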

dplyr has functions such as left_join, filter, group_by and summarise, which should solve it. –
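The approach suggested in this comment could look roughly like the following sketch. The dplyr code is an editorial addition, and the .groups argument assumes dplyr 1.0.0 or later. Unmatched df1 rows come out of left_join() with NA in Datetime, and filter() drops them together with the measurements taken before When:

```r
library(dplyr)

# Sample data from the question, with POSIXct timestamps
df1 <- data.frame(ID = c("ID001", "ID001", "ID002", "ID003"),
                  Type = c("A", "A", "B", "A"),
                  Measurement = c("Length", "Breadth", "Length", "Length"),
                  When = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                                      "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
                                    tz = "UTC"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = rep(c("ID001", "ID002"), each = 5),
                  Type = rep(c("A", "B"), each = 5),
                  Measurement = "Length",
                  Datetime = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                                          "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                                          "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                                          "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                                          "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
                                        tz = "UTC"),
                  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
                               "Fail", "Fail", "Pass", "Pass", "Pass"),
                  stringsAsFactors = FALSE)

res <- df1 %>%
  left_join(df2, by = c("ID", "Type", "Measurement")) %>%  # unmatched rows get NA Datetime
  filter(Datetime > When) %>%                               # keeps only TRUE, so NA rows go too
  group_by(ID, Type, Measurement, When) %>%
  summarise(PassCount = sum(PassFail == "Pass"),
            FailCount = sum(PassFail == "Fail"),
            .groups = "drop")
res
```

In dplyr 1.1.0 and later, the time condition can also go into the join itself, for example inner_join(df1, df2, join_by(ID, Type, Measurement, When < Datetime)). This avoids building the full equi-join first.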

Answer

With data.table, a non-equi join seems to work:

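library(data.table)
# Convert the timestamps to POSIXct so they compare as times, not as strings/factors
setDT(df1)[, When := as.POSIXct(When)]
# PassFail needs to be a factor so table() always reports both levels (Fail, Pass);
# data.frame() only converted strings to factors by default before R 4.0.0
setDT(df2)[, `:=`(Datetime = as.POSIXct(Datetime), PassFail = factor(PassFail))]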

df2[df1, on=.(ID, Datetime > When), if (.N > 0L) as.list(table(PassFail)), by=.EACHI] 

#       ID            Datetime Fail Pass
# 1: ID001 2016-09-09 06:00:13    1    2
# 2: ID002 2016-10-13 11:45:14    0    3

If you want one row for every row of df1, remove the if (.N > 0L) clause.
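The join above matches on ID and the time condition only. The question's SQL also requires Type and Measurement to match. With this sample data the result is the same, but on a larger dataset the extra keys may matter. The sketch below assumes that all three keys should match. It also swaps table() for sums of 0/1 comparisons, so it does not depend on PassFail being a factor. Finally, it uses nomatch = 0L to drop df1 rows that have no later measurement, playing the role of the if clause:

```r
library(data.table)

# Sample data from the question, with POSIXct timestamps
df1 <- data.table(
  ID          = c("ID001", "ID001", "ID002", "ID003"),
  Type        = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When        = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                             "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
                           tz = "UTC")
)
df2 <- data.table(
  ID          = rep(c("ID001", "ID002"), each = 5),
  Type        = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime    = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                             "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                             "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                             "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                             "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
                           tz = "UTC"),
  PassFail    = c("Pass", "Fail", "Pass", "Fail", "Pass",
                  "Fail", "Fail", "Pass", "Pass", "Pass")
)

# Non-equi join: equality on the three keys, Datetime strictly after When.
# by = .EACHI evaluates j once per df1 row; nomatch = 0L drops df1 rows
# with no later measurement.
res <- df2[df1,
           on = .(ID, Type, Measurement, Datetime > When),
           .(PassCount = sum(PassFail == "Pass"),
             FailCount = sum(PassFail == "Fail")),
           by = .EACHI, nomatch = 0L]

# In a non-equi join the join column keeps x's name (Datetime) but holds
# i's values (When), so rename it
setnames(res, "Datetime", "When")
res
```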

To add the counts as columns of df1:

df1[, levels(df2$PassFail) := 
    df2[df1, on=.(ID, Datetime > When), as.list(table(PassFail)), by=.EACHI][, !c("ID","Datetime")] 
] 

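Wonderful solution. It took me a while to understand your code, but it makes sense now. Thank you very much. I just applied it to a larger dataset and it works like a charm. – Sharath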