2017-04-12 54 views
0

I have two data frames like these, and I want to filter the rows of one based on the other.

DF1

Measurement <- c("Length","Breadth","Height","Width") 
When <- c("2017-04-07 15:19:02", "2017-02-10 09:13:10", "2017-01-13 11:45:14", "2016-11-13 21:35:24") 
Fail <- c(2,3,2,3) 
Pass <- c(2,2,4,2) 
df1 <- data.frame(Measurement,When,Fail,Pass) 
df1$When <- as.POSIXct(df1$When) 

DF2

Measurement <- c("Length","Length","Length","Length", 
       "Breadth","Breadth","Breadth","Breadth","Breadth", 
       "Height","Height","Height","Height","Height","Height", 
       "Width","Width","Width","Width","Width") 
Datetime <- c("2017-04-08 15:19:02","2017-04-09 15:19:02","2017-04-09 16:19:02","2017-04-10 15:19:02", 
       "2017-02-11 09:13:10","2017-02-12 09:13:10","2017-02-13 09:13:10","2017-02-14 09:13:10","2017-02-15 09:13:10", 
       "2017-01-19 11:45:14","2017-01-20 11:45:14","2017-01-21 11:45:14","2017-01-23 11:45:14","2017-01-27 11:45:14","2017-01-13 11:45:14", 
       "2016-11-12 21:35:24","2016-11-14 21:35:24","2016-11-17 21:35:24","2016-11-19 21:35:24","2016-11-19 23:35:24") 
PassFail <- c("Fail","Fail","Pass","Pass", 
       "Fail","Pass","Fail","Fail","Pass", 
       "Fail","Fail","Pass","Pass","Pass","Pass", 
       "Fail","Fail","Pass","Fail","Pass") 
df2 <- data.frame(Measurement,Datetime,PassFail) 
df2$Datetime <- as.POSIXct(df2$Datetime) 

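df1 holds the Pass and Fail counts reported from df2 for each measurement. I am trying to filter the df1 data frame using the following conditions.

  1. For each row in df1, I want to look at df2 and check whether the first 2 measurements (ordered by Datetime) are consecutive Fails. If they are, I want to keep that measurement's row in df1.
  2. I also want to apply the check above only to df2 rows where "Datetime" in df2 > "When" in df1.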
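To make the two conditions concrete, here is a minimal base R sketch on a cut-down copy of the data (Length and Breadth only). The names `toy1`, `toy2` and `keep` are made up for illustration. It is a slow row-by-row reference, not the fast solution being asked for, and it assumes that a measurement with fewer than two later rows should be dropped, which the conditions above do not actually specify.

```r
# Toy copies of df1 and df2 (subset of the rows above); names are hypothetical.
toy1 <- data.frame(
  Measurement = c("Length", "Breadth"),
  When = as.POSIXct(c("2017-04-07 15:19:02", "2017-02-10 09:13:10"), tz = "UTC"),
  stringsAsFactors = FALSE
)
toy2 <- data.frame(
  Measurement = c("Length", "Length", "Length", "Breadth", "Breadth", "Breadth"),
  Datetime = as.POSIXct(c("2017-04-08 15:19:02", "2017-04-09 15:19:02",
                          "2017-04-09 16:19:02", "2017-02-11 09:13:10",
                          "2017-02-12 09:13:10", "2017-02-13 09:13:10"),
                        tz = "UTC"),
  PassFail = c("Fail", "Fail", "Pass", "Fail", "Pass", "Fail"),
  stringsAsFactors = FALSE
)

# For each row of toy1: take the toy2 rows of the same measurement that are
# strictly later than When (condition 2), order them by Datetime, and test
# whether the first two are both "Fail" (condition 1).
keep <- vapply(seq_len(nrow(toy1)), function(i) {
  later <- toy2[toy2$Measurement == toy1$Measurement[i] &
                  toy2$Datetime > toy1$When[i], ]
  later <- later[order(later$Datetime), ]
  # Assumption: fewer than two later rows means the row is not kept.
  nrow(later) >= 2 && all(later$PassFail[1:2] == "Fail")
}, logical(1))

result <- toy1[keep, ]
print(result)
```

With this toy data only the Length row should survive, since Breadth goes Fail, Pass after its When.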

Desired output

Measurement                When Fail Pass
     Length 2017-04-07 15:19:02    2    2
     Height 2017-01-13 11:45:14    2    4

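I computed the counts in df1 this way, but I can't work out how to filter it according to the logic above so that only the rows of interest are kept.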

library(data.table)
setDT(df1)[, When := as.POSIXct(When)]
setDT(df2)[, Datetime := as.POSIXct(Datetime)]
# df2 is x and df1 is i: in on=, Datetime must come from x and When from i
df2[df1, on = .(Measurement, Datetime > When),
    if (.N > 0L) as.list(table(PassFail)), by = .EACHI]

Could someone point me in the right direction? I would also like a fast filtering solution, because I want to apply it to a much larger dataset.

+0

Your last line is almost there. You can do 'df1[df2[df1, on=.(Measurement, Datetime > When), all(head(x.PassFail, 2) == "Fail"), by=.EACHI]$V1]'. Regarding the 'if (.N > 0L)' check, I think you can set 'nomatch=0' in the join instead. – Frank

+0

Awesome. Love this solution. Thanks a lot, Frank :-) Could you post it as an answer? – Sharath

Answer

1

Just a small extension to the OP's code:

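df1[
  # inner join: one TRUE/FALSE per row of df1, in df1's row order
  df2[df1, on = .(Measurement, Datetime > When),
      # are the first two matching df2 rows both "Fail"?
      all(head(x.PassFail, 2) == "Fail"),
      by = .EACHI]$V1
]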
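The one-liner takes 'head(x.PassFail, 2)' in whatever order the matching df2 rows come back, while the question asks for the first two rows by Datetime. A more explicit sketch that splits the steps and sorts before checking is shown below. It runs on a small made-up subset of the data with shuffled rows; `toy1`, `toy2`, `matched`, `flags` and `idx` are illustrative names, and requiring at least two later rows is an assumption rather than something stated in the question.

```r
library(data.table)

# Toy copies of df1/df2 (hypothetical names); toy2 rows are deliberately unsorted.
toy1 <- data.table(
  Measurement = c("Length", "Breadth", "Height"),
  When = as.POSIXct(c("2017-04-07 15:19:02", "2017-02-10 09:13:10",
                      "2017-01-13 11:45:14"), tz = "UTC")
)
toy2 <- data.table(
  Measurement = c("Length", "Length", "Length",
                  "Breadth", "Breadth",
                  "Height", "Height", "Height"),
  Datetime = as.POSIXct(c("2017-04-09 16:19:02", "2017-04-08 15:19:02",
                          "2017-04-09 15:19:02",
                          "2017-02-11 09:13:10", "2017-02-12 09:13:10",
                          "2017-01-20 11:45:14", "2017-01-19 11:45:14",
                          "2017-01-13 11:45:14"), tz = "UTC"),
  PassFail = c("Pass", "Fail", "Fail",
               "Fail", "Pass",
               "Fail", "Fail", "Pass")
)

# Step 1: non-equi join keeping only toy2 rows strictly later than When.
# nomatch = 0L drops toy1 rows that have no later toy2 rows at all.
matched <- toy2[toy1, on = .(Measurement, Datetime > When), nomatch = 0L,
                .(Measurement = i.Measurement, When = i.When,
                  Datetime = x.Datetime, PassFail = x.PassFail)]

# Step 2: make the Datetime ordering explicit before looking at the first two.
setorder(matched, Measurement, When, Datetime)
flags <- matched[, .(keep = .N >= 2L && all(head(PassFail, 2L) == "Fail")),
                 by = .(Measurement, When)]

# Step 3: map the kept groups back to row numbers of toy1, preserving its order.
idx <- toy1[flags[(keep)], on = .(Measurement, When), which = TRUE]
result <- toy1[sort(idx)]
print(result)
```

On this toy data the Length and Height rows are kept. The extra sort costs some time on large inputs, so whether it is worth it depends on whether df2 is already ordered by Datetime within each measurement.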
+0

It *should* be possible to write '.SD' in place of the inner 'df1', but I guess there may be a bug. – Frank