2013-02-10 46 views

Remove sections of a data frame when zeros are recorded for a set amount of time

I have a simple data frame:

a <- c("06/12/2012 06:00","06/12/2012 06:05","06/12/2012 06:10","06/12/2012 06:15","06/12/2012 06:20","06/12/2012 06:25", 
    "06/12/2012 06:30","06/12/2012 06:35","06/12/2012 06:40","06/12/2012 06:45","06/12/2012 06:50","06/12/2012 06:55", 
    "06/12/2012 07:00","06/12/2012 07:05","06/12/2012 07:10","06/12/2012 07:15","06/12/2012 07:20","06/12/2012 07:25", 
    "06/12/2012 07:30","06/12/2012 07:35","06/12/2012 07:40","06/12/2012 07:45","06/12/2012 07:50","06/12/2012 07:55", 
    "06/12/2012 08:00") 
a <- strptime(a, "%d/%m/%Y %H:%M") 

b <-c("1","0","0","0","2","0","0","0","3","0","0","0","0","0","1","2","5","6","0","0","0","0","6","10","2") 
df1 <- data.frame(a,b) 

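I want to use R to remove sections of the data frame where there is not enough valid data. The data are recorded every 5 minutes. If column 'b' contains only zeros for 20 consecutive minutes or more, those rows can be removed from my final data frame.

If anyone has any ideas to help me, I would be very grateful.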


See '?rle'? ... – 2013-02-10 18:43:55

Answers


One solution using rle (as Ben mentioned in the comments under the question):

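# Run-length encode b (converted to numeric, since b was created from strings)
t <- rle(as.numeric(as.character(df1$b)))
# check for condition. NOTE: here I assume all are 5 minute intervals!!
# So, if rle length >= 4, then it's >= 20 minute interval
p <- which(t$values == 0 & t$lengths >= 4)
# w holds the last row index of each run; subtracting the run length gives its first row
# (this also works when the very first run qualifies, where w[x-1] would not exist)
w <- cumsum(t$lengths)
o <- unlist(lapply(p, function(x) {
    (w[x] - t$lengths[x] + 1):w[x]
}))
# Guard: if no run qualified, o is NULL and df1[-o, ] would return zero rows
if (length(o) > 0) df1[-o, ] else df1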

#      a b 
# 1 2012-12-06 06:00:00 1 
# 2 2012-12-06 06:05:00 0 
# 3 2012-12-06 06:10:00 0 
# 4 2012-12-06 06:15:00 0 
# 5 2012-12-06 06:20:00 2 
# 6 2012-12-06 06:25:00 0 
# 7 2012-12-06 06:30:00 0 
# 8 2012-12-06 06:35:00 0 
# 9 2012-12-06 06:40:00 3 
# 15 2012-12-06 07:10:00 1 
# 16 2012-12-06 07:15:00 2 
# 17 2012-12-06 07:20:00 5 
# 18 2012-12-06 07:25:00 6 
# 23 2012-12-06 07:50:00 6 
# 24 2012-12-06 07:55:00 10 
# 25 2012-12-06 08:00:00 2 
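
The approach above counts rows, so it relies on every reading being exactly 5 minutes apart. If that cannot be guaranteed, one possible variant (a sketch, not part of the original answer) measures each run of zeros by its timestamps instead. The `step_min` value, the numeric `b`, and the rebuilt sample data are assumptions based on the question:

```r
# Rebuild the question's data: 25 readings, 5 minutes apart (b stored as numeric here).
a <- seq(as.POSIXct("2012-12-06 06:00", tz = "UTC"), by = "5 min", length.out = 25)
b <- c(1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 2, 5, 6, 0, 0, 0, 0, 6, 10, 2)
df1 <- data.frame(a, b)

# Runs of consecutive zero / non-zero readings, with first and last row of each run.
r <- rle(df1$b == 0)
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Assumption: each reading stands for one 5-minute sampling step, so a run's
# duration is (last timestamp - first timestamp) plus one step.
step_min <- 5
span_min <- as.numeric(difftime(df1$a[ends], df1$a[starts], units = "mins")) + step_min

# Drop only the zero runs that cover 20 minutes or more.
drop_run <- r$values & span_min >= 20
res <- df1[!rep(drop_run, r$lengths), ]
res
```

On the sample data this keeps the same rows as the row-counting version, because four consecutive 5-minute readings span exactly 20 minutes.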

Another one, still using rle:

is.zero <- df1$b == 0 
is.zero.rle <- rle(is.zero) 
df1[rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero < 4, ] 

It may help to understand it if I show the intermediate result:

rep(is.zero.rle$lengths, is.zero.rle$lengths) * is.zero 
# [1] 0 3 3 3 0 3 3 3 0 5 5 5 5 5 0 0 0 0 4 4 4 4 0 0 0 
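
If the same filter is needed with other thresholds, the one-liner can be wrapped in a small function. This is only a sketch: the name `drop_zero_runs` and its arguments are hypothetical, and the sample data frame below uses a plain row counter in place of the timestamps.

```r
# Hypothetical helper (illustrative name and arguments, not from the answer):
# drop rows belonging to runs of at least `min_run` consecutive zeros in `col`.
drop_zero_runs <- function(df, col, min_run = 4) {
  is.zero <- df[[col]] == 0
  r <- rle(is.zero)
  # Expand each run length back to one value per row.
  run_len <- rep(r$lengths, r$lengths)
  df[!(is.zero & run_len >= min_run), , drop = FALSE]
}

# Same b values as in the question; `a` is just a row counter here.
b <- c("1","0","0","0","2","0","0","0","3","0","0","0","0","0",
       "1","2","5","6","0","0","0","0","6","10","2")
df1 <- data.frame(a = seq_along(b), b = b)

res <- drop_zero_runs(df1, "b")                 # 4 rows = 20 minutes at 5-minute sampling
res3 <- drop_zero_runs(df1, "b", min_run = 3)   # stricter: 15 minutes
```

With `min_run = 4` the shorter runs of three zeros are kept, as in the answers above; with `min_run = 3` they are dropped as well.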

(+1) Really awesome use of TRUE/FALSE and rep. – Arun 2013-02-10 20:10:04