Two possible solutions:
1) Using base R:
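# flag 'base' rows whose previous and next rows (within the same id) are both 'plan';
# ave() returns the flags as the strings "TRUE"/"FALSE", hence the comparison with 'FALSE'
idx <- ave(df$or, df$id, FUN = function(x) x=='base' & c('base',head(x,-1))=='plan' & c(tail(x,-1),'base')=='plan')=='FALSE'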
df[idx,]
This gives:
id fd rem or
1 1 101 100 base
2 1 102 120 base
3 1 103 120 plan
5 1 105 140 plan
6 1 106 150 base
7 2 101 200 plan
9 2 103 220 plan
11 2 105 300 plan
12 2 106 310 base
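To make the index construction in option 1 concrete, here is a small self-contained sketch. The vectors `or` and `id` and the helper `sandwiched()` are hypothetical stand-ins that mirror the example data. It shows how the lagged and lead copies are built with `head()`/`tail()`, and why the result of `ave()` is compared with `'FALSE'`: `ave()` writes the function's result back into a copy of its character input, so the logical values come back as strings.

```r
# Hypothetical stand-ins mirroring df$or and df$id from the question
or <- c("base", "base", "plan", "base", "plan", "base",
        "plan", "base", "plan", "base", "plan", "base")
id <- rep(1:2, each = 6)

# TRUE for a 'base' element whose previous and next elements are both 'plan'
sandwiched <- function(x) {
  prev <- c("base", head(x, -1))  # lag, padded with 'base' at the start
  nxt  <- c(tail(x, -1), "base")  # lead, padded with 'base' at the end
  x == "base" & prev == "plan" & nxt == "plan"
}

# ave() applies the function per id, but stores the results in a copy of the
# character vector, so they come back as "TRUE"/"FALSE" strings
res <- ave(or, id, FUN = sandwiched)
print(class(res))

# Converting explicitly is equivalent to the == 'FALSE' comparison
keep <- !as.logical(res)
print(identical(keep, res == "FALSE"))
print(which(keep))
```

Padding with `'base'` means the first and last row of each group can never be flagged, because one of their neighbours is never `'plan'`.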
2) Using the data.table package:
library(data.table)
setDT(df)
idx <- df[, .I[!(or=='base' & shift(or, fill = 'base')=='plan' & shift(or, fill = 'base', type = 'lead')=='plan')], id]$V1
df[idx]
This gives:
id fd rem or
1: 1 101 100 base
2: 1 102 120 base
3: 1 103 120 plan
4: 1 105 140 plan
5: 1 106 150 base
6: 2 101 200 plan
7: 2 103 220 plan
8: 2 105 300 plan
9: 2 106 310 base
Or in one go:
library(data.table)
setDT(df)[df[, .I[!(or=='base' & shift(or, fill = 'base')=='plan' & shift(or, fill = 'base', type = 'lead')=='plan')], id]$V1]
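A minimal sketch of what the two `shift()` calls and the `.I[...]`/`$V1` step produce on a single group. The table `dt` is a hypothetical example mirroring id 2 of the question, and the sketch assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical one-group table mirroring id 2 of the question
dt <- data.table(id = 2, or = c("plan", "base", "plan", "base", "plan", "base"))

# shift() lags by default; type = "lead" looks at the next row instead.
# fill = "base" pads the ends so the first/last rows are never flagged.
lag_or  <- dt[, shift(or, fill = "base")]
lead_or <- dt[, shift(or, fill = "base", type = "lead")]
print(lag_or)
print(lead_or)

# .I holds row numbers of the whole table; with by = id the selected
# row numbers end up in a column called V1, which the answer extracts
rows <- dt[, .I[!(or == "base" & shift(or, fill = "base") == "plan" &
                  shift(or, fill = "base", type = "lead") == "plan")], by = id]
print(names(rows))
print(rows$V1)
```

Because `.I` refers to positions in the full table rather than within the group, `rows$V1` can be used directly as a row index on the original data.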
In response to the comment: to handle more than one consecutive 'base' row between 'plan' rows, you can use the rle function as follows (in base R):
# create new example dataset
df2 <- df[c(1:3,4,4,5:7,8,8,9:12),]
# the new example dataset:
> df2
id fd rem or
1 1 101 100 base
2 1 102 120 base
3 1 103 120 plan
4 1 104 140 base
4.1 1 104 140 base
5 1 105 140 plan
6 1 106 150 base
7 2 101 200 plan
8 2 102 220 base
8.1 2 102 220 base
9 2 103 220 plan
10 2 104 250 base
11 2 105 300 plan
12 2 106 310 base
# define function
f <- function(x) {
rl <- rle(x)
rl$values <- !(rl$values == 'base' & c('base',head(rl$values,-1))=='plan' & c(tail(rl$values,-1),'base')=='plan')
inverse.rle(rl)
}
# apply the function to each id-group and create an index
idx2 <- as.logical(ave(df2$or, df2$id, FUN = f))
# finally subset your data with the logical-index
df2[idx2,]
This gives:
> df2[idx2,]
id fd rem or
1 1 101 100 base
2 1 102 120 base
3 1 103 120 plan
5 1 105 140 plan
6 1 106 150 base
7 2 101 200 plan
9 2 103 220 plan
11 2 105 300 plan
12 2 106 310 base
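The key step in `f` is that `rle()` collapses each run of identical values into a single element, so two consecutive 'base' rows are tested together as one run. A small sketch on the `or` values of id 1 in `df2`, typed in by hand as the hypothetical vector `x`:

```r
# or values of id 1 in df2: two consecutive 'base' rows sit between the 'plan's
x <- c("base", "base", "plan", "base", "base", "plan", "base")

# rle() collapses consecutive duplicates into runs
rl <- rle(x)
v <- rl$values            # one value per run
print(v)
print(rl$lengths)         # how many rows each run covers

# Test each run (not each row): drop 'base' runs between two 'plan' runs
rl$values <- !(v == "base" & c("base", head(v, -1)) == "plan" &
                 c(tail(v, -1), "base") == "plan")

# inverse.rle() expands the per-run verdicts back to one value per row
keep <- inverse.rle(rl)
print(keep)
```

Both 'base' rows in the middle run get the same verdict, which is why `f` also removes the duplicated rows `4` and `4.1`.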
Another option in base R (inspired by @Frank's data.table suggestion in the comments):
f2 <- function(x) {
i <- seq_along(x)
w <- which(x == 'plan')
b <- which(x == 'base')
ib <- b[b > head(w,1) & b < tail(w,1)]
!(i %in% ib)
}
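# apply f2 to each id-group; ave() returns "TRUE"/"FALSE" strings, so convert back to logical
idx3 <- as.logical(ave(df2$or, df2$id, FUN = f2))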
df2[idx3,]
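If you prefer to avoid `rle()` and index positions, the same "between the first and last 'plan'" rule can also be written with cumulative sums. This is an alternative formulation, not the approach from the answer or the comments; `between_plans` and the data frame `d` are hypothetical names, with `d` reproducing the `id`/`or` columns of `df2`.

```r
# Hypothetical stand-in reproducing the id/or columns of df2
d <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  or = c("base", "base", "plan", "base", "base", "plan", "base",
         "plan", "base", "base", "plan", "base", "plan", "base"),
  stringsAsFactors = FALSE
)

between_plans <- function(x) {
  is_plan <- x == "plan"
  plan_before <- cumsum(is_plan) > 0            # a 'plan' at or before this row
  plan_after  <- rev(cumsum(rev(is_plan))) > 0  # a 'plan' at or after this row
  # keep everything except 'base' rows with a 'plan' on both sides
  !(x == "base" & plan_before & plan_after)
}

# ave() again returns strings, so convert back to logical before subsetting
keep <- as.logical(ave(d$or, d$id, FUN = between_plans))
print(d[keep, ])
```

For a 'base' row, "a 'plan' at or before" is the same as "a 'plan' strictly before", so this keeps the same rows as `f2`.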
With data.table you can follow @Frank's suggestion:
setDT(df2)
df2[, keep := {isp = or == "plan"; wp = which(isp); r = 1:.N; isp | r < first(wp) | r > last(wp)}, by = id
][!!keep]
Data used:
df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
fd = c(101, 102, 103, 104, 105, 106, 101, 102, 103, 104, 105, 106),
rem = c(100, 120, 120, 140, 140, 150, 200, 220, 220, 250, 300, 310),
or = c("base", "base", "plan", "base", "plan", "base", "plan", "base", "plan", "base", "plan", "base")),
.Names = c("id", "fd", "rem", "or"), row.names = c(NA, -12L), class = "data.frame")
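What if you have several instances of 'base'/'plan' for some ids? – akrun
I want to remove every row between the 'plan' rows for the same id. For example, for id 1 I want to keep the first two 'base' rows and the last one (before id 2 starts). – AngeG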