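Delete rows between values in a column

I have a very large data frame and I want to delete, by id, the rows that lie between certain values of a column, but only when they are enclosed by those values, not when they are at the beginning or end of the group. In this example I want to delete the rows with or == 'base' that sit between rows with or == 'plan'.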

id <- c(1,1,1,1,1,1,2,2,2,2,2,2) 
fd <- c(101,102,103,104,105,106,101,102,103,104,105,106) 
rem <- c(100,120,120,140, 140, 150, 200,220,220,250, 300, 310) 
or <- c("base", "base", "plan", "base", "plan", "base", "plan", "base", 
"plan", "base", "plan", "base") 
df <- data.frame(id, fd, rem, or)

Desired result, with the in-between 'base' rows removed:

id1 <- c(rep(1,5), rep(2,4)) 
fd1 <- c(101,102,103,105,106, 101,103,105,106) 
or1 <- c("base", "base", "plan", "plan", "base", "plan", "plan", "plan", "base") 

df1 <- data.frame(id1,fd1,or1)

Source

2017-04-19 AngeG

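What if you have several instances of 'base'/'plan' for some IDs? – akrun

I want to delete every row between 'plan' rows within the same ID. For example, for id 1 I want to keep the first two 'base' rows and the last one (just before id 2 starts) – AngeG

Two possible solutions:

1) Using base R: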

idx <- ave(df$or, df$id, FUN = function(x) x=='base' & c('base',head(x,-1))=='plan' & c(tail(x,-1),'base')=='plan')=='FALSE' 
df[idx,]

which gives:

   id  fd rem   or
1   1 101 100 base
2   1 102 120 base
3   1 103 120 plan
5   1 105 140 plan
6   1 106 150 base
7   2 101 200 plan
9   2 103 220 plan
11  2 105 300 plan
12  2 106 310 base
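
To see why this index works, it can help to look at a single id group in isolation. The following is only an illustrative sketch (the names x, prev, nxt and drop are made up here and are not part of the answer): it builds the "previous" and "next" neighbour vectors by hand and flags a 'base' row for removal only when both neighbours are 'plan'. Padding both ends with 'base' is what protects the first and last rows of a group.

```r
# the 'or' values of id 1 from the example data
x <- c("base", "base", "plan", "base", "plan", "base")

# neighbours of each element; both ends are padded with "base"
prev <- c("base", head(x, -1))   # value one position earlier
nxt  <- c(tail(x, -1), "base")   # value one position later

# a row is dropped only if it is "base" with "plan" on both sides
drop <- x == "base" & prev == "plan" & nxt == "plan"

# side-by-side view of the comparison, then the surviving values
print(data.frame(x, prev, nxt, drop))
print(x[!drop])
```

Note that this neighbour comparison only catches a single 'base' row enclosed by two 'plan' rows; two or more consecutive 'base' rows are not flagged.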

2) Using the data.table package:

library(data.table) 
setDT(df) 

idx <- df[, .I[!(or=='base' & shift(or, fill = 'base')=='plan' & shift(or, fill = 'base', type = 'lead')=='plan')], id]$V1 
df[idx]

which gives:

   id  fd rem   or
1:  1 101 100 base
2:  1 102 120 base
3:  1 103 120 plan
4:  1 105 140 plan
5:  1 106 150 base
6:  2 101 200 plan
7:  2 103 220 plan
8:  2 105 300 plan
9:  2 106 310 base

Or in one go:

library(data.table) 
setDT(df)[df[, .I[!(or=='base' & shift(or, fill = 'base')=='plan' & shift(or, fill = 'base', type = 'lead')=='plan')], id]$V1]

In response to the comment: you can use the rle function to detect more than one 'base' row between 'plan' rows, as follows (in base R):

# create new example dataset 
df2 <- df[c(1:3,4,4,5:7,8,8,9:12),] 

# the new example dataset: 
> df2 
    id  fd rem   or
1    1 101 100 base
2    1 102 120 base
3    1 103 120 plan
4    1 104 140 base
4.1  1 104 140 base
5    1 105 140 plan
6    1 106 150 base
7    2 101 200 plan
8    2 102 220 base
8.1  2 102 220 base
9    2 103 220 plan
10   2 104 250 base
11   2 105 300 plan
12   2 106 310 base

# define function 
f <- function(x) { 
    rl <- rle(x) 
    rl$values <- !(rl$values == 'base' & c('base',head(rl$values,-1))=='plan' & c(tail(rl$values,-1),'base')=='plan') 
    inverse.rle(rl) 
} 

# apply the function to each id-group and create an index 
idx2 <- as.logical(ave(df2$or, df2$id, FUN = f)) 

# finally subset your data with the logical-index 
df2[idx2,]

which gives:

> df2[idx2,] 
   id  fd rem   or
1   1 101 100 base
2   1 102 120 base
3   1 103 120 plan
5   1 105 140 plan
6   1 106 150 base
7   2 101 200 plan
9   2 103 220 plan
11  2 105 300 plan
12  2 106 310 base
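
The run-length idea may be easier to follow on a plain vector. The sketch below is an illustration only (x, v, drop_run and keep are names chosen here): it takes the 'or' values of id 1 in df2, collapses them into runs with rle, applies the same neighbour test to the runs rather than to the rows, and expands the result back to one flag per row with inverse.rle.

```r
# 'or' values of id 1 in df2: two consecutive "base" rows between "plan" rows
x <- c("base", "base", "plan", "base", "base", "plan", "base")

# collapse into runs of identical values
rl <- rle(x)
print(rl$values)    # one entry per run
print(rl$lengths)   # number of rows in each run

# flag runs of "base" whose neighbouring runs are both "plan"
v <- rl$values
drop_run <- v == "base" &
  c("base", head(v, -1)) == "plan" &
  c(tail(v, -1), "base") == "plan"

# expand the run-level flags back to one logical value per row
rl$values <- !drop_run
keep <- inverse.rle(rl)
print(x[keep])
```

Because the test is applied per run, a block of any length of 'base' rows enclosed by 'plan' rows is removed as a whole.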

Another option in base R (inspired by @Frank's data.table suggestion in the comments):

f2 <- function(x) { 
    i <- seq_along(x) 
    w <- which(x == 'plan') 
    b <- which(x == 'base') 
    ib <- b[b > head(w,1) & b < tail(w,1)] 
    !(i %in% ib) 
} 

idx3 <- unlist(by(df2$or, df2$id, f2)) 
df2[idx3,]
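
It may be worth checking how this position-based approach behaves on groups that contain only one 'plan' row or none at all. The sketch below repeats the same logic under a hypothetical name (keep_rows, not part of the answer) and runs it on three small vectors; in the edge cases nothing lies strictly between a first and a last 'plan', so every row is kept.

```r
# same logic as the function above, under an illustrative name
keep_rows <- function(x) {
  i <- seq_along(x)
  w <- which(x == "plan")                    # positions of "plan"
  b <- which(x == "base")                    # positions of "base"
  ib <- b[b > head(w, 1) & b < tail(w, 1)]   # "base" strictly inside the plan range
  !(i %in% ib)
}

# typical group: the two enclosed "base" rows are dropped
typical <- keep_rows(c("base", "base", "plan", "base", "base", "plan", "base"))

# only one "plan": first and last "plan" coincide, nothing is enclosed
single <- keep_rows(c("base", "plan", "base"))

# no "plan" at all: the comparison yields an empty index, everything is kept
none <- keep_rows(c("base", "base"))

print(typical)
print(single)
print(none)
```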

With data.table you can follow @Frank's suggestion:

setDT(df2) 
df2[, keep := {isp = or == "plan"; wp = which(isp); r = 1:.N; isp | r < first(wp) | r > last(wp)}, by = id 
    ][!!keep]

Data used

df <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), 
        fd = c(101, 102, 103, 104, 105, 106, 101, 102, 103, 104, 105, 106), 
        rem = c(100, 120, 120, 140, 140, 150, 200, 220, 220, 250, 300, 310), 
        or = c("base", "base", "plan", "base", "plan", "base", "plan", "base", "plan", "base", "plan", "base")), 
       .Names = c("id", "fd", "rem", "or"), row.names = c(NA, -12L), class = "data.frame")

Source

2017-04-19 12:30:32 Jaap

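Any idea how to modify the code so that it also removes the rows when, after a 'plan', I have two or more rows with 'base' and then 'plan' again? Thanks – AngeG

@AngeG see the update, HTH – Jaap

Instead of finding which rows to drop, you could identify the ones to keep (all "plan" rows, plus everything before the first plan or after the last plan), like 'df[, keep := {isp = or == "plan"; wp = which(isp); r = 1:.N; isp | r < first(wp) | r > last(wp)}, by = id]' or something like that. – Frank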