2017-05-03 99 views
0

Deleting cases conditional on certain other cases

I have a data frame, say:

df = data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"), 
       y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6)) 

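I want to delete only those rows where one or more t's sit directly between a d and a c; in all other cases I want to keep the rows. So for this example, I want to delete the t's in rows 8, 18 and 19 but keep the others. I have thousands of cases, so doing this by hand would be a real horror. Any kind of help is much appreciated.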

+1

Do you mean rows **8**, 18, 19 ...? – Sotos

+0

You can use `regexec("dt+c", ...)` on `paste0(df$x, collapse="")` to locate the pattern. Once it is found, you have to manipulate the string to find the next occurrence of the pattern. – jogo

+0

@Sotos That is indeed what I meant, sorry –

Answers

1

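One option is to use rle to get the runs of identical strings; you can then use sapply to look one run back and one run ahead and return all the row positions to drop: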

rle_vals <- rle(as.character(df$x)) 
n_runs <- length(rle_vals$values) 

drop <- unlist(sapply(seq_len(n_runs)[-c(1, n_runs)], #loop over interior runs only, so i-1 and i+1 always exist 
         function(i, vals, lens) { 
         if(vals[i] == "t" && vals[i-1] == "d" && vals[i+1] == "c"){#Check if run is "t", previous run is "d" and next run is "c" 
          (sum(lens[seq_len(i-1)]) + 1):sum(lens[1:i]) #Row numbers covered by this run 
         } 
         },vals = rle_vals$values, lens = rle_vals$lengths)) 

drop 
#[1] 8 18 19 

df[-drop,] 
# x y 
#1 a 2 
#2 a 4 
#3 b 5 
#4 b 2 
#5 b 6 
#6 c 2 
#7 d 4 
#9 c 2 
#10 b 6 
#11 t 2 
#12 c 4 
#13 t 5 
#14 a 2 
#15 a 6 
#16 b 2 
#17 d 4 
#20 c 6 
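
For what it is worth, the same run-length idea can also be written without an explicit loop. The following is only a sketch under my own naming (prev_val, next_val, hit_run, drop_row and df_clean are not from the answer above): it compares each run with its neighbours in one vectorised step and expands the result to a logical flag per row, so nothing goes wrong when no run qualifies.

```r
# Same example data as in the question.
df <- data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"),
                 y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6))

r <- rle(as.character(df$x))
n <- length(r$values)

# Value of the previous and of the next run, for every run (NA at the two ends).
prev_val <- c(NA, r$values[-n])
next_val <- c(r$values[-1], NA)

# TRUE for runs of "t" with a "d" run directly before and a "c" run directly after.
# %in% is used instead of == so that the NA ends compare as FALSE.
hit_run <- r$values %in% "t" & prev_val %in% "d" & next_val %in% "c"

# Expand the per-run flag back to one flag per row.
drop_row <- rep(hit_run, times = r$lengths)

which(drop_row)
# [1]  8 18 19

df_clean <- df[!drop_row, ]
```

Indexing with the logical vector !drop_row keeps every row when nothing matches, which is the main reason for preferring it here over a negative index.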
+0

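This works like a charm and makes you a true time-saving hero! Thank you very much! Of course I upvoted your answer, but the vote is not visible because I have not been on this site for long. –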

1

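This also works: collapse the column into a single string, identify the groups of t between d and c (or between c and d; I am not sure whether that direction is needed as well), then work out their positions and drop the rows as required.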

df =  data.frame(x=c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"), 
       y=c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6),stringsAsFactors = FALSE) 

dfs <- paste0(df$x,collapse="") #collapse to a string 
dfs2 <- do.call(rbind,lapply(list(gregexpr("dt+c",dfs),gregexpr("ct+d",dfs)), 
       function(L) data.frame(x=L[[1]],y=attr(L[[1]],"match.length")))) 
dfs2 <- dfs2[dfs2$x>0,] #remove any -1 values (if string not found) 
drop <- unlist(mapply(function(a,b) (a+1):(a+b-2),dfs2$x,dfs2$y)) 
df2 <- df[-drop,] 
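
A general caveat about the df[-drop, ] idiom, not specific to this answer: if the pattern never occurs, drop ends up empty, and a negated empty index selects no rows instead of all rows. The tiny sketch below uses a made-up three-row data frame and a hypothetical keep vector to show the effect and one way around it.

```r
df <- data.frame(x = c("a", "b", "c"), y = 1:3)
drop <- integer(0)               # nothing matched, so nothing should be removed

nrow(df[-drop, ])                # 0: negating an empty index selects no rows at all

# A logical mask behaves as intended for empty and non-empty drop alike.
keep <- !(seq_len(nrow(df)) %in% drop)
nrow(df[keep, ])                 # 3
```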
+0

@Adrew This works well too! Thank you very much for your time :) –

0

Here is another solution in base R:

df = data.frame(x = c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c"), 
       y = c(2,4,5,2,6,2,4,5,2,6,2,4,5,2,6,2,4,5,2,6)) 

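# Collapse x into one string so that string positions equal row numbers 
# (this relies on every value of x being a single character). 
s <- paste0(df$x, collapse="") 
L <- c(NA, NA) 
while (TRUE) { 
    r <- regexec("dt+c", s)[[1]] # first remaining match of d, one or more t, c 
    if (r[1]==-1) break 
    L <- rbind(L, c(pos=r[1]+1, length=attr(r, "match.length")-2)) # position of first t, number of t's 
    s <- sub("d(t+)c", "x\\1x", s) # mask this match so the next iteration finds the following one 
} 
L <- L[-1, , drop=FALSE] # remove the NA placeholder row; drop=FALSE keeps a matrix even with a single match 
drop <- unlist(apply(L,1, function(x) seq(from=x[1], length.out=x[2]))) 
df[-drop, ] 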
# > drop 
# 8 18 19 
# > df[-drop, ] 
# x y 
# 1 a 2 
# 2 a 4 
# 3 b 5 
# 4 b 2 
# 5 b 6 
# 6 c 2 
# 7 d 4 
# 9 c 2 
# 10 b 6 
# 11 t 2 
# 12 c 4 
# 13 t 5 
# 14 a 2 
# 15 a 6 
# 16 b 2 
# 17 d 4 
# 20 c 6 

With gregexpr() it is shorter:

s <- paste0(df$x, collapse="") 
g <- gregexpr("dt+c", s)[[1]] 
L <- data.frame(pos=g+1, length=attr(g, "match.length")-2) 
drop <- unlist(apply(L,1, function(x) seq(from=x[1], len=x[2]))) 
df[-drop, ] 
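
To make the position arithmetic above easier to follow, here is a rough sketch that wraps the gregexpr() variant in a small helper. The name rows_to_drop is purely hypothetical, and the two guards (single-character values, no match) are my own additions rather than part of the answer. Each match starts at the "d", so the t's occupy positions start + 1 up to start + match.length - 2.

```r
# Hypothetical helper: row numbers of the t's that sit directly between a d and a c.
rows_to_drop <- function(x) {
  x <- as.character(x)
  # String positions only equal row numbers if every value is one character long.
  stopifnot(all(nchar(x) == 1))

  s <- paste0(x, collapse = "")
  g <- gregexpr("dt+c", s)[[1]]
  starts <- as.integer(g)             # position of the "d" of each match, or -1 if none
  mlen <- attr(g, "match.length")     # length of each match, including the d and the c

  if (starts[1] == -1L) return(integer(0))   # pattern not found: nothing to drop

  # Skip the leading d and the trailing c of every match.
  unlist(Map(function(p, len) (p + 1L):(p + len - 2L), starts, mlen))
}

x <- c("a","a","b","b","b","c","d","t","c","b","t","c","t","a","a","b","d","t","t","c")
rows_to_drop(x)
# [1]  8 18 19
```

For the example data the collapsed string is "aabbbcdtcbtctaabdttc", with matches starting at positions 7 and 17 and lengths 3 and 4, which gives rows 8, 18 and 19.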
+0

This works too! Thanks @Jogo! –