dplyr: exclude rows that start with a certain number

I have a dataset like this:

df <- data.frame(ID = c(334, 111, 324, 234), 
       Name = c("Tom", "Mike", "John", "Tim"), 
       Score = c(2, 9, 3, 5))

Using the dplyr package, how can I exclude the rows whose ID starts with 3?

Source

2017-09-26 joerna

`library(dplyr)` `filter(df, substr(ID, 1, 1) != 3)` – CPak

Why do you need dplyr? `df[grep("^3", df$ID, invert = TRUE), ]` – PoGibas

`grepl` would be safer. –
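The comment does not say why `grepl` would be safer. One common reason, which may or may not be what the commenter had in mind, is the negative-index idiom `df[-grep(...), ]`: when nothing matches, it returns zero rows instead of all of them. A minimal sketch using the question's data (the pattern `"^9"` is chosen here only because no ID matches it):

```r
df <- data.frame(ID = c(334, 111, 324, 234),
                 Name = c("Tom", "Mike", "John", "Tim"),
                 Score = c(2, 9, 3, 5))

# No ID starts with 9, so no row should be dropped.
idx <- grep("^9", df$ID)                 # integer(0): no matches
dropped_all <- df[-idx, ]                # -integer(0) is still integer(0)
kept_all <- df[!grepl("^9", df$ID), ]    # logical mask, all TRUE

nrow(dropped_all)  # 0
nrow(kept_all)     # 4
```

`grep(..., invert = TRUE)`, as written in the comment above, does not have this problem; the pitfall is specific to negating the index vector.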

Here is how to do this with dplyr:

library(dplyr) 
df %>% 
    filter(grepl("^[^3]", ID))

Result:

   ID Name Score
1 111 Mike     9
2 234  Tim     5

Data:

df = data.frame(ID = c(334, 111, 324, 234), 
       Name = c("Tom", "Mike", "John", "Tim"), 
       Score = c(2, 9, 3, 5))
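The pattern `"^[^3]"` keeps rows whose first character is something other than 3. An alternative that some may find easier to read is to match a leading 3 and negate the result. The sketch below shows two such variants inside `filter()`; the names `res_grepl` and `res_sw` are just illustrative.

```r
library(dplyr)

df <- data.frame(ID = c(334, 111, 324, 234),
                 Name = c("Tom", "Mike", "John", "Tim"),
                 Score = c(2, 9, 3, 5))

# Match IDs that start with 3, then negate to drop them.
res_grepl <- df %>%
  filter(!grepl("^3", ID))

# Same idea without a regular expression.
# startsWith() requires character input, so convert ID first.
res_sw <- df %>%
  filter(!startsWith(as.character(ID), "3"))

res_grepl
res_sw
```

On this data both return the Mike and Tim rows, the same as the answer above. The forms can differ on missing values: `grepl()` returns `FALSE` for `NA`, so `"^[^3]"` drops rows with a missing ID while `!grepl("^3", ID)` keeps them.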

Source

2017-09-27 02:27:31 useR

Somehow I missed the "exclude" part for IDs starting with 3. Edited to correct. – useR

library(dplyr) 
library(microbenchmark) 
N <- 1e6

Functions tested:

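# Each function drops the rows whose ID starts with 3.
f_grep <- function() df[grep("^3", df$ID, invert = TRUE), ]
f_grepl <- function() df[!grepl("^3", df$ID), ]
# Integer division; correct for this data only (three-digit IDs below 400).
f_modul <- function() df[df$ID %/% 300 != 1, ]
# startsWith() needs character input; negated so matching rows are dropped.
f_sWith <- function() df[!startsWith(as.character(df$ID), "3"), ]
f_subSt <- function() df[substr(df$ID, 1, 1) != 3, ]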

Using the OP's original data:

df <- data.frame(ID = c(334, 111, 324, 234), 
       Name = c("Tom", "Mike", "John", "Tim"), 
       Score = c(2, 9, 3, 5)) 

microbenchmark(f_grep(), f_grepl(), f_modul(), f_sWith(), f_subSt()) 

Unit: microseconds
      expr    min      lq      mean  median      uq       max neval
  f_grep() 42.207 47.0645  65.51158 58.0910 62.2905   865.607   100
 f_grepl() 35.762 40.5785  59.13411 49.6425 54.4015  1023.742   100
 f_modul() 27.659 32.4575 154.65156 41.5485 44.1945 10969.091   100
 f_sWith() 30.866 35.0830  93.27367 44.0320 47.3740  3642.091   100
 f_subSt() 33.470 37.8465  57.94782 47.1935 49.5860   991.518   100

Using a larger version of the OP's data:

df <- data.frame(ID = sample(df$ID, N, replace = TRUE), 
       Name = sample(df$Name, N, replace = TRUE), 
       Score = sample(df$Score, N, replace = TRUE)) 

microbenchmark(f_grep(), f_grepl(), f_modul(), f_sWith(), f_subSt()) 

Unit: milliseconds
      expr       min        lq      mean    median        uq       max neval
  f_grep() 472.19564 479.15768 492.12995 495.77323 503.16749 538.67349   100
 f_grepl() 478.68982 483.25584 496.40382 501.86222 507.34989 535.04327   100
 f_modul()  29.78637  30.74446  41.82639  32.61941  53.58474  62.51763   100
 f_sWith() 386.47298 388.99461 401.46679 398.01549 412.25743 435.97195   100
 f_subSt() 423.53511 426.11061 438.80629 442.81014 449.26856 471.70923   100
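Timings are only comparable if the candidates select the same rows, so it may be worth checking that before reading much into the numbers. The sketch below is not part of the original benchmark: it rewrites the string-based approaches to take the data frame as an argument (`d`, a name chosen here) and compares the IDs they keep. It also probes the integer-division shortcut, which is likely fast because it skips the numeric-to-character conversion, but which depends on the range of the IDs.

```r
df <- data.frame(ID = c(334, 111, 324, 234),
                 Name = c("Tom", "Mike", "John", "Tim"),
                 Score = c(2, 9, 3, 5))

# String-based candidates, each dropping rows whose ID starts with 3.
candidates <- list(
  grep  = function(d) d[grep("^3", d$ID, invert = TRUE), ],
  grepl = function(d) d[!grepl("^3", d$ID), ],
  sWith = function(d) d[!startsWith(as.character(d$ID), "3"), ],
  subSt = function(d) d[substr(d$ID, 1, 1) != "3", ]
)

# Apply every candidate and compare the IDs that survive.
kept_ids <- lapply(candidates, function(f) f(df)$ID)
all_agree <- all(vapply(kept_ids,
                        function(ids) identical(ids, kept_ids[[1]]),
                        logical(1)))
all_agree  # TRUE

# The arithmetic test from the benchmark: TRUE means "keep the row".
keep_modul <- function(id) id %/% 300 != 1
keep_modul(c(111, 334))  # TRUE FALSE: correct for the sample data
keep_modul(450)          # FALSE: dropped although it starts with 4
keep_modul(3000)         # TRUE: kept although it starts with 3
```

For IDs known to have exactly three digits, `id %/% 100 != 3` would be one way to keep the arithmetic approach while testing the leading digit directly.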

Source

2017-09-26 20:35:52 PoGibas

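This doesn't actually address how to do this specifically within a dplyr workflow. –

The OP also never asked about performance, so showing benchmarks is a bit of overkill. – useR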
