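Remove vector elements that are substrings of another element
Is there a better way to achieve this? I want to remove from this vector every string that is a substring of one of the other elements.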
words = c("please can you",
"please can",
"can you",
"how did you",
"did you",
"have you")
> words
[1] "please can you" "please can" "can you" "how did you" "did you" "have you"
library(data.table)
library(stringr)
dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
dt[, found := str_detect(word1, fixed(word2))]  # fixed(): treat word2 as a literal string, not a regex
setdiff(words, dt[found == TRUE & word1 != word2, word2])
[1] "please can you" "how did you" "have you"
This works, but it seems like overkill, and I would love to know a more elegant way to do it.
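One possible simplification, offered as a sketch rather than a definitive answer: base R's `grepl()` with `fixed = TRUE` can test each element against all the others without building the full cross join. The names `is_substring` and `result` are made up for illustration, and the sketch assumes `words` contains no duplicates (a duplicated string would be treated as a substring of its twin and dropped).

```r
words <- c("please can you",
           "please can",
           "can you",
           "how did you",
           "did you",
           "have you")

# For each element, check whether it occurs literally inside any OTHER element.
# fixed = TRUE disables regex interpretation of the pattern.
is_substring <- vapply(
  seq_along(words),
  function(i) any(grepl(words[i], words[-i], fixed = TRUE)),
  logical(1)
)

# Keep only the elements that are not contained in another element.
result <- words[!is_substring]
print(result)
```

This still performs a quadratic number of comparisons, so it is shorter to write but not necessarily faster on large inputs.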
`CJ` from `data.table` is much faster than `expand.grid`. – jenesaisquoi
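Just wanted to follow up with some numbers for anyone reading this. `CJ` is **much faster**. I am working with `12431` rows averaging `15.69` words per row, for a total set of `195065` words. `system.time(dt <- setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE)))` gave user 8.414, system 3.387, elapsed 13.854, while `system.time(dt1 <- CJ(words, words, unique = TRUE))` gave user 0.932, system 0.365, elapsed 1.320. An order of magnitude difference. –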
Awesome, thanks for the benchmark –
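For reference, the `CJ` suggestion from the comments could be applied to the question's code roughly as follows. This is a sketch under a few assumptions: the column names `word1` and `word2` are passed as named arguments so they match the original code, `fixed()` from `stringr` is used so that each `word2` is matched literally, and `contained` and `kept` are illustrative names.

```r
library(data.table)
library(stringr)

words <- c("please can you",
           "please can",
           "can you",
           "how did you",
           "did you",
           "have you")

# CJ() builds the cross join directly as a data.table,
# replacing setDT(expand.grid(...)).
dt <- CJ(word1 = words, word2 = words, unique = TRUE)

# Collect every word2 that occurs literally inside a different word1.
contained <- dt[word1 != word2 & str_detect(word1, fixed(word2)), unique(word2)]

# setdiff() keeps the original order of words.
kept <- setdiff(words, contained)
print(kept)
# [1] "please can you" "how did you"    "have you"
```

The number of string comparisons is unchanged, so the saving comes only from constructing the pair table more cheaply.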