Searching an R corpus for all words ending in "esque"

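I'm using R's tm package with a dictionary approach to get word frequencies. I want to find all words ending in "esque", whether they are spelled "abcd-esque", "abcdesque" or "abcd esque" (my corpus contains all of these spellings). How do I write a regular expression for this? Here's what I have so far. Any help or hints would be much appreciated.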

library(tm)

text <- Corpus(DirSource("txt/"))
text <- tm_map(text, tolower)
text <- tm_map(text, stripWhitespace)
dtm.text <- DocumentTermMatrix(text)
# the dictionary only counts the exact terms listed, so "abcdesque" etc. are missed
list <- inspect(
    DocumentTermMatrix(text, list(dictionary = c("rose", "green", "esque")))
)
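
One possible pattern, sketched here rather than taken from the thread: a document-term matrix is typically built from whitespace-separated tokens, so "abcd esque" is likely split into two terms, and it can help to search the raw text before tokenizing. The docs vector below is made-up sample text, and the pattern (a run of letters, an optional hyphen or single space, then "esque" at a word boundary) is an assumption about which spellings should count.

```r
# Hypothetical sample text containing all three spellings from the question
docs <- tolower(c("A Kafka-esque plot with a kafkaesque twist.",
                  "Very kafka esque, and rather picturesque."))

# Letters, then an optional hyphen or single space, then "esque"
# ending at a word boundary
pattern <- "\\b[[:alpha:]]+[- ]?esque\\b"

# One character vector of matches per document
hits <- regmatches(docs, gregexpr(pattern, docs, perl = TRUE))
print(hits)
```

With this sample, hits[[1]] holds "kafka-esque" and "kafkaesque", and hits[[2]] holds "kafka esque" and "picturesque". Ordinary words such as "picturesque" match too, since they also end in "esque".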

2014-12-19 torentino

grep("esque$", x)? – thelatemail 2014-12-19 03:41:27

# keep only the DTM columns whose term ends in "esque"
inspect(dtm.text[, grepl("esque$", dtm.text$dimnames$Terms)])
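
To see what this grepl() column filter does end to end, here is a small sketch on an invented three-document corpus (the docs vector is hypothetical and stands in for DirSource("txt/")). It uses Terms() to get the column names instead of dtm.text$dimnames$Terms, which should be equivalent, and sums each matching column to get corpus-wide frequencies.

```r
library(tm)

# Hypothetical documents standing in for the files in txt/
docs <- c("A kafkaesque plot", "Rather picturesque and kafkaesque", "Plain green rose")
corpus <- VCorpus(VectorSource(docs))

# DocumentTermMatrix() lower-cases terms by default
dtm <- DocumentTermMatrix(corpus)

# Logical index over the columns: TRUE where the term ends in "esque"
esque_cols <- grepl("esque$", Terms(dtm))
esque_dtm <- dtm[, esque_cols]

# Total count of each matching term across all documents
esque_freq <- colSums(as.matrix(esque_dtm))
print(esque_freq)
```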

As a side note, tolower no longer works this way with current versions of tm. You should use content_transformer instead:

tm_map(text, content_transformer(tolower))
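
Building on that advice, a minimal sketch of a preprocessing pipeline that also merges the three spellings before the matrix is built. The helper join_esque is hypothetical, not part of tm: it wraps an ordinary gsub() call in content_transformer() so that tm_map() keeps returning text documents. Whether "abcd esque" should really be joined into one word depends on the corpus, so treat the pattern as an assumption.

```r
library(tm)

# Hypothetical documents using the three spellings from the question
docs <- c("A Kafka-esque plot", "very kafka esque", "KAFKAESQUE indeed")
corpus <- VCorpus(VectorSource(docs))

# content_transformer() lets a plain character function be used with tm_map()
corpus <- tm_map(corpus, content_transformer(tolower))

# Hypothetical helper: glue "-esque" / " esque" onto the preceding word
join_esque <- content_transformer(function(x) {
  gsub("([[:alpha:]])[- ]esque\\b", "\\1esque", x, perl = TRUE)
})
corpus <- tm_map(corpus, join_esque)
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus)
esque_terms <- Terms(dtm)[grepl("esque$", Terms(dtm))]
print(esque_terms)
```

After the join step, all three documents contribute to the single term "kafkaesque".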

2014-12-19 03:43:05 zero323

Thanks, everyone. This solved the problem. – torentino 2014-12-19 03:58:04

words <- c("rose", "green", "esque", "abcd-esque", "abcdesque", "abcd esque")
grep("esque$", words)  # positions 3, 4, 5 and 6 match
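
A small extension of the same idea: value = TRUE makes grep() return the matching strings instead of their positions, and a hypothetical sub() step then collapses the hyphenated and spaced spellings so the variants are counted as a single term. Note that the bare word "esque" matches "esque$" as well; a pattern such as ".+esque$" would leave it out.

```r
words <- c("rose", "green", "esque", "abcd-esque", "abcdesque", "abcd esque")

# value = TRUE returns the matching strings rather than their positions
esque_words <- grep("esque$", words, value = TRUE)

# Hypothetical normalisation: drop a hyphen or space right before a final
# "esque" so the three spellings collapse into one term
normalised <- sub("[- ]esque$", "esque", esque_words)

# Frequency of each normalised term
counts <- table(normalised)
print(counts)
```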

2014-12-19 03:52:29 user51855
