stemCompletion error using R tm package

I'm using the tm package in R. Everything works fine until I include stemCompletion, at which point I get the following error:

Error in grep(sprintf("^%s", w), dictionary, value = TRUE) : 
    invalid regular expression
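The failing call builds a regular expression by pasting each input "word" `w` into `"^%s"` without escaping it. If `w` contains a regex metacharacter (for example an unmatched `(`), the pattern is invalid and `grep()` stops with exactly this kind of error. The sketch below shows one way this can happen; whether it is the cause here depends on what actually reaches `stemCompletion`. The input `"(laughs"` and the helper `prefix_matches()` are hypothetical illustrations, not part of tm.

```r
# Hypothetical input: a "word" that still contains a regex metacharacter,
# e.g. a transcript annotation like "(laughs" that was never cleaned.
w <- "(laughs"
dictionary <- c("completion", "advantages")

# Mirrors the call in the error message: the word is inserted unescaped,
# so "^(laughs" has an unmatched "(" and grep() raises an error.
err <- tryCatch(
  suppressWarnings(grep(sprintf("^%s", w), dictionary, value = TRUE)),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical helper: a prefix match that treats the stem as literal text
# rather than as a regex (startsWith() is base R >= 3.3.0).
prefix_matches <- function(prefix, dictionary) {
  dictionary[startsWith(dictionary, prefix)]
}
prefix_matches("complet", dictionary)
prefix_matches(w, dictionary)  # no match, but no error either
```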

My code is as follows:

path = '~/Interviews/Transcripts/' 
file.names <- dir(path, pattern = '.txt') 

corpus = lapply(seq_along(file.names), function(index) { 
    fileName = file.names[index] 
    filePath = paste(path, fileName, sep = '') 
    transcript = readChar(filePath, file.info(filePath)$size) 
    transcript <- gsub("[’‘^]", '', transcript) 

    corpusName = paste('transcript', index, sep = "_") 

    c <- Corpus(VectorSource(transcript)) 
    DublinCore(c[[1]], 'Identifier') <- paste(index, fileName, sep ='_') 
    meta(c, type = 'corpus') 

    c <- tm_map(c, stripWhitespace) 
    c <- tm_map(c, content_transformer(tolower)) 
    c <- tm_map(c, removeWords, c(stopwords("english"), 'yeah', 'yep')) 
    c <- tm_map(c, removePunctuation) 
    c <- tm_map(c, stemDocument) 
    c <- tm_map(c, stemCompletion, c) 
    c <- tm_map(c, PlainTextDocument) 
    c 
})


2016-05-16 user3603308

This isn't reproducible. Good luck finding someone who is able to dig into it. [Here are some tips](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great example.

What do you expect 'stemCompletion' to do? – lukeA

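First, in theory you probably want to use tm_map(c, content_transformer(stemCompletion), c), because tm_map(c, stemCompletion, c) passes a PlainTextDocument to the argument x of stemCompletion, whereas it expects a character vector (see ?stemCompletion). Second, since you haven't done any tokenization (see e.g. ?TermDocumentMatrix) and your dictionary corpus has already been stemmed, what you are trying probably won't work this way.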
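To make the second point concrete: completion works on individual tokens, and it can only expand a stem back to a full word if the dictionary still contains the unstemmed words. The base-R sketch below is a hypothetical helper, not tm's implementation. It only mimics the idea behind stemCompletion's default `type = "prevalent"` (take the most frequent dictionary word starting with the stem) and, unlike stemCompletion, keeps the stem when nothing matches. The stems are hard-coded to match the stemDocument output in the example below.

```r
# Hypothetical helper, not part of tm: for each stem, return the most frequent
# dictionary word that starts with it, or keep the stem if nothing matches.
complete_stems <- function(stems, dictionary) {
  vapply(stems, function(s) {
    hits <- dictionary[startsWith(dictionary, s)]
    if (length(hits) == 0) return(s)
    names(sort(table(hits), decreasing = TRUE))[1]
  }, character(1), USE.NAMES = FALSE)
}

text   <- "stem completion has advantages"
tokens <- strsplit(text, "\\s+")[[1]]              # tokenize first
stems  <- c("stem", "complet", "has", "advantag")  # stems of those tokens

# Unstemmed dictionary: stems can be expanded back to full words.
complete_stems(stems, dictionary = tokens)
# Already-stemmed dictionary: every stem just "completes" to itself.
complete_stems(stems, dictionary = stems)
```

With a stemmed dictionary there is nothing longer to complete to, which is why completing a stemmed corpus against itself cannot recover the original words.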

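(And third, I second @RomanLuštrik: please edit your post and make it a minimal reproducible example, so that readers, and others who run into this error, can easily follow along.)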

Here is an example:

content(tm_map(Corpus(VectorSource("stem completion has advantages")), stemDocument)[[1]]) 
# [1] "stem complet has advantag" 

stemCompletion(c("complet", "advantag"), Corpus(VectorSource("stem completion has advantages"))) 
#  complet  advantag 
# "completion" "advantages"
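Applied to whole documents inside tm_map, the approach from the answer might look like the sketch below. It assumes a recent tm with SnowballC installed (needed by stemDocument). `complete_text` and `dict_words` are hypothetical names: the helper tokenizes each text on whitespace, completes every stem against an unstemmed dictionary passed as a character vector, and keeps the stem wherever stemCompletion returns no completion. It processes its input element by element, so it works whether tm_map hands it one document's text or several.

```r
library(tm)  # stemDocument() also needs the SnowballC package installed

# Hypothetical helper: for each text, split on whitespace, complete every stem
# against `dictionary`, and keep the stem where stemCompletion() finds nothing.
complete_text <- function(x, dictionary) {
  vapply(x, function(doc) {
    tokens <- unlist(strsplit(doc, "\\s+"))
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) == 0) return(doc)
    completed <- unname(stemCompletion(tokens, dictionary = dictionary))
    no_match <- is.na(completed) | !nzchar(completed)
    completed[no_match] <- tokens[no_match]
    paste(completed, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

texts <- "stem completion has advantages"
# Dictionary built from the *unstemmed* text, passed as a character vector.
dict_words <- unlist(strsplit(texts, "\\s+"))

stemmed <- tm_map(Corpus(VectorSource(texts)), stemDocument)
completed <- tm_map(stemmed, content_transformer(complete_text),
                    dictionary = dict_words)
content(completed[[1]])
```

In the question's loop, the dictionary would come from the words of `c` after removePunctuation but before stemDocument, and a call like the second tm_map above would take the place of `tm_map(c, stemCompletion, c)`.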


2016-05-16 11:20:27 lukeA
