Reading documents with R tm for use with R mallet

I have this code for fitting topic models with the R wrapper for MALLET:

docs <- mallet.import(DF$document, DF$text, stop_words) 

mallet_model <- MalletLDA(num.topics = 4) 
mallet_model$loadDocuments(docs) 
mallet_model$train(100)

I have used the tm package to read my documents, which are txt files in a directory:

myCorpus <- Corpus(DirSource("data")) # a directory of txt files

A corpus cannot be used as input to mallet.import, so how do I get from the tm corpus myCorpus to the DF used in the call above?

Source

2017-04-22 textnet

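You can process your text using tidy data principles and get it ready for input to mallet, with one row per document, as described here.

In addition, tidytext includes tidiers for the mallet package, which you can use to analyze the output of mallet topic modeling: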

# word-topic pairs 
tidy(mallet_model) 

# document-topic pairs 
tidy(mallet_model, matrix = "gamma") 

# column needs to be named "term" for "augment" 
term_counts <- rename(word_counts, term = word) 
augment(mallet_model, term_counts)

Source

2017-04-30 14:17:37

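RMallet is intended to be a standalone package, so its integration with tm is not very good. The input RMallet requires is a data frame with one row per document and a character field containing the text, which is expected not to be tokenized.

Source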
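The input shape described here (one row per document, one character column of raw text) can be built by collapsing each file into a single string. The following is a minimal sketch in base R, not the package's own method: the helper name dir_to_df is hypothetical, the "data" directory is taken from the question, and the column names document and text are chosen only to match the DF$document and DF$text used in the question's mallet.import call.

```r
# Hypothetical helper (not part of tm or mallet): read every .txt file in a
# directory into a data frame with one row per document.
dir_to_df <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  data.frame(
    # use the file name as the document id
    document = basename(files),
    # collapse the lines of each file into one untokenized string
    text = vapply(
      files,
      function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# "data" is the directory of txt files from the question
DF <- dir_to_df("data")
```

The resulting DF can then be passed on as in the question's first snippet. If you would rather start from myCorpus itself, the same two columns can likely be obtained with names(myCorpus) for the ids and sapply(myCorpus, function(d) paste(content(d), collapse = "\n")) for the text, assuming the corpus holds plain text documents.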

2017-04-24 00:39:44
