主題分佈的不同維度

-2

我想將所有文檔劃分爲10個主題，除了主題的分佈維數和協方差矩陣之外，它與聚合結果一致。
爲什麼主題分佈是9維矢量而不是10，它們的協方差矩陣是9 * 9矩陣而不是10 * 10？主題分佈的不同維度

我使用library(topicmodels)和函數CTM()來實現中文主題模型。

我的代碼如下：

library(rJava); 
library(Rwordseg); 
library(NLP); 
library(tm); 
library(tmcn) 
library(tm) 
library(Rwordseg) 
library(topicmodels) 

installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Law.scel","Law"); 
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\NationalInstitution.scel","NationalInstitution"); 
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Place.scel","Place"); 
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Psychology.scel","Psychology"); 
installDict("C:\\Users\\Jeffy\\OneDrive\\Workplace\\R\\Politics.scel","Politics"); 
listDict(); 

#read file 
d.vec <- segmentCN("samgovWithoutID.csv", returnType = "tm") 
samgov.segment <- read.table("samgovWithoutID.segment.csv", header = TRUE, fill = TRUE, stringsAsFactors = F, sep = ",",fileEncoding='utf-8') 
fix(samgov.segment) 

# create DTM(document term matrix) 
d.corpus <- Corpus(VectorSource(samgov.segment$content)) 
inspect(d.corpus[1:10]) 
d.corpus <- tm_map(d.corpus, removeWords, stopwordsCN()) 
ctrl <- list(removePunctuation = TRUE, removeNumbers= TRUE, wordLengths = c(1, Inf), stopwords = stopwordsCN(), wordLengths = c(2, Inf)) 
d.dtm <- DocumentTermMatrix(d.corpus, control = ctrl) 
inspect(d.dtm[1:10, 110:112]) 

# impletment topic models 
ctm10<-CTM(d.dtm,k=10, control=list(seed=2014012692)) 
Terms10 <- terms(ctm10, 10) 
Terms10[,1:10] 

ctm20<-CTM(d.dtm,k=20, control=list(seed=2014012692)) 
Terms20 <- terms(ctm20, 20) 
Terms20[,1:20]

結果中的R工作室（見突出顯示的部分）：