2015-03-31 76 views
1

Error when running LDA on tweets with gensim in Python

I have the following code, which runs an LDA analysis on tweets:

import logging, gensim, bz2 
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO) 

# load id->word mapping (the dictionary), one of the results of step 2 above 
id2word = 'enams4nieuw.dict' 
# load corpus iterator 
mm = gensim.corpora.MmCorpus('enams4nieuw.mm') 

print(mm) 

# extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents) 
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1) 

When I try to run this script, I get the following log output and error message:

MmCorpus(40152 documents, 13061 features, 384671 non-zero entries) 
2015-03-31 16:52:50,246 : INFO : loaded corpus index from enams4nieuw.mm.index 
2015-03-31 16:52:50,246 : INFO : initializing corpus reader from enams4nieuw.mm 
2015-03-31 16:52:50,246 : INFO : accepted corpus with 40152 documents, 13061 features, 384671 non-zero entries 
Traceback (most recent call last): 
    File "C:/Users/gerbuiker/PycharmProjects/twitter-streaming.py/lda.py", line 15, in <module> 
    lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1) 
    File "C:\Users\gerbuiker\AppData\Roaming\Python\Python27\site-packages\gensim\models\ldamodel.py", line 244, in __init__ 
self.num_terms = 1 + max(self.id2word.keys()) 
AttributeError: 'str' object has no attribute 'keys' 

Process finished with exit code 1 

Does anyone have a solution for this?

Answers

1

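You are setting the variable id2word to a string.

It looks like you have a file name there; I assume you pickled your dictionary?

id2word needs to be a dictionary (a mapping from integer ids to words), not the name of the file it is stored in.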

0

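I had the same error. It seemed that ldamodel.py was trying to take the maximum of the words rather than of the indices/ids, so my solution was simply to swap the keys and values of the dictionary.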

my_dict2 = {y:x for x,y in my_dict.items()}
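To illustrate the swap with made-up data: the traceback in the question shows ldamodel.py computing 1 + max(self.id2word.keys()), so the keys have to be the integer ids. The contents of my_dict below are hypothetical; only the names my_dict and my_dict2 come from the answer.

```python
# Hypothetical word -> id mapping (the wrong orientation for id2word).
my_dict = {"lda": 0, "tweet": 1, "gensim": 2}

# Swap keys and values to get an id -> word mapping.
my_dict2 = {idx: word for word, idx in my_dict.items()}

# Same computation as the ldamodel.py line shown in the traceback.
num_terms = 1 + max(my_dict2.keys())
print(my_dict2, num_terms)
```

The swap is only safe when every id is unique; if two words share an id, one of them is silently dropped.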