Summing word frequency counts with FreqDist, python

How can I use fd.items() from FreqDist to sum up word frequency counts?

>>> from nltk import FreqDist
>>> fd = FreqDist(text)
>>> most_freq_w = [w for w, _ in fd.most_common(10)]  # the 10 most frequent words in the text
>>> # here I should sum up how many times each of these 10 words appears in the text

For example, if each word in most_freq_w appears 10 times, the result should be 100.

!!! I don't need this number for all the words in the text, only for the 10 most frequent ones.

2010-11-17 Gusto

Think it through (this is almost insultingly simple). Or at least show us what you've tried. – delnan 2010-11-17 17:05:06

What I tried was getting 'fd.items' for the words in 'most_freq_w', but that is clearly wrong, because the result was '0'. – Gusto 2010-11-17 17:27:14
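The goal in the question can be stated directly: take the n most frequent (word, count) pairs and add up the counts. A minimal sketch, assuming NLTK 3, where FreqDist subclasses collections.Counter; a plain Counter stands in for FreqDist so the example runs without NLTK, and sum_top_n is a hypothetical helper name. The toy text mirrors the question's example: ten words, each appearing 10 times, should give 100.

```python
from collections import Counter

# Stand-in for nltk.FreqDist: in NLTK 3, FreqDist subclasses collections.Counter,
# so most_common() behaves the same way on a FreqDist.
def sum_top_n(fd, n=10):
    """Return the total number of occurrences of the n most frequent words."""
    return sum(count for _word, count in fd.most_common(n))

# Toy text from the question: 10 words, each appearing 10 times,
# plus one rarer word that must not be counted.
words = [f"w{i}" for i in range(10)]
text = words * 10 + ["rare"] * 3
fd = Counter(text)

total = sum_top_n(fd, 10)
print(total)  # 100
```

Summing over most_common(n) keeps the "top 10 only" requirement explicit, rather than relying on the order in which the distribution happens to store its keys.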

I'm not familiar with nltk, but since FreqDist derives from dict, the following should work:

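v = sorted(fd.values())  # counts in ascending order (in Python 3, dict.values() has no .sort())
count = sum(v[-10:])     # add up the 10 largest counts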

2010-11-17 17:28:52

Works fine for me! – Gusto 2010-11-17 18:00:43

Note: 'FreqDist' already returns its values sorted in descending order, so 'count = sum(fd.values()[:10])' gives the same result as above. – jfs 2012-09-01 03:50:45
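The thread dates from Python 2 and NLTK 2. In NLTK 3 on Python 3, FreqDist subclasses collections.Counter, so fd.values() is a view that can be neither sorted in place nor sliced, and its order is the order in which words were first seen, not frequency order. That is why 'sum(fd.values()[:10])' no longer gives the top counts. A minimal sketch of Python 3 safe versions of the sort-and-sum idea, with a plain Counter standing in for FreqDist; sum_top_n_sorted and sum_top_n_heap are hypothetical helper names.

```python
import heapq
from collections import Counter

# Stand-in for an NLTK 3 FreqDist, which subclasses collections.Counter.
# Keys (and so values()) come out in first-seen order, not by frequency.
fd = Counter(["rare", "common", "common", "common", "mid", "mid"])

# Sort-and-sum, Python 3 safe: build a sorted list, then sum the largest n counts.
def sum_top_n_sorted(fd, n=10):
    if n <= 0:  # guard: v[-0:] would return the whole list, not nothing
        return 0
    return sum(sorted(fd.values())[-n:])

# Same result without sorting every count: heapq.nlargest keeps only n items.
def sum_top_n_heap(fd, n=10):
    return sum(heapq.nlargest(n, fd.values()))

# Slicing values() directly relies on frequency order, which NLTK 3 does not give:
first_two = list(fd.values())[:2]  # [1, 3]: "rare", then "common"
top_two = sum_top_n_sorted(fd, 2)  # 3 + 2 = 5

print(sum(first_two), top_two, sum_top_n_heap(fd, 2))  # 4 5 5
```

heapq.nlargest avoids a full sort, which only matters for very large vocabularies; for a single top-10 query either version is fine.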

If FreqDist is a mapping from words to their frequencies:

sum(map(fd.get, most_freq_w))

2010-11-17 18:58:11 jfs
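The one-liner looks up each word's count with dict.get and adds them up. A minimal Python 3 sketch, again with collections.Counter standing in for FreqDist, and the word list built with most_common() because fd.keys()[:10] does not work on NLTK 3. It also shows one caveat: fd.get returns None for a word that is not in the distribution, while indexing a Counter returns 0.

```python
from collections import Counter

fd = Counter("the cat sat on the mat the end".split())  # stand-in for a FreqDist
most_freq_w = [w for w, _c in fd.most_common(2)]

# The answer's one-liner: look up each word's count and add them up.
total = sum(map(fd.get, most_freq_w))

# fd.get returns None for a missing word, which makes sum() raise TypeError;
# indexing a Counter returns 0 for missing keys instead.
safe_total = sum(fd[w] for w in most_freq_w + ["absent"])

print(total, safe_total)  # 4 4
```

For words taken from the distribution itself, as here, both forms agree; the indexing form only matters when the word list comes from elsewhere.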

To find the number of times a word appears in a corpus (your piece of text):

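import nltk
from nltk import FreqDist

raw = open("<your file>").read()  # the raw text of your file
tokens = nltk.word_tokenize(raw)  # needs NLTK's Punkt tokenizer data (see nltk.download)
fd = FreqDist(tokens)
print(fd['<your word here>'])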

2013-08-10 21:34:11
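FreqDist counts tokens exactly as the tokenizer produced them, so 'The' and 'the' are separate entries. A sketch, assuming a hypothetical regex-based simple_tokenize as a rough stand-in for nltk.word_tokenize (it does not handle contractions or quotes the way the real tokenizer does) and collections.Counter as a stand-in for FreqDist:

```python
import re
from collections import Counter

def simple_tokenize(text):
    # Hypothetical, rough stand-in for nltk.word_tokenize: runs of word
    # characters, plus each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

raw = "The cat saw the dog. The dog ran."
tokens = simple_tokenize(raw)

fd = Counter(tokens)                           # counts are case-sensitive
fd_lower = Counter(t.lower() for t in tokens)  # normalise case before counting

print(fd["the"], fd["The"], fd_lower["the"])   # 1 2 3
```

Lowercasing before counting is usually what "how often does this word appear" means; keep the original case only when capitalisation carries meaning for your task.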

It has a pretty-print method:

fd.pprint()

will do it.

2015-11-19 18:02:35 Steve
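If you would rather have a plain one-word-per-line listing of the top words (for example to write to a file), a small helper over most_common() is enough. This is a hypothetical helper, format_top, not a reproduction of pprint()'s own output format; a collections.Counter stands in for FreqDist.

```python
from collections import Counter

def format_top(fd, n=10):
    """Return the n most frequent samples as 'word<TAB>count' lines."""
    return "\n".join(f"{word}\t{count}" for word, count in fd.most_common(n))

fd = Counter("b a b c b a".split())  # stand-in for a FreqDist
print(format_top(fd, 2))
```

Returning a string instead of printing keeps the helper easy to test and to redirect to a file.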