How do I count n-gram occurrences in many lists

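Does anyone know whether it is possible to take a vocabulary of n-grams and count how many times each of them occurs in several different lists of tokens? The vocabulary is built from the n-grams in those lists, with each unique n-gram listed once. If I have: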

Lists

['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'] //1 

['hello', 'I', 'am', 'doing', 'okay', 'are', 'you', 'okay'] //2 

<type = list>

N-gram vocabulary

('hello','I') 
('I', 'am') 
('am', 'doing') 
('doing', 'okay') 
('okay','are') 
('hello', 'how') 
('how', 'are') 
('are','you') 
('you', 'doing') 
('doing', 'today') 
('today', 'are') 
('you', 'okay') 
<type = tuples>

Then the output I want looks like this:

List 1:

('hello', 'how')1 
('how', 'are')1 
('are','you')2 
('you', 'doing')1 
('doing', 'today')1 
('today', 'are')1 
('you', 'okay')1

List 2:

('hello','I')1 
('I', 'am')1 
('am', 'doing')1 
('doing', 'okay')1 
('okay','are')1 
('are','you')1 
('you', 'okay')1

I have the following code:

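import nltk
from nltk import bigrams, word_tokenize
from nltk.corpus import stopwords

# test_lower is the list of lower-cased input strings (defined elsewhere)
stop_words = set(stopwords.words('english'))
test_tokenized = [word_tokenize(i) for i in test_lower]

for test_toke in test_tokenized:
    filtered_words = [word for word in test_toke if word not in stop_words]
    bigram = bigrams(filtered_words)
    fdist = nltk.FreqDist(bigram)  # FreqDist (not FeatDict) counts occurrences
    for k, v in fdist.items():
        # print(k, v)
        occur = (k, v)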

Source

2017-04-05 MyTivoli

Use a list comprehension to generate the n-grams and collections.Counter to count the duplicates:

from collections import Counter 
l = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'] 
ngrams = [(l[i],l[i+1]) for i in range(len(l)-1)] 
print(Counter(ngrams))
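
The snippet above is fixed to bigrams. As a rough sketch of how the same idea might extend to any n, the variant below zips n shifted views of the token list instead of indexing. The helper name count_ngrams and its n parameter are made up for this illustration and are not part of the original answer.

```python
from collections import Counter

def count_ngrams(tokens, n=2):
    """Count every run of n consecutive tokens in a list (hypothetical helper)."""
    # Build n views of the list, each starting one token later than the last.
    # zip stops at the shortest view, so a list shorter than n yields nothing.
    shifted = [tokens[i:] for i in range(n)]
    return Counter(zip(*shifted))

tokens = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay']
bigram_counts = count_ngrams(tokens, 2)
trigram_counts = count_ngrams(tokens, 3)
print(bigram_counts[('are', 'you')])
```

A Counter returns 0 for keys it has never seen, so looking up an n-gram that does not occur in the list is safe.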

Source

2017-04-05 15:07:22 acidtobi

I would suggest using a for loop over a range:

from collections import Counter 
list1 = ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'] 
list2 = ['hello', 'I', 'am', 'doing', 'okay', 'are', 'you', 'okay'] 

def ngram(li): 
    result = [] 
    for i in range(len(li)-1): 
        result.append((li[i], li[i+1])) 
    return Counter(result) 

print(ngram(list1)) 
print(ngram(list2))
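
The question also asks for the counts to be reported per list against a fixed vocabulary. One possible way to do that, sketched here under the assumption that the vocabulary is a list of bigram tuples like the one in the question, is to count each list separately and then keep only the vocabulary entries that actually occur. The names bigram_counts and count_vocabulary are invented for this example.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent token pairs in one list."""
    return Counter(zip(tokens, tokens[1:]))

def count_vocabulary(token_lists, vocabulary):
    """Return one dict per list: vocabulary n-gram -> count, omitting zeros."""
    results = []
    for tokens in token_lists:
        counts = bigram_counts(tokens)
        # Iterate over the vocabulary so the output follows its order
        results.append({gram: counts[gram] for gram in vocabulary if counts[gram] > 0})
    return results

token_lists = [
    ['hello', 'how', 'are', 'you', 'doing', 'today', 'are', 'you', 'okay'],
    ['hello', 'I', 'am', 'doing', 'okay', 'are', 'you', 'okay'],
]
vocabulary = [
    ('hello', 'I'), ('I', 'am'), ('am', 'doing'), ('doing', 'okay'),
    ('okay', 'are'), ('hello', 'how'), ('how', 'are'), ('are', 'you'),
    ('you', 'doing'), ('doing', 'today'), ('today', 'are'), ('you', 'okay'),
]

per_list = count_vocabulary(token_lists, vocabulary)
for number, counts in enumerate(per_list, start=1):
    print('List %d:' % number)
    for gram, count in counts.items():
        print(gram, count)
```

Dropping the `if counts[gram] > 0` filter would instead report every vocabulary entry for every list, with 0 for the ones that do not appear.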

Source

2017-04-05 15:02:54 Neil
