Comparison in Python

What is the most efficient way to compare words in a list/dictionary? I have the following sentence and dictionary:

sentence = "I love Obama and David Card, two great people. I live in a boat" 

dico = { 
'dict1':['is','the','boat','tree'], 
'dict2':['apple','blue','red'], 
'dict3':['why','Obama','Card','two'], 
}

For each dictionary, I want to count how many of its elements appear in the sentence. A heavy-handed way to do this is the following:

from collections import Counter

classe_sentence = []
text_splited = sentence.split(" ")
dic_keys = dico.keys()
for key_dics in dic_keys:
    for values in dico[key_dics]:
        # record the dictionary name once per matching word
        if values in text_splited:
            classe_sentence.append(key_dics)

Counter(classe_sentence)

This gives the following output:

Counter({'dict1': 1, 'dict3': 2})

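However, this is not efficient: there are two nested loops, and it is a naive comparison. I would like to know whether there is a faster way to do this, perhaps using an itertools object. Any ideas?

Thanks in advance!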

Source

2016-11-13 Jb_Eyd

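The 'dico' data structure does not need to be a dictionary if all you need is a sequence. – jsbueno

@jsbueno: Not if you want to give each sequence a label. Those labels are used in the output here. –

@jsbueno What do you mean? Will it increase the speed of the process? –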

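You can use the set data type for your comparison, and the set.intersection method to get the number of matches.

This will make the algorithm more efficient, but it will count each word only once, even if it appears in several places in the sentence.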

sentence = set("I love Obama and David Card, two great people. I live in a boat".split()) 

dico = { 
'dict1':{'is','the','boat','tree'}, 
'dict2':{'apple','blue','red'}, 
'dict3':{'why','Obama','Card','two'} 
} 


results = {} 
for key, words in dico.items(): 
    results[key] = len(words.intersection(sentence))
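
To make the caveat about repeated words concrete, here is a minimal sketch of the same set-based approach. The helper name `count_unique_matches` and the sample sentence with a repeated word are hypothetical and not part of the original post. It assumes a plain whitespace split, so a token with attached punctuation such as "Card," would not match "Card".

```python
def count_unique_matches(sentence, dico):
    """Count the distinct words of each dictionary that occur in the sentence."""
    # Whitespace split only (assumption): punctuation stays attached to tokens.
    words = set(sentence.split())
    return {key: len(set(values) & words) for key, values in dico.items()}


dico = {
    'dict1': {'is', 'the', 'boat', 'tree'},
    'dict2': {'apple', 'blue', 'red'},
    'dict3': {'why', 'Obama', 'Card', 'two'},
}

# 'boat' occurs twice in this hypothetical sentence but is counted once.
result = count_unique_matches("the boat is a red boat", dico)
print(result)
```

Here 'dict1' gets 3 (is, the, boat) rather than 4, because the second "boat" collapses into the set.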

Source

2016-11-13 16:36:29 jsbueno

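This can be combined with Counter: first build a Counter and a set from the given sentence, then sum the counts of each member of the intersection. – Copperfield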
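
The suggestion in the comment above could look roughly like the sketch below. The function name `count_all_matches` is hypothetical, and the sentence is split on whitespace only (an assumption), so "Card," with its trailing comma still does not match "Card".

```python
from collections import Counter


def count_all_matches(sentence, dico):
    """Sum the occurrences of each dictionary's words in the sentence."""
    # Count every token once up front, then reuse the keys as a set.
    word_counts = Counter(sentence.split())
    sentence_words = set(word_counts)
    return {
        key: sum(word_counts[w] for w in sentence_words.intersection(values))
        for key, values in dico.items()
    }


sentence = "I love Obama and David Card, two great people. I live in a boat"
dico = {
    'dict1': ['is', 'the', 'boat', 'tree'],
    'dict2': ['apple', 'blue', 'red'],
    'dict3': ['why', 'Obama', 'Card', 'two'],
}

totals = count_all_matches(sentence, dico)
print(totals)
```

Unlike the plain intersection, this counts a word every time it appears, while still checking membership against a set instead of scanning a list.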

Assuming you want case-sensitive matching:

from collections import defaultdict

# map each word of the sentence to its number of occurrences
sentence_words = defaultdict(int)
for word in sentence.split(' '):
    # strip off any trailing or leading punctuation
    word = word.strip('\'";.,!?')
    sentence_words[word] += 1

for name, words in dico.items():
    count = 0
    for x in words:
        # .get avoids inserting missing words into the defaultdict
        count += sentence_words.get(x, 0)
    print('Dictionary [%s] has [%d] matches!' % (name, count))

Source

2016-11-13 16:36:03 2ps
