Processing multiple strings at once in Python

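I have a list of strings, and I want to remove the stopwords from each string. The problem is that the stopword list is much longer than the strings, and I don't want to compare every string against the stopword list over and over. Is there a way in Python to process all of these strings at once?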

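lis = ['aka', 'this is a good day', 'a pretty dog']
stopwords = []  # pretty long list of words
result = []
for phrase in lis:
    words = phrase.split(' ')  # get list of words
    kept = []
    for word in words:
        if word not in stopwords:  # scans the whole stopword list for every word
            kept.append(word)
    result.append(' '.join(kept))  # rebuild the phrase without the stopwords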

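This is my current approach. But it means I have to go through every phrase in the list. Is there a way to handle these phrases with a single comparison?

Thanks.

Source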

2014-12-05 JudyJiang

How long is "long"? If it's under 100,000 elements, I wouldn't worry about it. Especially if you put `stopwords` into a set, since `x in set` checks are very fast. – Kevin 2014-12-05 16:26:31
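To make the point about set lookups concrete, here is a rough timing sketch. The 100,000 entries are made-up placeholders, not real stopwords, and the actual timings will depend on the machine.

```python
import timeit

# Hypothetical stand-in for a long stopword list (100,000 made-up entries).
stopwords = ['word%d' % i for i in range(100000)]
stop = set(stopwords)

# Membership gives the same answer either way; only the cost differs.
probe = 'word99999'  # worst case for the list: the last element
assert (probe in stopwords) == (probe in stop)

# A list is scanned element by element; a set uses a hash lookup.
list_time = timeit.timeit(lambda: probe in stopwords, number=100)
set_time = timeit.timeit(lambda: probe in stop, number=100)
print('list: %.6fs  set: %.6fs' % (list_time, set_time))
```

Building the set costs one pass over the list, so it only needs to be done once before the loop over the phrases.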

A nested list comprehension might look nicer (or more confusing?), but this is pretty much the best way I can see to do it. – TehTris 2014-12-05 16:28:59

@Kevin Well, it is 100,000 long, but I still don't want to check it multiple times.. – JudyJiang 2014-12-05 16:29:41
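One possible way to consult the long stopword collection only once for the whole input is sketched below. It is an illustration rather than a benchmarked recommendation: the names `vocabulary` and `present` are invented here, and the two-item `stopwords` list stands in for the long one.

```python
lis = ['aka', 'this is a good day', 'a pretty dog']
stopwords = ['a', 'dog']  # stands in for the long list

stop = set(stopwords)

# Split every phrase once and collect the distinct words across all phrases.
split_phrases = [phrase.split() for phrase in lis]
vocabulary = set(word for words in split_phrases for word in words)

# One set intersection finds every stopword that actually occurs in the input.
present = vocabulary & stop

# Filtering now only consults the small `present` set.
cleaned = [' '.join(w for w in words if w not in present) for words in split_phrases]
print(cleaned)  # ['aka', 'this is good day', 'pretty']
```

Each word still costs a hash lookup when filtering, so whether this beats checking against `stop` directly would have to be measured.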

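This is the same idea, but with a few improvements. Convert your list of stopwords to a set for faster lookups. Then you can iterate over the list of phrases in a list comprehension. For each phrase, iterate over its words, keep the ones that are not in the stop set, and join the phrase back together.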

>>> lis = ['aka', 'this is a good day', 'a pretty dog'] 
>>> stopwords = ['a', 'dog'] 
>>> stop = set(stopwords) 
>>> [' '.join(j for j in i.split(' ') if j not in stop) for i in lis] 
['aka', 'this is good day', 'pretty']
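If the one-liner is hard to read, the same logic can be unrolled into explicit loops. This is only a sketch; `remove_stopwords` is a hypothetical helper name, not something from a library.

```python
def remove_stopwords(phrases, stopwords):
    """Return a copy of each phrase with the stopwords removed (hypothetical helper)."""
    stop = set(stopwords)          # build the set once, outside the loops
    cleaned = []
    for phrase in phrases:
        kept = []
        for word in phrase.split(' '):
            if word not in stop:   # hash lookup instead of a list scan
                kept.append(word)
        cleaned.append(' '.join(kept))
    return cleaned

print(remove_stopwords(['aka', 'this is a good day', 'a pretty dog'], ['a', 'dog']))
# ['aka', 'this is good day', 'pretty']
```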

Source

2014-12-05 16:27:20 CoryKramer

You can compute the set difference between the words of each phrase and the stopwords.

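>>> lis = ['aka', 'this is a good day', 'a pretty dog']
>>> stopwords = ['a', 'dog']
>>> stop = set(stopwords)
>>> # list() is needed on Python 3, where map() returns a lazy iterator
>>> result = list(map(lambda phrase: " ".join(set(phrase.split(' ')) - stop), lis))
>>> print(result)  # word order within each phrase is arbitrary; one possible output:
['aka', 'this is good day', 'pretty']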

Source

2014-12-05 16:47:27

Actually, since you're operating on sets, this scrambles the order of the words within each phrase. You can see it with a phrase such as 'a b c d e f g'. – Dettorer 2014-12-05 17:02:43
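A small sketch of the difference the comment describes, using a made-up phrase with a repeated word: filtering keeps order and duplicates, while the set difference keeps neither.

```python
phrase = 'a b c d e f g b'
stop = {'a'}

# Filtering keeps the original order and any repeated words.
filtered = [w for w in phrase.split(' ') if w not in stop]
print(' '.join(filtered))  # b c d e f g b

# Set difference keeps the same distinct words, but in arbitrary order
# and with the repeated 'b' collapsed. sorted() is used here only to
# make the printed result predictable.
difference = set(phrase.split(' ')) - stop
print(sorted(difference))  # ['b', 'c', 'd', 'e', 'f', 'g']
```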
