2017-04-18 53 views
-2

Match array elements in any order using a string

I'm new to Python and am trying to find out whether a tweet contains any of the lookup elements.

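For example, if I look for the word cat, it should match cats, and it should also match cute kittens with the words in any order. From what I understand so far, though, I have not been able to find a solution. Any guidance is appreciated.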

import re 
lookup_table = ['cats', 'cute kittens', 'dog litter park'] 
tweets = ['that is a cute cat', 
      'kittens are cute', 
      'that is a cute kitten', 
      'that is a dog litter park', 
      'no wonder that dog park is bad'] 
for tweet in tweets: 
    lookup_found = None 
    print(re.findall(r"(?=(" + '|'.join(lookup_table) + r"))", tweet.lower()))

Output:

['cat'] 
[] 
[] 
['dog litter park'] 
[] 

Expected output:

that is a cute cat > cats 
kittens are cute > cute kittens 
that is a cute kitten > cute kittens 
that is a dog litter park > dog litter park 
no wonder that dog park is bad > dog litter park 
+0

Use the singular forms. –

+1

You should also tell us the output you actually need. –

+0

@KarolyHorvath I'm not sure what you mean – user6083088

Answers

0

For lookup terms that consist of a single word, you can use

if word in tweet:

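For lookup phrases such as "cute kittens", where the words may appear in any order, just split the phrase and look for each word in the tweet string.

This is what I tried. It is not efficient, but it works. Try running it.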

lookup_table = ['cat', 'cute kitten', 'dog litter park'] 
tweets = ['that is a cute cat', 
      'kittens are cute', 
      'that is a cute kitten', 
      'that is a dog litter park', 
      'no wonder that dog park is bad'] 

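# For every lookup phrase, print each tweet that contains at least one of its words.
for phrase in lookup_table:
    # split() also handles single-word phrases: 'cat'.split() == ['cat']
    words = phrase.split()
    for tweet in tweets:
        for word in words:
            # Substring test, so 'kitten' also matches 'kittens'.
            if word in tweet:
                print(tweet)
                break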
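
The split-and-look-up idea can also report which lookup phrase matched and require every word of the phrase to be present, in any order. The following is only a minimal sketch under that assumption; `phrases_in` is a hypothetical helper name, and because it uses a plain substring test, `'cat'` would also match a word like `'category'`.

```python
lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']


def phrases_in(tweet, phrases):
    """Return the phrases whose words all occur in the tweet, in any order."""
    text = tweet.lower()
    # Substring test: 'kitten' is found inside 'kittens'.
    return [p for p in phrases if all(w in text for w in p.split())]


for tweet in tweets:
    print(tweet, '>', phrases_in(tweet, lookup_table))
```

With this stricter rule the last tweet matches nothing, because "litter" is missing from it.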
0

This is how I would do it. I think lookup_table does not have to be so strict, and we can avoid the plurals:

import re 
lookup_table = ['cat', 'cute kitten', 'dog litter park'] 
tweets = ['that is a cute cat', 
     'kittens are cute', 
     'that is a cute kitten', 
     'that is a dog litter park', 
     'no wonder that dog park is bad'] 
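for data in lookup_table:
    words = data.split(" ")
    for word in words:
        # Match any run of word characters and whitespace that contains `word`;
        # the tweets are joined with ',' so a match cannot span two tweets.
        result = re.findall(r'[\w\s]*' + word + r'[\w\s]*', ','.join(tweets))
        if len(result) > 0:
            print(result)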
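
A variation on the regex idea, sketched here under my own assumptions rather than taken from the answer above: anchor each lookup word at word boundaries with `\b`, tolerate a simple plural with `s?`, and require every word of the phrase to be found in the same tweet. `word_pattern` and `matching_tweets` are hypothetical helper names.

```python
import re

tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']


def word_pattern(word):
    # \b keeps 'cat' from matching inside a longer word; s? allows a plain plural.
    return re.compile(r'\b' + re.escape(word) + r's?\b', re.IGNORECASE)


def matching_tweets(phrase, tweets):
    """Return the tweets that contain every word of the phrase, in any order."""
    patterns = [word_pattern(w) for w in phrase.split()]
    return [t for t in tweets if all(p.search(t) for p in patterns)]


for phrase in ['cat', 'cute kitten', 'dog litter park']:
    print(phrase, '>', matching_tweets(phrase, tweets))
```

The `s?` suffix only covers regular plurals; irregular forms would need a proper inflection library.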
0

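Problem 1:

Singular/plural: Just to get things rolling, I would use inflect, a Python package, to get rid of the singular and plural issue.

Problem 2:

Splitting and joining: I wrote a small script to demonstrate how you could use it. It has not been robustly tested, but it should get you moving.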

import inflect 
p = inflect.engine() 
lookup_table = ['cats', 'cute kittens', 'dog litter park'] 
tweets = ['that is a cute cat', 
      'kittens are cute', 
      'that is a cute kitten', 
      'that is a dog litter park', 
      'no wonder that dog park is bad'] 

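for tweet in tweets:
    matched = []
    for lt in lookup_table:
        # p.compare() is truthy when two words are equal or differ only in number;
        # every matching word pair adds one copy of the lookup phrase.
        match_result = [lt for mt in lt.split() for word in tweet.split() if p.compare(word, mt)]
        if any(match_result):
            matched.append(" ".join(match_result))
    print(tweet, '>>', matched)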
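
The expected output in the question seems to want a single best lookup phrase per tweet, even when only some of its words are present ("dog park" still maps to "dog litter park"). One way to get that, sketched here as an assumption about the intent, is to score each phrase by the fraction of its words found in the tweet and keep the highest score. The `singular` helper is a hypothetical, deliberately naive stand-in for a real inflection library such as inflect, and the code assumes Python 3 division.

```python
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']


def singular(word):
    # Naive stand-in for an inflection library: strip one trailing 's'
    # from words longer than three letters ('cats' -> 'cat', 'is' stays 'is').
    return word[:-1] if word.endswith('s') and len(word) > 3 else word


def best_match(tweet, phrases):
    """Return the phrase with the largest share of its words in the tweet, or None."""
    tweet_words = {singular(w) for w in tweet.lower().split()}
    best, best_score = None, 0.0
    for phrase in phrases:
        words = [singular(w) for w in phrase.lower().split()]
        score = sum(w in tweet_words for w in words) / len(words)
        if score > best_score:
            best, best_score = phrase, score
    return best


for tweet in tweets:
    print(tweet, '>', best_match(tweet, lookup_table))
```

Ties go to the phrase listed first in `lookup_table`, and a tweet that shares no word with any phrase yields `None`.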