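Python - Extract hashtags from text; end at punctuation

For my programming class, I have to create a function according to the following description:

The parameter is a tweet. The function should return a list containing all of the hashtags in the tweet, in the order they appear in the tweet. Each hashtag in the returned list should have the initial hash symbol removed, and hashtags should be unique. (If a tweet uses the same hashtag twice, it is included in the list only once. The order of the hashtags should match the order of the first occurrence of each tag in the tweet.)

I am unsure how to make a hashtag end when punctuation is encountered (see the second doctest example). My current code does not output anything: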

def extract(start, tweet):
    """ (str, str) -> list of str

    Return a list of strings containing all words that start with a specified character.

    >>> extract('@', "Make America Great Again, vote @RealDonaldTrump")
    ['RealDonaldTrump']
    >>> extract('#', "Vote Hillary! #ImWithHer #TrumpsNotMyPresident")
    ['ImWithHer', 'TrumpsNotMyPresident']
    """
    words = tweet.split()
    return [word[1:] for word in words if word[0] == start]

def strip_punctuation(s):
    """ (str) -> str

    Return a string, stripped of its punctuation.

    >>> strip_punctuation("Trump's in the lead... damn!")
    'Trumps in the lead damn'
    """
    return ''.join(c for c in s if c not in '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')

def extract_hashtags(tweet):
    """ (str) -> list of str

    Return a list of strings containing all unique hashtags in a tweet.
    Outputted in order of appearance.

    >>> extract_hashtags("I stand with Trump! #MakeAmericaGreatAgain #MAGA #TrumpTrain")
    ['MakeAmericaGreatAgain', 'MAGA', 'TrumpTrain']
    >>> extract_hashtags("NEVER TRUMP. I'm with HER. Does #this! work?")
    ['this']
    """
    hashtags = extract('#', tweet)

    no_duplicates = []

    for item in hashtags:
        if item not in no_duplicates and item.isalnum():
            no_duplicates.append(item)

    result = []
    for hash in no_duplicates:
        for char in hash:
            if char.isalnum() == False and char != '#':
                hash == hash[:char.index()]
                result.append()
    return result
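
I'm quite lost at this point; any help would be appreciated. Thank you in advance.

Note: we are not allowed to use regular expressions or import any modules.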
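
So... if you need to end at punctuation, and there aren't *that* many punctuation characters, why not check whether the next character is punctuation? – Pythonista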
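
One way to act on the comment's suggestion, offered as a minimal sketch rather than a definitive answer: scan each word that begins with '#' one character at a time and cut the hashtag off at the first character that is not alphanumeric. It uses no regular expressions and no imports. The sketch makes some assumptions that are not in the assignment text: a hashtag only counts if '#' is the first character of a whitespace-separated word, an underscore ends a hashtag like any other punctuation mark, and in a run such as "#a#b" only "a" is kept.

```python
def extract_hashtags(tweet):
    """ (str) -> list of str

    Return the unique hashtags in tweet, in order of first appearance,
    without the leading '#'. A hashtag ends at the first character
    that is not alphanumeric (an assumption of this sketch).
    """
    hashtags = []
    for word in tweet.split():
        if not word.startswith('#'):
            continue
        # Advance past the '#' until the next character is punctuation
        # (anything non-alphanumeric) or the word runs out.
        end = 1
        while end < len(word) and word[end].isalnum():
            end += 1
        tag = word[1:end]
        # Ignore a bare '#' and any tag that was already collected.
        if tag and tag not in hashtags:
            hashtags.append(tag)
    return hashtags


print(extract_hashtags("NEVER TRUMP. I'm with HER. Does #this! work?"))
```

Because duplicates are checked against the list as it is built, the result keeps first-appearance order without needing a separate de-duplication pass. The truncation happens before the uniqueness check, so "#MAGA!" and "#MAGA" count as the same hashtag.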