Python regex: getting the text after quotes

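I want to write a Python program that gets the artist's name from Pandora tweets. For example, if I have this tweet:

I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.

I want to get back only the name Luther Vandross. I don't know much about regular expressions, so I tried the following code: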

print re.findall('".+?" by [\w+]+', text)

But the result was "I Can Make It Better" by Luther

Do you have any idea how I could write a regular expression in Python to get it?

2015-06-21 Filipe

>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.''' 

>>> import re 
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s) 
>>> m 
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P> 
>>> m.groups() 
('I Can Make It Better', 'Luther Vandross')

More test cases:

>>> tests = [ 
    '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''', 
    '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''', 
    '''I'm listening to "It's Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''', 
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''', 
    '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
    '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
     print(expr.search(s).groups())

("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
("It's Been Awhile", '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
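In real use, some tweets will not follow this pattern at all, and `expr.search(s)` then returns `None`, so calling `.groups()` on it raises an `AttributeError`. A minimal sketch of a guarded wrapper follows; the names `PANDORA_RE` and `parse_pandora_tweet` are made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as in the answer above: optional quotes around the title,
# then " by <artist> on Pandora", with or without a leading '#'.
PANDORA_RE = re.compile(r'to "?(.*?)"? by (.*?) on #?Pandora')


def parse_pandora_tweet(tweet):
    """Return (title, artist), or None if the tweet does not match."""
    m = PANDORA_RE.search(tweet)
    return m.groups() if m else None


print(parse_pandora_tweet(
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0'''))
print(parse_pandora_tweet('Just had coffee'))
```

The first call prints the title and artist as a tuple; the second prints `None` instead of raising.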

2015-06-21 16:35:56 poke

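Thank you very much! I managed to make it work with this =) – Filipe

I scanned the #Pandora hashtag on Twitter for more examples and adjusted the expression so that it works for all of those patterns. – poke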

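You need to use a capturing group.

print(re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text))

I used the {0,2} repetition quantifier because the name might consist of just a first name, a first and last name, or a first, middle, and last name.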
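To make the effect of the capturing group concrete, here is a small sketch. The variable names are illustrative, and the second tweet is one of the test cases from the other answer. With no group, `re.findall` returns each whole match; with exactly one group, it returns only what the group captured.

```python
import re

text = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora'

# No capturing group: findall returns each whole match.
whole = re.findall(r'"[^"]*" by [A-Z][a-z]+', text)

# One capturing group: findall returns only the captured part.
names = re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text)

# This pattern expects capitalized words, so a handle such as
# @foofighters produces no match at all.
handle = re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})',
                    '"Everlong" by @foofighters on #Pandora')

print(whole)
print(names)
print(handle)
```

The last case shows the trade-off of this approach: it is strict about what counts as a name, so artists written as handles or in all capitals are skipped.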

2015-06-21 16:37:22

Thank you very much for your help =) – Filipe

print(re.findall(r'".+?" by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)', text))

You can try this.

https://regex101.com/r/vH0iN5/5

2015-06-21 16:37:58 vks

You can use this lookaround-based regex:

s = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'
print(re.search(r'(?<=by ).+?(?= on)', s).group())
Luther Vandross
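As a rough comparison, the sketch below runs a lookaround pattern and a capturing-group pattern over the same tweet. Lookarounds only assert the surrounding context without consuming it, so the whole match is already the name. The variable names are illustrative, and the delimiters are written with explicit spaces, which is a slight variation on the pattern above.

```python
import re

s = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora'

# Lookbehind and lookahead check the context but are not part of the match.
look = re.search(r'(?<= by ).+?(?= on )', s)

# Same delimiters with a capturing group: the name is in group(1),
# while group() also includes the delimiters.
cap = re.search(r' by (.+?) on ', s)

print(look.group())
print(cap.group(1))
print(repr(cap.group()))
```

Note that Python's `re` module requires the lookbehind pattern to have a fixed width, which a literal such as ` by ` satisfies.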

2015-06-21 16:38:48 anubhava

Your regex is close, but you can change the delimiters to use " by and on. You also need parentheses to create a capturing group.

You can use a regex like this:

" by (.+?) on

The idea behind this expression is to capture the content between " by and on, using a simple non-greedy regex.

Match information

MATCH 1 
1. [43-58] `Luther Vandross`

Code

import re
p = re.compile(r'" by (.+?) on')
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

# group(1) holds the text captured between '" by ' and ' on'
print(p.search(test_str).group(1))
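Why the non-greedy `.+?` matters only shows up when " on" appears more than once. The tweet in this sketch is made up for illustration (it is not from the thread) and contains a second " on" later in the text.

```python
import re

# Hypothetical tweet with two occurrences of " on".
tweet = '"Cat Daddy" by Rej3ctz on Pandora, now on repeat'

# Non-greedy: stops at the first " on" that lets the match succeed.
lazy = re.search(r'" by (.+?) on', tweet).group(1)

# Greedy: consumes as much as possible, so it runs to the last " on".
greedy = re.search(r'" by (.+) on', tweet).group(1)

print(lazy)    # Rej3ctz
print(greedy)  # Rej3ctz on Pandora, now
```

On the original tweet both versions give the same result, because " on" occurs only once there.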

2015-06-21 16:39:07

Thanks for your help =). I was having some trouble understanding how regular expressions work, but this answer made it clearer. – Filipe

@Filipe Glad to help. –
